janhq / cortex.cpp

Run and customize Local LLMs.
https://cortex.so
Apache License 2.0

bug: run llama3:tensorrt-llm leads to "cortex.llamacpp engine not found" #1020

Open 0xSage opened 1 month ago

0xSage commented 1 month ago

Describe the bug

  1. Install cortex
  2. Start the server
  3. Run cortex run llama3:tensorrt-llm --chat
  4. NOTE: a tensorrt-llm branch does not exist in the llama3 HF repo
  5. The model "successfully" downloads, but the binary is empty; a model.yaml is still generated
  6. When running, the following error appears:
(base) PS C:\Windows\System32> cortex run llama3:tensorrt-llm --chat
√ Dependencies loaded in 862ms
√ API server is online
√ Model found
Downloading engine...
 ████████████████████████████████████████ 100% | ETA: 0s | 100/100
× 500 status code (no body)
Last errors:
× Model loading failed
{"method":"POST","path":"/v1/models/llama3:tensorrt-llm/start","statusCode":500,"ip":"127.0.0.1","content_length":"52","user_agent":"CortexClient/JS 0.1.7","x_correlation_id":""} HTTP
- Loading model...
20240815 15:29:47.151000 UTC 10740 INFO  CPU instruction set: fpu = 1| mmx = 1| sse = 1| sse2 = 1| sse3 = 1| ssse3 = 1| sse4_1 = 1| sse4_2 = 1| pclmulqdq = 1| avx = 1| avx2 = 1| avx512_f = 1| avx512_dq = 1| avx512_ifma = 1| avx512_pf = 0| avx512_er = 0| avx512_cd = 1| avx512_bw = 1| has_avx512_vl = 1| has_avx512_vbmi = 1| has_avx512_vbmi2 = 1| avx512_vnni = 1| avx512_bitalg = 1| avx512_vpopcntdq = 1| avx512_4vnniw = 0| avx512_4fmaps = 0| avx512_vp2intersect = 0| aes = 1| f16c = 1| - server.cc:288
20240815 15:29:47.151000 UTC 10740 ERROR Could not load engine: Could not load library "C:\Users\n\cortex/engines/cortex.llamacpp/engine.dll"
The specified module could not be found.

 - server.cc:299
× Model loading failed
{"method":"POST","path":"/v1/models/llama3:tensorrt-llm/start","statusCode":500,"ip":"127.0.0.1","content_length":"52","user_agent":"CortexClient/JS 0.1.7","x_correlation_id":""} HTTP
...

It turns out cortex somehow downloaded an empty model instead of failing outright.

Ah, I see the issue: tensorrt-llm is an invalid tag (so cortex.so/models is badly wrong), and cortex run llama3:tensorrt-llm downloaded a default empty model because there is no HF repo branch called tensorrt-llm.
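One way to prevent this class of bug would be a pre-flight check against the Hugging Face Hub API, which lists a repo's actual branches, before starting any download. A minimal sketch (not part of cortex; function names are my own):

```python
# Hypothetical pre-flight check: verify that a branch actually exists on a
# Hugging Face repo before downloading, instead of silently pulling down
# an empty default tree when the branch is missing.
import json
import urllib.error
import urllib.request


def hf_refs_url(repo: str) -> str:
    """Build the HF Hub API URL that lists a model repo's branches and tags."""
    return f"https://huggingface.co/api/models/{repo}/refs"


def branch_exists(repo: str, branch: str) -> bool:
    """Return True only if `branch` is listed among the repo's branches."""
    try:
        with urllib.request.urlopen(hf_refs_url(repo)) as resp:
            refs = json.load(resp)
    except urllib.error.HTTPError:
        return False  # repo not found / not accessible
    return any(b.get("name") == branch for b in refs.get("branches", []))
```

With a check like this, `cortex pull` could abort with "branch 'tensorrt-llm' not found" rather than fabricating an empty model.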

(base) PS C:\Users\n\cortex\models> cat .\llama3-tensorrt-llm.yaml
files:
  - C:\Users\n\cortex\models\llama3-tensorrt-llm\.gitattributes
model: llama3:tensorrt-llm
name: llama3:tensorrt-llm
stop: []
stream: true
max_tokens: 4096
frequency_penalty: 0.7
presence_penalty: 0.7
temperature: 0.7
top_p: 0.7
ctx_len: 4096
ngl: 100
engine: cortex.llamacpp
id: llama3:tensorrt-llm
created: 1723735451386
object: model
owned_by: ''
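A second guard could sit at manifest-generation time: refuse to write a model.yaml whose `files` list contains no actual weight files, which is exactly the broken state above (only `.gitattributes` was downloaded). A sketch, with the extension list and function name being my own assumptions:

```python
# Hypothetical validation: reject a generated model.yaml whose `files`
# list contains no plausible model weights (e.g. when the only file
# downloaded from the nonexistent branch is .gitattributes).
import os

# Assumed set of weight-file extensions cortex cares about.
WEIGHT_EXTENSIONS = {".gguf", ".bin", ".safetensors"}


def has_model_weights(files: list[str]) -> bool:
    """True if at least one listed file looks like a weight file."""
    return any(
        os.path.splitext(f)[1].lower() in WEIGHT_EXTENSIONS
        for f in files
    )


# The broken manifest above would fail this check:
bad = [r"C:\Users\n\cortex\models\llama3-tensorrt-llm\.gitattributes"]
print(has_model_weights(bad))  # False
```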

Specs:

louis-jan commented 1 month ago

There is an engine-init issue where it looks for an incorrect binary, but that alone does not fully explain the case above. We still need to check why the download links to .gitattributes and generates an invalid YAML file.
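For the engine-init side, the "specified module could not be found" 500 could be surfaced earlier with a cheap existence check on the engine's shared library before the POST to /v1/models/<id>/start is attempted. A sketch, assuming the on-disk layout from the log above (`<cortex_home>/engines/cortex.llamacpp/engine.dll`); the non-Windows library names are my guesses:

```python
# Hypothetical pre-load check, assuming the layout seen in the error log:
# <cortex_home>/engines/<engine>/engine.dll on Windows. Non-Windows
# library names here are assumptions for illustration.
import os
import platform


def engine_library_path(cortex_home: str, engine: str) -> str:
    """Build the expected path to the engine's shared library."""
    libname = {
        "Windows": "engine.dll",
        "Darwin": "libengine.dylib",  # assumed name
    }.get(platform.system(), "libengine.so")  # assumed name
    return os.path.join(cortex_home, "engines", engine, libname)


def check_engine(cortex_home: str, engine: str) -> None:
    """Raise a clear error if the engine library is missing on disk."""
    path = engine_library_path(cortex_home, engine)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"Engine '{engine}' is not installed: expected {path}"
        )
```

Failing here would let the CLI print "engine not installed, run cortex engines install" instead of a bare "500 status code (no body)".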

dan-homebrew commented 2 weeks ago

@vansangpfiev I am reassigning this to the Cortex team. If this issue does not exist for the C++ implementation, you can proceed to close this ticket.