Closed: DAVIDSystems closed this issue 1 month ago
Hey @DAVIDSystems, apologies for the confusion: we unreleased Cortex and are currently rewriting the entire thing in C++.
The new stable launch is ETA end of October or November.
There is a nightly build right now that often has breaking changes. If you'd like to help us test it, could you try the latest build from https://discord.gg/nGp6PMrUqS instead?
Cortex version
0.0.0.1
Describe the Bug
Cortex for Windows, downloaded from the link provided here: https://cortex.so/. The REST API works as described. Chat/completion also works, but it does not use CUDA. CUDA 12.2 is installed and worked well with Nitro.exe and also with Jan.
Steps to Reproduce
1. Start cortex.exe
2. Init the llama.cpp engine (pull model llama 3.1)
3. Start model llama 3.1
4. Chat (summarize a text of about 200 words)

The chat works but takes 11 minutes, which looks like CPU-only inference to me. Jan does the same task in a few seconds. If I run nvidia-smi during the completion, it does not show any GPU activity. Repeating the same chat request takes only 2 min 40 s, but that is still much too slow.
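For reference, the chat request in step 4 can be reproduced with a minimal script like the one below. This is a sketch, not a confirmed Cortex invocation: the port 3928, the OpenAI-compatible `/v1/chat/completions` path, and the model id `llama3.1` are assumptions based on the steps above, so adjust them to match your local setup.

```python
import json
import urllib.request

# ASSUMPTIONS: Cortex listens on localhost:3928 and exposes an
# OpenAI-compatible chat completions endpoint; the model id below
# is a guess for the llama 3.1 model pulled in step 2.
payload = {
    "model": "llama3.1",
    "messages": [
        {"role": "user", "content": "Summarize the following text: ..."},
    ],
}

req = urllib.request.Request(
    "http://localhost:3928/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running; time this call while watching
# nvidia-smi in another terminal to see whether CUDA is actually used.
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
print(json.dumps(payload))
```

While the request runs, `nvidia-smi` in a second terminal should show nonzero GPU utilization and the cortex process if CUDA offload is working; in my case it shows no GPU activity at all.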
Screenshots / Logs
No response
What is your OS?
What engine are you running?