janhq / cortex.cpp

Run and customize Local LLMs.
https://cortex.so
Apache License 2.0

bug: Cuda not used #1281

Closed DAVIDSystems closed 1 month ago

DAVIDSystems commented 1 month ago

Cortex version

0.0.0.1

Describe the Bug

Cortex for Windows downloaded from the link provided here: https://cortex.so/. The REST API works as described. Chat/completion also works, but does not use CUDA. CUDA 12.2 is installed and worked well with Nitro.exe and also with Jan.

Steps to Reproduce

1. Start cortex.exe
2. Init llama.cpp engine
3. Pull model llama 3.1
4. Start model llama 3.1
5. Chat (summarize a text of about 200 words)

It works but takes 11 minutes, which looks like a CPU-only run to me. Jan does the same in a few seconds. If I run nvidia-smi during completion, it does not show any GPU activity. If I repeat the chat request it takes only 2 min 40 sec, but that is still much too slow.
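For reference, the chat step above goes through Cortex's OpenAI-compatible endpoint. The sketch below only builds the request body; the endpoint path, port, and model id in the comments are assumptions based on the OpenAI-compatible API and may differ on your setup:

```python
import json

# Hypothetical request body for the chat/completion step above.
# Target would be something like POST http://127.0.0.1:39281/v1/chat/completions
# (port and exact path are assumptions -- adjust to your local Cortex instance).
payload = {
    "model": "llama3.1",  # assumed model id; use whatever `cortex pull` installed
    "messages": [
        {"role": "user", "content": "Summarize the following text: ..."}
    ],
    "stream": False,
}

body = json.dumps(payload)
print(body)
```

While the request runs, watching `nvidia-smi` should show non-zero GPU utilization if the CUDA backend is actually in use; in this report it stays idle.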

Screenshots / Logs

No response

What is your OS?

What engine are you running?

0xSage commented 1 month ago

Hey @DAVIDSystems, apologies for the confusion. We (un)released Cortex and are currently rewriting the entire thing in C++.

The new stable launch is ETA end of October or November.

There is a nightly build right now that often has breaking changes. If you'd like to help us test it, can I ask you to try the latest build from https://discord.gg/nGp6PMrUqS instead?