Closed: DAVIDSystems closed this issue 1 month ago
Hey @DAVIDSystems, apologies for the confusion: we unreleased Cortex and are currently rewriting the entire thing in C++.
The new stable launch is ETA end of October or November.
There is a nightly build right now that often has breaking changes. If you'd like to help us test it, could you try the latest build from https://discord.gg/nGp6PMrUqS instead?
Cortex version
0.0.0.1
Describe the Bug
Cortex for Windows, downloaded from the link provided here: https://cortex.so/. The REST API works as described. Chat/completion also works, but it does not use CUDA. CUDA 12.2 is installed and worked well with Nitro.exe and also with Jan.
Steps to Reproduce
1. Start cortex.exe
2. Init the llama.cpp engine (pull model llama 3.1)
3. Start model llama 3.1
4. Chat (summarize a text of about 200 words)

The chat works but takes 11 minutes, which looks like CPU-only inference to me. Jan does the same task in a few seconds. If I run nvidia-smi during the completion, it does not show any GPU activity. Repeating the same chat request takes only 2 min 40 s, but that is still much too slow.
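For reference, the chat request in step 4 can be reproduced with a minimal script like the one below. This is a sketch, not a confirmed Cortex invocation: the port 3928, the OpenAI-compatible `/v1/chat/completions` path, and the model id `llama3.1` are assumptions based on the steps above, so adjust them to match your local setup.

```python
import json
import urllib.request

# ASSUMPTIONS: Cortex listens on localhost:3928 and exposes an
# OpenAI-compatible chat completions endpoint; the model id below
# is a guess for the llama 3.1 model pulled in step 2.
payload = {
    "model": "llama3.1",
    "messages": [
        {"role": "user", "content": "Summarize the following text: ..."},
    ],
}

req = urllib.request.Request(
    "http://localhost:3928/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running; time this call while watching
# nvidia-smi in another terminal to see whether CUDA is actually used.
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
print(json.dumps(payload))
```

While the request runs, `nvidia-smi` in a second terminal should show nonzero GPU utilization and the cortex process if CUDA offload is working; in my case it shows no GPU activity at all.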
Screenshots / Logs
No response
What is your OS?
What engine are you running?