bug: can enable GPU acceleration with cuda not installed - model fails to start

johnhaire89 commented 2 months ago

[X] I have searched the existing issues

Current behavior

I was playing with Jan for the first time and realised that GPU acceleration wasn't enabled. I toggled the "GPU Acceleration" switch to enable it for my NVIDIA RTX A2000, and no error was shown.

When I next typed into the chat window, Jan wasn't able to start the model.

The problem was that I didn't have the CUDA Toolkit installed. Per the Stack Overflow answer at https://stackoverflow.com/a/55717476, nvidia-smi shows the supported CUDA version, but nvcc --version should be used to check the installed version. I installed the CUDA Toolkit and it's back to working like magic.

This is probably more a feature request than a bug, but that toggle should probably show an error if I try to enable GPU acceleration for an NVIDIA card when the CUDA Toolkit isn't installed.
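The check described above can be sketched roughly as follows. This is only an illustration, not Jan's code: the function names are hypothetical, and it assumes (as the report does) that nvcc being on PATH is a reasonable proxy for the CUDA Toolkit being installed.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(output: str) -> Optional[str]:
    """Extract the 'release X.Y' version from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_cuda_toolkit_version() -> Optional[str]:
    """Return the installed CUDA Toolkit version, or None if nvcc is unavailable."""
    nvcc = shutil.which("nvcc")  # hypothetical check: relies on nvcc being on PATH
    if nvcc is None:
        return None
    try:
        result = subprocess.run(
            [nvcc, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return parse_nvcc_release(result.stdout)
```

Calling installed_cuda_toolkit_version() returns None on a machine like the one in this report, which is the condition the toggle could react to.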

Minimum reproduction step

1. Start with a Windows PC with an NVIDIA GPU and the CUDA Toolkit not installed (per nvcc --version)
2. Under Settings > Advanced Settings, select the GPU and toggle the switch - toast says "Successfully turned on GPU acceleration"
3. Try to start Mistral Instruct 7B Q4 - model fails to start

Expected behavior

When I try to enable GPU Acceleration for an NVIDIA GPU in an environment where the CUDA Toolkit isn't installed, I should get a helpful error. Maybe a warning can be displayed next to the GPU in the dropdown?
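A minimal sketch of that expected behavior, under stated assumptions: gpu_toggle_warning and its parameters are hypothetical names, not part of Jan, and the presence of nvcc stands in for "CUDA Toolkit installed". The lookup function is injectable so the logic can be exercised without a GPU.

```python
import shutil
from typing import Callable, Optional


def gpu_toggle_warning(
    vendor: str,
    find_executable: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return a warning to show next to the GPU, or None if the toggle can proceed."""
    if vendor.lower() != "nvidia":
        return None  # this sketch only covers the NVIDIA case from the report
    if find_executable("nvcc") is None:
        # Assumption: a missing nvcc means the CUDA Toolkit is not installed.
        return (
            "CUDA Toolkit not detected (nvcc is not on PATH). "
            "Install the CUDA Toolkit before enabling GPU acceleration."
        )
    return None
```

A UI could show the returned string in place of the "Successfully turned on GPU acceleration" toast, or next to the GPU entry in the dropdown.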

Screenshots / Logs

2024-08-13T02:27:29.268Z [CORTEX]::Debug: Spawn cortex at path: C:\Users\username\jan\extensions\@janhq\inference-cortex-extension\dist\bin\win-cuda-12-0\cortex-cpp.exe, and args: 1,127.0.0.1,3928
2024-08-13T02:27:29.268Z [CORTEX]::Debug: Spawning cortex subprocess...
2024-08-13T02:27:29.268Z [APP]::C:\Users\username\jan\extensions\@janhq\inference-cortex-extension\dist\bin\win-cuda-12-0
2024-08-13T02:27:29.380Z [CORTEX]::Debug: cortex is ready
2024-08-13T02:27:29.380Z [CORTEX]::Debug: Loading model with params {"cpu_threads":15,"ctx_len":2048,"prompt_template":"{system_message} [INST] {prompt} [/INST]","llama_model_path":"C:\\Users\\username\\jan\\models\\mistral-ins-7b-q4\\Mistral-7B-Instruct-v0.3-Q4_K_M.gguf","ngl":33,"system_prompt":"","user_prompt":" [INST] ","ai_prompt":" [/INST]","model":"mistral-ins-7b-q4"}
2024-08-13T02:27:29.391Z [CORTEX]::Debug: 20240813 02:27:29.291000 UTC 34396 INFO  cortex-cpp version: default_version - main.cc:73
20240813 02:27:29.292000 UTC 34396 INFO  cortex.llamacpp version: 0.1.20-30.06.24 - main.cc:78
20240813 02:27:29.292000 UTC 34396 INFO  Server started, listening at: 127.0.0.1:3928 - main.cc:81
20240813 02:27:29.292000 UTC 34396 INFO  Please load your model - main.cc:82
20240813 02:27:29.292000 UTC 34396 INFO  Number of thread is:20 - main.cc:89
20240813 02:27:29.383000 UTC 25336 INFO  CPU instruction set: fpu = 1| mmx = 1| sse = 1| sse2 = 1| sse3 = 1| ssse3 = 1| sse4_1 = 1| sse4_2 = 1| pclmulqdq = 1| avx = 1| avx2 = 1| avx512_f = 0| avx512_dq = 0| avx512_ifma = 0| avx512_pf = 0| avx512_er = 0| avx512_cd = 0| avx512_bw = 0| has_avx512_vl = 0| has_avx512_vbmi = 0| has_avx512_vbmi2 = 0| avx512_vnni = 0| avx512_bitalg = 0| avx512_vpopcntdq = 0| avx512_4vnniw = 0| avx512_4fmaps = 0| avx512_vp2intersect = 0| aes = 1| f16c = 1| - server.cc:277
20240813 02:27:29.392000 UTC 25336 ERROR Could not load engine: Could not load library "C:\Users\username\jan\extensions\@janhq\inference-cortex-extension\dist\bin\win-cuda-12-0/engines/cortex.llamacpp/engine.dll"
The specified module could not be found.

 - server.cc:290

2024-08-13T02:27:29.392Z [CORTEX]::Debug: Load model success with response {}
2024-08-13T02:27:29.398Z [CORTEX]::Debug: Validate model state failed with response "Conflict"
2024-08-13T02:27:29.398Z [CORTEX]::Error: Validate model status failed
2024-08-13T02:27:29.397Z [CORTEX]::Debug: Validate model state with response 409
2024-08-13T02:28:29.958Z [CORTEX]::Debug: Request to kill cortex
2024-08-13T02:28:29.958Z [CORTEX]::Debug: Killing PID 21376
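In the log above, the root cause is the "Could not load engine" line, yet the next entries report "Load model success" followed by a generic 409 Conflict. On Windows, "The specified module could not be found" can also mean that a dependency of the DLL is missing, which is consistent with the missing CUDA libraries. As a rough sketch, the failing library could be pulled out of the log text to surface a more specific error. The function name is hypothetical and the pattern only matches the wording seen in this particular log.

```python
import re
from typing import Optional

# Pattern taken from the wording of the error line in this log only.
ENGINE_LOAD_ERROR = re.compile(
    r'ERROR Could not load engine: Could not load library "([^"]+)"'
)


def find_engine_load_failure(log_text: str) -> Optional[str]:
    """Return the path of the engine library that failed to load, if any."""
    match = ENGINE_LOAD_ERROR.search(log_text)
    return match.group(1) if match else None
```

If a path is returned and it points into a win-cuda-* directory, the app could suggest checking the CUDA installation instead of reporting only "Validate model status failed".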

Jan version

0.5.2

In which operating systems have you tested?

[ ] macOS
[X] Windows
[ ] Linux

Environment details

Windows 11
NVIDIA RTX A2000 8GB Laptop GPU
8192MB VRAM
CUDA Toolkit not installed

louis-jan commented 1 month ago

@Van-QA @imtuyethan I think we implemented error handling for this in the past, which leads the user to the additional CUDA installation page?

dan-homebrew commented 1 month ago

@johnhaire89 FYI, Jan is in the process of overhauling how we deal with llama.cpp binaries and GPU dependencies.

llama.cpp now bundles its CUDA dependencies
Jan will likely shift towards bundling CUDA dependencies together with the llama.cpp engine

@Van-QA I will keep this bug open. Once we clean up PM systems, let's link the 2 epics that would solve this bug. My style is to only close bugs once the corresponding feature is shipped.

Jan should embed llama.cpp through Cortex + cortex.llamacpp
cortex engines llama.cpp install should also pull CUDA dependencies, cc @namchuai (FYI)

dan-homebrew commented 1 month ago

Handling this bug as part of https://github.com/janhq/cortex.cpp/issues/1165

louis-jan commented 1 month ago

Hi @dan-homebrew @imtuyethan. This is a known issue, and there is a fix in 0.5.4: https://github.com/janhq/jan/issues/3552.

Show a corresponding error message.
Allow users to install dependencies.

We have this step to let users install additional dependencies right in the app (without redirecting them out of the app).

In the next update, which integrates the cortex-cpp engine pull, there should be no extra request to install these dependencies, but this error message would still help in case a driver/CUDA update does not work with the pulled engine and its dependencies.

imtuyethan commented 1 week ago

The fix is included in Jan's path to cortex.cpp: https://github.com/janhq/jan/issues/3690
