janhq / jan

Jan is an open source alternative to ChatGPT that runs 100% offline on your computer, with multiple engine support (llama.cpp, TensorRT-LLM).
https://jan.ai/

bug: GPU accelerated model fails to load without a visible error message #3761

Status: Open

sgdesmet commented 3 days ago

Jan version

0.5.5

Describe the Bug

I'm running Fedora 40 on a laptop with a GTX 1050 Ti with 4 GB of VRAM. When I enable GPU acceleration and attempt to run models that are marked as 'Slow on your device' (such as Llama 3.2 3B Instruct Q8), they fail to start without any visible error message. At first glance, the logs show what appears to be a memory issue:

2024-10-03T09:04:05.888Z [CORTEX]::Error: ggml_backend_cuda_buffer_type_alloc_buffer: allocating 1026.00 MiB on device 0: cudaMalloc failed: out of memory

Is it correct that my device is unable to run this particular model? If so, I would expect a 'Not enough VRAM' indicator when downloading the model and an explicit error message when it fails to start.
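For context, a Q8 quantization stores roughly one byte per weight, so a 3B-parameter model is about 3.2 GB of weights alone, before the KV cache and CUDA context overhead; that plausibly exceeds what is free on a 4 GB card, which matches the cudaMalloc failure in the log. Below is a minimal sketch of the kind of pre-flight check that could surface a 'Not enough VRAM' error up front. This is illustrative only, not Jan/Cortex code, and the 3.2 GB estimate is an assumption:

```cpp
// Illustrative sketch (not Jan/Cortex code): query free VRAM with the
// CUDA runtime API before attempting to load a model, so an out-of-memory
// condition can be reported up front instead of failing silently.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    size_t free_bytes = 0, total_bytes = 0;
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Rough assumed estimate for Llama 3.2 3B at Q8: ~1 byte per weight
    // -> ~3.2 GB, before KV cache and CUDA context overhead.
    const size_t estimated_need = (size_t)3.2e9;

    printf("VRAM: %.2f GiB free of %.2f GiB total\n",
           free_bytes / 1073741824.0, total_bytes / 1073741824.0);
    if (free_bytes < estimated_need) {
        printf("Not enough VRAM for this model (need ~%.1f GiB)\n",
               estimated_need / 1073741824.0);
    }
    return 0;
}
```

A check along these lines before model load would let the UI show an explicit error instead of a 'Starting model' indicator that never resolves.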

Steps to Reproduce

  1. Install additional Cortex.cpp dependencies
  2. Enable GPU acceleration
  3. Download model Llama 3.2 3B Instruct Q8
  4. Start a new thread and enter some text
  5. Observe 'Starting model' loading indicator
  6. Nothing happens

Screenshots / Logs

app.log

What is your OS?

Linux (Fedora 40)

louis-jan commented 3 days ago

Related issue #3760

Asherathe commented 19 hours ago

Jan works well with small models, but larger models are unpredictable. With 16 GB of VRAM and 32 GB of RAM, sometimes a 34B model loads and sometimes it doesn't. Sometimes my whole system freezes and I have to restart; sometimes my GPU driver silently resets; sometimes everything works normally. KoboldCpp, Msty, LM Studio, etc. load the same model fine every time.
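One hedged guess at why borderline models are hit-or-miss: when a load sits close to the VRAM limit, success can depend on whatever else the GPU happens to be holding at that moment. llama.cpp-based runtimes expose n_gpu_layers to offload only part of the model to the GPU and keep the rest in system RAM, which is one way the frontends mentioned above may stay stable. The sketch below is illustrative, not Jan's internals; the model path and layer count are placeholder assumptions, and the calls shown are llama.cpp's C API as of around the time of this issue:

```cpp
// Illustrative sketch against llama.cpp's C API (not Jan/Cortex code):
// offload only part of the model's layers to the GPU so a model that is
// too big for VRAM can still load, with the remainder kept in system RAM.
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init();

    llama_model_params params = llama_model_default_params();
    // Offload only 20 layers to the GPU (placeholder value); the number
    // that actually fits depends on the model, quantization, and free VRAM.
    params.n_gpu_layers = 20;

    // "model.gguf" is a placeholder path.
    llama_model* model = llama_load_model_from_file("model.gguf", params);
    if (model == nullptr) {
        fprintf(stderr, "failed to load model (still out of memory?)\n");
        llama_backend_free();
        return 1;
    }

    printf("model loaded with partial GPU offload\n");
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```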