janhq / cortex.cpp

Run and customize Local LLMs.
https://cortex.so
Apache License 2.0

bug: some models fail to load if multiple GPUs are selected #1458

Open · thonore75 opened 1 month ago

thonore75 commented 1 month ago

Jan version

0.5.3

Describe the Bug

I imported many models, and some of them fail to load if I select both of my graphics cards (RTX 3060 12 GB). If I unselect one of them, the model loads.

It would be great if the models list could indicate whether a model supports multi-GPU; a rough sketch of the kind of check I mean follows.
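Hypothetical helper, not cortex's actual API: `model_bytes` would come from the GGUF file, and the per-device free sizes from the CUDA runtime.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical fit check: with a layer split, each selected GPU must hold
// its share of the weights, so check per-device free VRAM as well as the total.
bool likely_fits(std::size_t model_bytes,
                 const std::vector<std::size_t>& free_per_device) {
    if (free_per_device.empty()) return false;
    std::size_t total = 0;
    std::size_t share = model_bytes / free_per_device.size(); // assumes an even split
    for (std::size_t f : free_per_device) {
        if (f < share) return false; // one card cannot hold its share
        total += f;
    }
    return total >= model_bytes; // necessary, not sufficient (KV cache, compute buffers)
}
```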

Steps to Reproduce

  1. Go to Settings -> Advanced Settings
  2. In "Choose device(s)", select both GPUs
  3. Go to "My Models"
  4. Select "Meta-Llama-3.1-8B-Instruct-128k-Q4_0" and start it -> NOT loaded!!! (the load this triggers is sketched below)
  5. Go back to Advanced Settings
  6. Unselect one GPU in "Choose device(s)"
  7. Go to "My Models"
  8. Select "Meta-Llama-3.1-8B-Instruct-128k-Q4_0" and start it -> loaded!!!
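For reference, the device selection in step 2 should map to llama.cpp's split parameters, since cortex delegates model loading to llama.cpp. A minimal sketch of the equivalent two-GPU load, assuming the standard llama.cpp C API (the model path and the even split are placeholders):

```cpp
#include "llama.h"

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99;                      // offload all layers
    mparams.split_mode   = LLAMA_SPLIT_MODE_LAYER;  // spread layers across GPUs
    static const float split[2] = {0.5f, 0.5f};     // placeholder even 2-GPU split
    mparams.tensor_split = split;

    // This is the call that fails in the logs below ("failed to load model").
    llama_model * model = llama_load_model_from_file(
        "Meta-Llama-3.1-8B-Instruct.Q4_0.gguf", mparams);
    if (model == NULL) {
        return 1;
    }

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```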

Screenshots / Logs

No response

What is your OS?

imtuyethan commented 4 weeks ago

Tested on

Build: 114 (windows-dev-tensorRT-LLM)
OS: Windows 11 Pro (Version 23H2, build 22631.4037)
CPU: AMD Ryzen Threadripper PRO 5955WX (16 cores)
RAM: 32 GB
GPU 1: NVIDIA GeForce RTX 3090
GPU 2: NVIDIA GeForce RTX 3090
Storage: 599 GB local disk (C:)

Results

https://github.com/user-attachments/assets/a9741c50-5073-4e51-9344-e136f63f5d0f

https://github.com/user-attachments/assets/37772b49-d524-47f2-ac39-58a49389d670

https://github.com/user-attachments/assets/f8808b78-f416-4314-837c-f7e76f4d8eaa

Here's my app logs:

Screenshot 2024-09-19 at 4 29 43 PM

imtuyethan commented 4 weeks ago

Quick check @thonore75: which models cannot be run on your end?

thonore75 commented 4 weeks ago

Here are the models I can launch with 1 GPU but not with 2:

app.log

thonore75 commented 4 weeks ago

After my tests, I tried to play a video you posted here (in Google Chrome), but it would not play. Jan was running with no model loaded; my last test was a model that failed to load. Once I stopped Jan, I was able to play your videos.

thonore75 commented 4 weeks ago

Jan Compatibility.xlsx
app - 1_GPU_1.log
app - 2_GPUs.log
app - CPU.log
app - 1_GPU_0.log

I did some extra tests! The log was cleared before each configuration so that the logs stay separate. 4 tested configurations: CPU only, GPU 0 only, GPU 1 only, and both GPUs (see the attached logs above).

For some models, loading sometimes failed right after a loading failure with the previously tested model, but after loading a known-good model first, the previously failing model loads fine.

imtuyethan commented 3 weeks ago

Possibly related to https://github.com/janhq/jan/issues/3558

louis-jan commented 3 weeks ago
  1. Regarding the failed case of CodeLlama-70b-Instruct-hf.i1-IQ4_XS: the log shows a VRAM-related OOM issue (it's a big model, so that makes sense)

    2024-09-19T10:26:30.142Z [CORTEX]::Error: ggml_backend_cuda_buffer_type_alloc_buffer: allocating 17827.31 MiB on device 0: cudaMalloc failed: out of memory
    2024-09-19T10:26:30.272Z [CORTEX]::Error: llama_model_load: error loading model: unable to allocate backend buffer
    llama_load_model_from_file: failed to load model

  2. The same for Llama-3.1-8B-Instruct, which is weird.

    2024-09-19T10:13:40.646Z [CORTEX]::Error: ggml_backend_cuda_buffer_type_alloc_buffer: allocating 5056.03 MiB on device 0: cudaMalloc failed: out of memory
    ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 5301633024
    llama_new_context_with_model: failed to allocate compute buffers
    2024-09-19T10:13:40.729Z [CORTEX]::Error: llama_init_from_gpt_params: error: failed to create context with model '*****Meta-Llama-3.1-8B-Instruct.Q4_0.gguf'
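Both failures are cudaMalloc out-of-memory errors on device 0. A quick way to see how much VRAM each device actually has free at load time (a minimal sketch using the CUDA runtime API, not cortex's own code):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Print free/total VRAM per device, e.g. right before starting a model,
// to see whether the buffer being allocated on device 0 can actually fit.
int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);
        size_t free_b = 0, total_b = 0;
        cudaMemGetInfo(&free_b, &total_b);
        printf("device %d: %zu MiB free / %zu MiB total\n",
               dev, free_b >> 20, total_b >> 20);
    }
    return 0;
}
```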

louis-jan commented 3 weeks ago

If you have some time, please investigate, @vansangpfiev.

thonore75 commented 6 days ago

If needed, I can perform some extra tests with more and newer models.

0xSage commented 4 days ago

related #1165