thonore75 opened 1 month ago
(windows-dev-tensorRT-LLM)
- OS: Windows 11 Pro (Version 23H2, build 22631.4037)
- CPU: AMD Ryzen Threadripper PRO 5955WX (16 cores)
- RAM: 32 GB
- GPU 1: NVIDIA GeForce RTX 3090
- GPU 2: NVIDIA GeForce RTX 3090
- Storage: 599 GB local disk (C:)
Mistral 8x7B Instruct Q4
(~24GB) with 2 GPUs turned on. https://github.com/user-attachments/assets/a9741c50-5073-4e51-9344-e136f63f5d0f
Aya 23 35B Q4
(~20GB) when using 2 GPUs as well: https://github.com/user-attachments/assets/37772b49-d524-47f2-ac39-58a49389d670
Deepseek Coder 33B Instruct
(~18GB): I cannot run the model whether GPU acceleration is turned on or off, so it could be a separate issue, reported here: https://github.com/janhq/jan/issues/3703
https://github.com/user-attachments/assets/f8808b78-f416-4314-837c-f7e76f4d8eaa
Here's my app logs:
Quick check @thonore75: which models cannot be run on your end?
Here are the models I can launch with 1 GPU but not with 2:
After my tests, I tried to play a video you posted here (in Google Chrome), but it would not play. Jan was running with no model loaded; my last test was a model that failed to load. After I stopped Jan, I was able to play your videos.
- Jan Compatibility.xlsx
- app - 1_GPU_1.log
- app - 2_GPUs.log
- app - CPU.log
- app - 1_GPU_0.log
I did some extra tests! For each tested configuration, the log was cleaned first so each run has a separate log. 4 tested configurations:
For some models, loading would sometimes fail right after a loading issue with the previously tested model, but after successfully loading a working model, the previously failing model loads fine.
Possibly related to https://github.com/janhq/jan/issues/3558
Regarding the failed case of CodeLlama-70b-Instruct-hf.i1-IQ4_XS: the log shows a VRAM-related OOM issue (it's a big model, so that makes sense):
```
2024-09-19T10:26:30.142Z [CORTEX]::Error: ggml_backend_cuda_buffer_type_alloc_buffer: allocating 17827.31 MiB on device 0: cudaMalloc failed: out of memory
2024-09-19T10:26:30.272Z [CORTEX]::Error: llama_model_load: error loading model: unable to allocate backend buffer
llama_load_model_from_file: failed to load model
```
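The log shows a 17827 MiB allocation failing on device 0. As a back-of-envelope sanity check (the overhead, split, and VRAM figures below are illustrative assumptions, not measured values), one can estimate whether a given split of the weights fits each GPU's VRAM:

```python
# Back-of-envelope check: does a model fit when its weights are split across GPUs?
# All numbers are illustrative assumptions; real usage also depends on context
# size, KV cache, and compute buffers.

def fits_on_gpus(model_mib, per_gpu_overhead_mib, split, vram_mib):
    """split: fraction of the weights placed on each GPU (fractions sum to 1)."""
    loads = [model_mib * frac + per_gpu_overhead_mib for frac in split]
    return all(load <= vram for load, vram in zip(loads, vram_mib))

# ~35.7 GiB of weights split evenly (17827 MiB per device, as in the log),
# with an assumed ~2 GiB per-GPU overhead:
print(fits_on_gpus(35654, 2048, [0.5, 0.5], [12288, 12288]))  # False on 12 GiB cards
print(fits_on_gpus(35654, 2048, [0.5, 0.5], [24576, 24576]))  # True on 24 GiB cards
```

This is only a rough model-fitting sketch; the actual allocator behavior in llama.cpp depends on the layer split and buffer types.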
The same happens with Llama-3.1-8B-Instruct, which is weird.
```
2024-09-19T10:13:40.646Z [CORTEX]::Error: ggml_backend_cuda_buffer_type_alloc_buffer: allocating 5056.03 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 5301633024
llama_new_context_with_model: failed to allocate compute buffers
2024-09-19T10:13:40.729Z [CORTEX]::Error: llama_init_from_gpt_params: error: failed to create context with model '*****Meta-Llama-3.1-8B-Instruct.Q4_0.gguf'
```
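For a sanity check on why an 8B Q4 model might still OOM: a rough lower-bound estimate of the weights alone (the bits-per-weight figure is an assumption; Q4_0 stores roughly 4.5 bits per weight once block scales are included, and this ignores the context/compute buffers like the ~5 GiB one failing in the log above):

```python
# Rough lower bound on the VRAM needed for a quantized model's weights alone.
# 4.5 bits/weight approximates Q4_0 (4-bit values plus per-block scales);
# this deliberately excludes KV cache and compute buffers.

def approx_weight_mib(params_billions, bits_per_weight=4.5):
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / (1024 ** 2)

print(round(approx_weight_mib(8)))    # ~4292 MiB for an 8B model at Q4_0
print(round(approx_weight_mib(70)))   # ~37551 MiB for a 70B model
```

So an 8B Q4 model's weights fit comfortably on one 12 GiB card; the failure in the log is the context's compute buffer, which is allocated on top of the weights.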
If you have some time, please investigate, @vansangpfiev.
If needed, I can perform some extra tests with more and newer models.
related #1165
Jan version
0.5.3
Describe the Bug
I imported many models, and some of them fail to load if I select both of my graphics cards (RTX 3060 12 GB). If I unselect one of them, the model loads.
It would be great if the model list could indicate whether a model supports multi-GPU.
Steps to Reproduce
Screenshots / Logs
No response
What is your OS?