thonore75 opened 1 month ago
(windows-dev-tensorRT-LLM)
- OS: Windows 11 Pro (Version 23H2, build 22631.4037)
- CPU: AMD Ryzen Threadripper PRO 5955WX (16 cores)
- RAM: 32 GB
- GPU 1: NVIDIA GeForce RTX 3090
- GPU 2: NVIDIA GeForce RTX 3090
- Storage: 599 GB local disk (C:)
Mistral 8x7B Instruct Q4
(~24GB) with 2 GPUs turned on. https://github.com/user-attachments/assets/a9741c50-5073-4e51-9344-e136f63f5d0f
Aya 23 35B Q4
(~20GB) when using 2 GPUs as well: https://github.com/user-attachments/assets/37772b49-d524-47f2-ac39-58a49389d670
Deepseek Coder 33B Instruct
(~18GB): I cannot run the model whether GPU acceleration is turned on or off, so it could be a separate issue, reported here: https://github.com/janhq/jan/issues/3703
https://github.com/user-attachments/assets/f8808b78-f416-4314-837c-f7e76f4d8eaa
Here's my app logs:
Quick check @thonore75: which models cannot be run on your end?
Here are the models I can launch with 1 GPU but not with 2:
After my tests, I tried to play a video you posted here (in Google Chrome), but it would not play. Jan was running with no model loaded; my last test was a model that failed to load. After I stopped Jan, I was able to play your videos.
- Jan Compatibility.xlsx
- app - 1_GPU_1.log
- app - 2_GPUs.log
- app - CPU.log
- app - 1_GPU_0.log
I did some extra tests! For each tested configuration, the log was cleaned first so each run has a separate log. 4 tested configurations:
For some models, loading would sometimes fail right after a loading issue with the previously tested model, but after successfully loading a working model, the previously failing model loads fine.
Possibly related to https://github.com/janhq/jan/issues/3558
Regarding the failed case of CodeLlama-70b-Instruct-hf.i1-IQ4_XS: the log shows a VRAM-related OOM issue (it's a big model, so that makes sense):
```
2024-09-19T10:26:30.142Z [CORTEX]::Error: ggml_backend_cuda_buffer_type_alloc_buffer: allocating 17827.31 MiB on device 0: cudaMalloc failed: out of memory
2024-09-19T10:26:30.272Z [CORTEX]::Error: llama_model_load: error loading model: unable to allocate backend buffer
llama_load_model_from_file: failed to load model
```
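The log shows a 17827 MiB allocation failing on device 0. As a back-of-envelope sanity check (the overhead, split, and VRAM figures below are illustrative assumptions, not measured values), one can estimate whether a given split of the weights fits each GPU's VRAM:

```python
# Back-of-envelope check: does a model fit when its weights are split across GPUs?
# All numbers are illustrative assumptions; real usage also depends on context
# size, KV cache, and compute buffers.

def fits_on_gpus(model_mib, per_gpu_overhead_mib, split, vram_mib):
    """split: fraction of the weights placed on each GPU (fractions sum to 1)."""
    loads = [model_mib * frac + per_gpu_overhead_mib for frac in split]
    return all(load <= vram for load, vram in zip(loads, vram_mib))

# ~35.7 GiB of weights split evenly (17827 MiB per device, as in the log),
# with an assumed ~2 GiB per-GPU overhead:
print(fits_on_gpus(35654, 2048, [0.5, 0.5], [12288, 12288]))  # False on 12 GiB cards
print(fits_on_gpus(35654, 2048, [0.5, 0.5], [24576, 24576]))  # True on 24 GiB cards
```

This is only a rough model-fitting sketch; the actual allocator behavior in llama.cpp depends on the layer split and buffer types.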
The same happens with Llama-3.1-8B-Instruct, which is weird.
```
2024-09-19T10:13:40.646Z [CORTEX]::Error: ggml_backend_cuda_buffer_type_alloc_buffer: allocating 5056.03 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 5301633024
llama_new_context_with_model: failed to allocate compute buffers
2024-09-19T10:13:40.729Z [CORTEX]::Error: llama_init_from_gpt_params: error: failed to create context with model '*****Meta-Llama-3.1-8B-Instruct.Q4_0.gguf'
```
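For a sanity check on why an 8B Q4 model might still OOM: a rough lower-bound estimate of the weights alone (the bits-per-weight figure is an assumption; Q4_0 stores roughly 4.5 bits per weight once block scales are included, and this ignores the context/compute buffers like the ~5 GiB one failing in the log above):

```python
# Rough lower bound on the VRAM needed for a quantized model's weights alone.
# 4.5 bits/weight approximates Q4_0 (4-bit values plus per-block scales);
# this deliberately excludes KV cache and compute buffers.

def approx_weight_mib(params_billions, bits_per_weight=4.5):
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / (1024 ** 2)

print(round(approx_weight_mib(8)))    # ~4292 MiB for an 8B model at Q4_0
print(round(approx_weight_mib(70)))   # ~37551 MiB for a 70B model
```

So an 8B Q4 model's weights fit comfortably on one 12 GiB card; the failure in the log is the context's compute buffer, which is allocated on top of the weights.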
If you have some time, please investigate, @vansangpfiev.
If needed, I can perform some extra tests with more and newer models.
related #1165
Jan version
0.5.3
Describe the Bug
I imported many models, and some of them fail to load if I select both of my graphics cards (RTX 3060 12 GB). If I unselect one of them, the model loads.
It would be great if the model list could indicate whether a model supports multi-GPU.
Steps to Reproduce
Screenshots / Logs
No response
What is your OS?