janhq / jan

Jan is an open source alternative to ChatGPT that runs 100% offline on your computer. Multiple engine support (llama.cpp, TensorRT-LLM)
https://jan.ai/
GNU Affero General Public License v3.0
20.81k stars · 1.19k forks

bug: Jan fails to load local models due to mis-detecting an Nvidia GPU, and/or fails to run inference with Vulkan #2888

Open Andydna2 opened 1 month ago

Andydna2 commented 1 month ago

Describe the bug I have tried both the stable and nightly versions with a clean install. Jan either fails to load models (standard settings) or fails to run inference with Vulkan (you can observe the model loading because RAM usage is high, but no joy).

Steps to reproduce Steps to reproduce the behavior:

  1. install Jan
  2. Use Lenovo laptop with AMD Ryzen 7735HS CPU

Expected behavior It should at least report a clear error message, and maybe suggest changing the settings?

Environment details

EXAMPLE 1: misdetects Nvidia GPU — logs

2024-05-11T08:17:17.733Z [SPECS]::OS Version: Windows 10 Home
2024-05-11T08:17:17.734Z [SPECS]::OS Platform: win32
2024-05-11T08:17:17.734Z [SPECS]::OS Release: 10.0.22631
2024-05-11T08:17:17.734Z [APP]::{"notify":true,"run_mode":"gpu","nvidia_driver":{"exist":false,"version":""},"cuda":{"exist":true,"version":"11"},"gpus":[],"gpu_highest_vram":"","gpus_in_use":[""],"is_initial":false,"vulkan":false}
2024-05-11T08:29:38.408Z [NITRO]::Debug: Request to kill Nitro
2024-05-11T08:29:38.408Z [NITRO]::CPU information - 9
2024-05-11T08:29:38.459Z [NITRO]::Debug: Nitro process is terminated
2024-05-11T08:29:38.460Z [NITRO]::Debug: Spawning Nitro subprocess...
2024-05-11T08:29:38.461Z [NITRO]::Debug: Spawn nitro at path: C:\Users\vojevoda\jan\extensions\@janhq\inference-nitro-extension\dist\bin\win-cuda-11-7\nitro.exe, and args: 1,127.0.0.1,3928
2024-05-11T08:29:45.900Z [NITRO]::Debug: Nitro exited with code: 3221225781
2024-05-11T08:29:45.900Z [NITRO]::Error: child process exited with code 3221225781
2024-05-11T08:33:57.847Z [SPECS]::Version: 0.4.12-413
2024-05-11T08:33:57.848Z [SPECS]::CPUs: [{"model":"AMD Ryzen 7 7735HS with Radeon 
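For what it's worth, the Nitro exit code in the log above can be decoded as a Windows NTSTATUS value (a quick check; the interpretation assumes the standard NTSTATUS encoding):

```python
# Decode Nitro's exit code (from the log above) as a Windows NTSTATUS value.
exit_code = 3221225781
print(hex(exit_code))  # 0xc0000135
# 0xC0000135 is STATUS_DLL_NOT_FOUND: the process died at startup because a
# required DLL could not be found -- plausible when the win-cuda-11-7 build
# of nitro.exe is launched on a machine with no Nvidia/CUDA runtime present.
```

That would fit the symptom here: the CUDA build is spawned on a laptop that has no Nvidia hardware at all.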

EXAMPLE 2: fails to run inference after loading, when using Vulkan

{"timestamp":1715416730,"level":"INFO","function":"LoadModelImpl","line":708,"message":"system info","n_threads":9,"total_threads":16,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | "}

2024-05-11T08:38:50.484Z [NITRO]::Error: llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from C:\Users\vojevoda\jan\models\llama3-8b-instruct\Meta-Llama-3-8B-Instruct-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Meta-Llama-3-8B-Instruct-imatrix
llama_model_loader: - kv   2:                          llama.block_count u32              = 32
llama_model_loader: - kv   3:                       llama.context_length u32              = 8192
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 15
llama_model_loader: - kv  11:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  12:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2

2024-05-11T08:38:50.518Z [NITRO]::Error: llama_model_loader: - kv  14:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...

2024-05-11T08:38:50.533Z [NITRO]::Error: llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...

2024-05-11T08:38:50.606Z [NITRO]::Error: llama_model_loader: - kv  16:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  17:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  18:                tokenizer.ggml.eos_token_id u32              = 128001
llama_model_loader: - kv  19:                    tokenizer.chat_template str              = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv  20:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_K:  193 tensors
llama_model_loader: - type q6_K:   33 tensors

2024-05-11T08:38:51.577Z [NITRO]::Error: llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 4.58 GiB (4.89 BPW) 
llm_load_print_meta: general.name     = Meta-Llama-3-8B-Instruct-imatrix
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token         = 128 'Ä'

2024-05-11T08:38:51.645Z [NITRO]::Error: ggml_vulkan: Found 1 Vulkan devices:

2024-05-11T08:38:51.647Z [NITRO]::Error: Vulkan0: AMD Radeon(TM) 680M | uma: 1 | fp16: 1 | warp size: 64

2024-05-11T08:38:51.682Z [NITRO]::Error: llm_load_tensors: ggml ctx size =    0.22 MiB

2024-05-11T08:38:59.410Z [NITRO]::Error: llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:        CPU buffer size =   281.81 MiB
llm_load_tensors:    Vulkan0 buffer size =  4403.49 MiB
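Worth noting: the ggml_vulkan lines above show the 680M iGPU is found by the Vulkan backend and all 33 layers are offloaded, so device detection on the Vulkan side appears to work and the failure happens later. A small sketch that pulls the detected device flags out of those log lines (the regex fields are taken from the log format itself):

```python
import re

# The ggml_vulkan detection lines from the Nitro log above.
log_lines = [
    "2024-05-11T08:38:51.645Z [NITRO]::Error: ggml_vulkan: Found 1 Vulkan devices:",
    "2024-05-11T08:38:51.647Z [NITRO]::Error: Vulkan0: AMD Radeon(TM) 680M | uma: 1 | fp16: 1 | warp size: 64",
]

# Field names (uma, fp16, warp size) come straight from the log format.
pattern = re.compile(r"Vulkan(\d+): (.+?) \| uma: (\d) \| fp16: (\d) \| warp size: (\d+)")
devices = []
for line in log_lines:
    m = pattern.search(line)
    if m:
        idx, name, uma, fp16, warp = m.groups()
        devices.append({"index": int(idx), "name": name,
                        "uma": uma == "1", "fp16": fp16 == "1", "warp": int(warp)})

print(devices[0]["name"])  # AMD Radeon(TM) 680M
```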

app.log

Van-QA commented 1 month ago

Hi @Andydna2, from your log the Nvidia driver shows as false. Can you try this: https://jan.ai/docs/troubleshooting#troubleshooting-nvidia-gpu to install the Nvidia driver and see if it helps?

Andydna2 commented 1 month ago

> Hi @Andydna2, from your log the Nvidia driver shows as false. Can you try this: https://jan.ai/docs/troubleshooting#troubleshooting-nvidia-gpu to install the Nvidia driver and see if it helps?

I am not sure what you mean by "Nvidia driver is false". My laptop does NOT have a dedicated graphics card, only an AMD iGPU, so I can't really install Nvidia drivers. It seems to me that Jan/Nitro hardware detection is not working properly in my case. Is there some override switch in the JSON settings?
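On the override question: the [APP] log line in the first comment shows the detection record Jan wrote out, and its keys hint at what a manual override might look like. A hedged sketch (the keys are taken verbatim from that log line; where Jan actually persists this setting, and whether editing it is supported, is an assumption):

```python
import json

# The GPU-detection record from the [APP] log line earlier in this thread.
detected = json.loads(
    '{"notify":true,"run_mode":"gpu","nvidia_driver":{"exist":false,"version":""},'
    '"cuda":{"exist":true,"version":"11"},"gpus":[],"gpu_highest_vram":"",'
    '"gpus_in_use":[""],"is_initial":false,"vulkan":false}'
)

# The contradiction being reported: run_mode is "gpu" even though no
# Nvidia driver was found and the GPU list is empty.
assert detected["run_mode"] == "gpu"
assert detected["nvidia_driver"]["exist"] is False
assert detected["gpus"] == []

# Hypothetical manual override: force CPU mode before restarting Jan.
detected["run_mode"] = "cpu"
print(detected["run_mode"])  # cpu
```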

Van-QA commented 1 month ago

Hi @Andydna2, when you try to turn on Vulkan, does it list your AMD GPU in the dropdown? (screenshot attached)

On the other hand, Jan can also run on the CPU, and that should work just fine. Would you mind trying it as well? 🙏