janhq / jan

Jan is an open-source alternative to ChatGPT that runs 100% offline on your computer, with support for multiple engines (llama.cpp, TensorRT-LLM).
https://jan.ai/
GNU Affero General Public License v3.0

failed to start model with nvidia #2586

Closed: kushagra-xo closed this issue 2 weeks ago

kushagra-xo commented 4 months ago

Steps to reproduce
Start any model.

Expected behavior
The model works.

Environment details

Logs

Global shortcut registered successfully
APPIMAGE env is not defined, current application is not an AppImage
2024-04-02T17:15:59.241Z [SPECS]::Version: 0.4.10
2024-04-02T17:15:59.249Z [SPECS]::CPUs: [{"model":"Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz","speed":2700,"times":{"user":545620,"nice":7740,"sys":289730,"idle":2220380,"irq":35820}},{"model":"Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz","speed":2700,"times":{"user":685690,"nice":4940,"sys":220770,"idle":2139120,"irq":83310}},{"model":"Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz","speed":2699,"times":{"user":724770,"nice":6030,"sys":271540,"idle":2136020,"irq":18750}},{"model":"Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz","speed":2700,"times":{"user":800140,"nice":7510,"sys":246230,"idle":2080390,"irq":17070}}]
2024-04-02T17:15:59.250Z [SPECS]::Machine: x86_64
2024-04-02T17:15:59.254Z [SPECS]::Endianness: LE
2024-04-02T17:15:59.255Z [SPECS]::Parallelism: 4
2024-04-02T17:15:59.255Z [SPECS]::Free Mem: 3873214464
2024-04-02T17:15:59.259Z [SPECS]::Total Mem: 8047603712
2024-04-02T17:15:59.268Z [SPECS]::OS Version: #1 SMP PREEMPT_DYNAMIC Thu, 28 Mar 2024 17:06:35 +0000
2024-04-02T17:15:59.277Z [SPECS]::OS Platform: linux
2024-04-02T17:15:59.278Z [SPECS]::OS Release: 6.8.2-arch2-1
2024-04-02T17:16:00.839Z [SPECS]::0, 2048, NVIDIA GeForce 930M

false [ '0' ]
Error occurred in handler for 'writeFileSync': TypeError [ERR_INVALID_ARG_TYPE]: The "data" argument must be of type string or an instance of Buffer, TypedArray, or DataView. Received undefined
    at Object.writeFileSync (node:fs:2314:5)
    at /usr/lib/jan/app.asar/node_modules/@janhq/core/dist/node/index.cjs.js:2148:35 { code: 'ERR_INVALID_ARG_TYPE' }
2024-04-02T17:16:07.378Z [NITRO]::CPU informations - 2
2024-04-02T17:16:07.381Z [NITRO]::Debug: Request to kill Nitro
2024-04-02T17:16:07.440Z [NITRO]::Debug: Nitro process is terminated
2024-04-02T17:16:07.441Z [NITRO]::Debug: Spawning Nitro subprocess...
2024-04-02T17:16:07.442Z [NITRO]::Debug: Spawn nitro at path: /home/kj/jan/extensions/@janhq/inference-nitro-extension/dist/bin/linux-cuda-12-0/nitro, and args: 1,127.0.0.1,3928
2024-04-02T17:16:07.507Z [NITRO]::Debug: [garbled Nitro ASCII-art startup banner omitted]
2024-04-02T17:16:07.757Z [NITRO]::Debug: Nitro is ready
2024-04-02T17:16:07.759Z [NITRO]::Debug: Loading model with params {"ctx_len":2048,"prompt_template":"<|im_start|>system\n{system_message}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant","llama_model_path":"/home/kj/jan/models/llamacorn-1.1b/llamacorn-1.1b-chat.Q8_0.gguf","system_prompt":"<|im_start|>system\n","user_prompt":"<|im_end|>\n<|im_start|>user\n","ai_prompt":"<|im_end|>\n<|im_start|>assistant","cpu_threads":2,"ngl":100}
2024-04-02T17:16:07.804Z [NITRO]::Error: ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:

2024-04-02T17:16:07.805Z [NITRO]::Error: Device 0: NVIDIA GeForce 930M, compute capability 5.0, VMM: yes

2024-04-02T17:16:07.945Z [NITRO]::Debug: [ANSI escape codes and banner residue omitted]
20240402 17:16:07.506798 UTC 22013 INFO Nitro version: - main.cc:50
20240402 17:16:07.506836 UTC 22013 INFO Server started, listening at: 127.0.0.1:3928 - main.cc:54
20240402 17:16:07.506837 UTC 22013 INFO Please load your model - main.cc:55
20240402 17:16:07.506844 UTC 22013 INFO Number of thread is:4 - main.cc:62
20240402 17:16:07.784280 UTC 22015 INFO Setting up GGML CUBLAS PARAMS - llamaCPP.cc:626
{"timestamp":1712078167,"level":"INFO","function":"loadModelImpl","line":637,"message":"system info","n_threads":2,"total_threads":4,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | "}

2024-04-02T17:16:07.967Z [NITRO]::Error: llama_model_loader: loaded meta data with 25 key-value pairs and 201 tensors from /home/kj/jan/models/llamacorn-1.1b/llamacorn-1.1b-chat.Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = .
llama_model_loader: - kv 2: llama.context_length u32 = 2048
llama_model_loader: - kv 3: llama.embedding_length u32 = 2048
llama_model_loader: - kv 4: llama.block_count u32 = 22
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 5632
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 64
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 4
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 7
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama

2024-04-02T17:16:07.977Z [NITRO]::Error: llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...

2024-04-02T17:16:08.003Z [NITRO]::Error: llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...

2024-04-02T17:16:08.005Z [NITRO]::Error: llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...

2024-04-02T17:16:08.035Z [NITRO]::Error: llama_model_loader: - kv 16: tokenizer.ggml.merges arr[str,61249] = ["▁ t", "e r", "i n", "▁ a", "e n...
llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 2
llama_model_loader: - kv 21: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 22: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 23: tokenizer.chat_template str = {% for message in messages %}{{'<|im...
llama_model_loader: - kv 24: general.quantization_version u32 = 2
llama_model_loader: - type f32: 45 tensors
llama_model_loader: - type q8_0: 156 tensors

2024-04-02T17:16:08.086Z [NITRO]::Error: llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 4
llm_load_print_meta: n_layer = 22
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_embd_head_k = 64
llm_load_print_meta: n_embd_head_v = 64
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: n_embd_k_gqa = 256
llm_load_print_meta: n_embd_v_gqa = 256
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 5632
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 1B
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 1.10 B
llm_load_print_meta: model size = 1.09 GiB (8.50 BPW)
llm_load_print_meta: general.name = .
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 2 '</s>'
llm_load_print_meta: LF token = 13 '<0x0A>'

2024-04-02T17:16:08.086Z [NITRO]::Error: llm_load_tensors: ggml ctx size = 0.15 MiB

2024-04-02T17:16:09.585Z [NITRO]::Error: llm_load_tensors: offloading 22 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 23/23 layers to GPU
llm_load_tensors: CPU buffer size = 66.41 MiB
llm_load_tensors: CUDA0 buffer size = 1048.51 MiB
[~130 repeated "[NITRO]::Error: ." progress-dot lines between 17:16:09.631 and 17:16:10.425 omitted]

2024-04-02T17:16:10.428Z [NITRO]::Error: llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1

2024-04-02T17:16:10.433Z [NITRO]::Error: llama_kv_cache_init: CUDA0 KV buffer size = 44.00 MiB
llama_new_context_with_model: KV self size = 44.00 MiB, K (f16): 22.00 MiB, V (f16): 22.00 MiB

2024-04-02T17:16:10.438Z [NITRO]::Error: llama_new_context_with_model: CUDA_Host input buffer size = 9.02 MiB

2024-04-02T17:16:10.445Z [NITRO]::Error: llama_new_context_with_model: CUDA0 compute buffer size = 148.01 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 4.00 MiB
llama_new_context_with_model: graph splits (measure): 3

2024-04-02T17:16:10.446Z [NITRO]::Error: CUDA error: no kernel image is available for execution on the device
current device: 0, in function ggml_cuda_op_flatten at /home/runner/actions-runner/_work/nitro/nitro/llama.cpp/ggml-cuda.cu:9382
cudaGetLastError()
GGML_ASSERT: /home/runner/actions-runner/_work/nitro/nitro/llama.cpp/ggml-cuda.cu:242: !"CUDA error"
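The "no kernel image is available for execution on the device" error typically means the shipped CUDA binary was compiled only for GPU architectures newer than this card (the GeForce 930M is compute capability 5.0 / sm_50, as the earlier log line shows). A minimal sketch of that compatibility rule, with an assumed list of build targets — the actual architectures the nitro binary was built for are not visible in this log:

```python
# Hypothetical check: can a CUDA binary built for the given target
# architectures run on a device with a given compute capability?
# A device can run SASS for its exact arch, or PTX JIT-compiled forward
# from any lower arch; if every build target is newer than the device,
# kernel launches fail with "no kernel image is available".
SUPPORTED_ARCHS = [60, 70, 75, 80, 86]  # assumed build targets (sm_60+)

def has_kernel_image(device_cc, archs=SUPPORTED_ARCHS):
    """device_cc is a (major, minor) compute-capability pair."""
    cc = device_cc[0] * 10 + device_cc[1]
    return any(arch <= cc for arch in archs)

print(has_kernel_image((5, 0)))  # False: sm_50 predates every assumed target
print(has_kernel_image((8, 6)))  # True
```

Under this (assumed) reading, the crash is consistent with a CUDA 12 build of nitro that dropped Maxwell-era (sm_50) targets, so the GPU path cannot work on this card regardless of driver version.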

2024-04-02T17:16:10.611Z [NITRO]::Error: ptrace: Operation not permitted. No stack. The program is not being run.

2024-04-02T17:16:13.345Z [NITRO]::Debug: Nitro exited with code: null
2024-04-02T17:16:14.855Z [NITRO]::Error: Load model failed with error TypeError: fetch failed
2024-04-02T17:16:14.856Z [NITRO]::Error: TypeError: fetch failed

tikikun commented 4 months ago

@CameronNguyen130820 we have postponed solving this issue, since it may be related to an outdated GPU that doesn't support some operations.
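If the GPU build indeed lacks kernels for this card, a common workaround with llama.cpp-based engines is to keep all layers on the CPU so no CUDA kernels are ever launched. Sketching this against the `ngl` field seen in the load-params log above (these field names mirror that log, not a documented settings schema):

```json
{
  "ctx_len": 2048,
  "cpu_threads": 2,
  "ngl": 0
}
```

With `ngl` (number of GPU-offloaded layers) at 0 instead of 100, inference falls back entirely to the CPU; slower, but it avoids the unsupported CUDA path.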

Van-QA commented 2 weeks ago

Hi there, if possible, please try our nightly build (https://github.com/janhq/jan?tab=readme-ov-file#download), and feel free to get back to us if the issue persists. Thank you!