janhq / jan

Jan is an open source alternative to ChatGPT that runs 100% offline on your computer. Multiple engine support (llama.cpp, TensorRT-LLM)
https://jan.ai/
GNU Affero General Public License v3.0

bug: Cortex exited with code null immediately after loading a model #3089

Open mshpp opened 1 month ago

mshpp commented 1 month ago

Current behavior

Trying to load even the TinyLLaMa Chat 1.1B model doesn't work; Cortex seems to crash immediately after loading the model. This occurs on a fresh AppImage under Fedora 40.

Minimum reproduction step

  1. Open Jan
  2. Download TinyLLaMa when the app prompts you to do so
  3. Write something as the input
  4. Press enter

Expected behavior

The model should load and run without a problem.

Screenshots / Logs

2024-06-22T20:03:36.375Z [CORTEX]::CPU information - 2
2024-06-22T20:03:36.377Z [CORTEX]::Debug: Request to kill cortex
2024-06-22T20:03:36.429Z [CORTEX]::Debug: cortex process is terminated
2024-06-22T20:03:36.430Z [CORTEX]::Debug: Spawning cortex subprocess...
2024-06-22T20:03:36.431Z [CORTEX]::Debug: Spawn cortex at path: /home/user/jan/extensions/@janhq/inference-cortex-extension/dist/bin/linux-cpu/cortex-cpp, and args: 1,127.0.0.1,3928
2024-06-22T20:03:36.432Z [APP]::/home/user/jan/extensions/@janhq/inference-cortex-extension/dist/bin/linux-cpu
2024-06-22T20:03:36.550Z [CORTEX]::Debug: cortex is ready
2024-06-22T20:03:36.551Z [CORTEX]::Debug: Loading model with params {"cpu_threads":2,"ctx_len":2048,"prompt_template":"<|system|>\n{system_message}<|user|>\n{prompt}<|assistant|>","llama_model_path":"/home/user/jan/models/tinyllama-1.1b/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf","ngl":23,"system_prompt":"<|system|>\n","user_prompt":"<|user|>\n","ai_prompt":"<|assistant|>","model":"tinyllama-1.1b"}
2024-06-22T20:03:36.746Z [CORTEX]::Debug: cortex exited with code: null
2024-06-22T20:03:37.661Z [CORTEX]::Error: Load model failed with error TypeError: fetch failed
2024-06-22T20:03:37.661Z [CORTEX]::Error: TypeError: fetch failed

Jan version

0.5.1

In which operating systems have you tested?

Environment details

  - Operating System: Fedora 40
  - Processor: Intel Core i5-3320M, 2C/4T
  - RAM: 16 GB
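For anyone filing a similar report, the same environment details can be gathered with standard Linux tools (nothing here is Jan-specific):

```shell
# Collect the environment details reported above using standard Linux tools.
grep -m1 'model name' /proc/cpuinfo   # CPU model string
nproc                                 # logical core count (2C/4T -> 4)
free -h | awk '/^Mem:/ {print $2}'    # total RAM
```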

vansangpfiev commented 1 month ago

Thank you for reporting the issue. Could you please provide more information to help us debug: your CPU information (cpuinfo) and the app.log file.

mshpp commented 1 month ago

Sure, both files are attached: cpuinfo.txt and app.log

vansangpfiev commented 1 month ago

From the log and cpuinfo, I think Cortex crashed because we don't ship an AVX build with the FMA flag turned off (the i5-3320M has AVX but no FMA). Could you please download a nightly build of cortex.llamacpp and replace the .so lib in this path: /home/user/jan/extensions/@janhq/inference-cortex-extension/dist/bin/linux-cpu/cortex-cpp/engines/cortex.llamacpp/ Download link:

https://github.com/janhq/cortex.llamacpp/releases/download/v0.1.18-25.06.24/cortex.llamacpp-0.1.18-25.06.24-linux-amd64-noavx.tar.gz
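The release assets come in CPU-capability variants (noavx, avx, avx2, ...). As a rough illustration only (this is not Jan's actual selection logic), a variant can be picked from the `/proc/cpuinfo` flags; per the diagnosis above, a CPU with AVX but no FMA still needs the noavx build, because the stock avx build assumes FMA:

```shell
# Illustration: pick a cortex.llamacpp variant name from a CPU flags string.
# Variant names follow the release asset naming (noavx / avx / avx2); the
# "AVX without FMA -> noavx" rule reflects the diagnosis in this thread.
pick_variant() {
  flags=" $1 "
  case "$flags" in
    *" avx2 "*) echo avx2 ;;
    *" avx "*)
      case "$flags" in
        *" fma "*) echo avx ;;
        *)         echo noavx ;;   # e.g. Ivy Bridge i5-3320M: AVX but no FMA
      esac ;;
    *) echo noavx ;;
  esac
}

pick_variant "sse4_2 avx"       # Ivy Bridge-like CPU: prints noavx
pick_variant "sse4_2 avx fma"   # prints avx
# On a live system, feed it the real flags line:
pick_variant "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
```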

mshpp commented 3 weeks ago

This works; now I can get a response from the model. However, it seems that only the first round of inference works -- that is, I can reliably get only a single answer. The next one gets stuck on loading for a long time, and then it either completes or keeps loading indefinitely. This is with the same TinyLLaMa model, which runs quite fast even on my hardware.

vansangpfiev commented 3 weeks ago

Thanks for trying the nightly build. Could you please share the app.log again?
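For reference, a hedged sketch of where to pull app.log from, assuming the default Jan data folder is `~/jan` (consistent with the paths in the logs above) and that logs live under its `logs/` subfolder; adjust if your install differs:

```shell
# Assumption: Jan's data folder is ~/jan (matching the log paths above) and
# app.log sits under logs/. Both are assumptions; adjust for your setup.
LOG="$HOME/jan/logs/app.log"
if [ -f "$LOG" ]; then
  tail -n 50 "$LOG"    # the last 50 lines are usually enough for a report
else
  echo "app.log not found at $LOG"
fi
```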