janhq / jan

Jan is an open-source alternative to ChatGPT that runs 100% offline on your computer, with support for multiple engines (llama.cpp, TensorRT-LLM).
https://jan.ai/
GNU Affero General Public License v3.0

bug: Model that works via `ollama` does not work via jan.ai #3971

Open · savchenko opened this issue 3 weeks ago

savchenko commented 3 weeks ago

Jan version

0.5.7

Describe the Bug

`codestral:22b-v0.1-q3_K_M` works perfectly fine via `ollama` / Hollama, but fails with "Failed to start" via Jan.ai.

Steps to Reproduce

  1. Download the latest version of Jan.ai
  2. Download the 22B Codestral model
  3. Try to launch it

Screenshots / Logs

[Screenshot: the "Failed to start" error]

```
ERROR Error loading the model - llama_engine.cc:423

[CORTEX]:: Load model success with response {}
[CORTEX]:: Validating model codestral-22b
[CORTEX]:: Validate model state with response 409
[CORTEX]:: Validate model state failed with response {"message":"Model has not been loaded, please load model into cortex.llamacpp"} and status is "Conflict"
[CORTEX]::Error: Validate model status failed
```

What is your OS?

louis-jan commented 2 weeks ago

Hi @savchenko. We'd really like to reproduce the issue. Could you tell us the context length and NGL you have set in the model settings (top-right corner of the screen, Model tab)? It would also be great if you could upload the log file here.
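For reference, these settings live in the model's `model.json` inside Jan's data folder (e.g. `~/jan/models/codestral-22b/model.json`; the exact path and key names here are my assumption based on the 0.5.x layout and may differ between versions). A minimal sketch of the relevant section:

```json
{
  "id": "codestral-22b",
  "settings": {
    "ctx_len": 4096,
    "ngl": 33
  }
}
```

Lowering `ctx_len` (e.g. to 2048, matching Ollama) or `ngl` (the number of layers offloaded to the GPU) is a quick way to test whether the failure is memory-related; the values above are illustrative, not recommendations.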

As far as I know, Ollama defaults to a 2048 context length, which requires less RAM/VRAM to run. You can configure the same parameters in Jan to fit your device's capability; I believe Jan's default is currently 4096.
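To make the RAM/VRAM point concrete, here is a rough sketch of how the KV cache (the part of memory that scales linearly with context length) grows. The Codestral-22B shape figures in it (56 layers, 8 KV heads, head dim 128) are my assumptions from the public model card, not something reported in this issue; substitute your model's values:

```python
# Back-of-the-envelope KV-cache size estimate. The KV cache grows
# linearly with context length, which is a common reason a model that
# loads fine at ctx 2048 fails to load at ctx 4096.

def kv_cache_bytes(ctx_len: int,
                   n_layers: int = 56,      # assumed Codestral-22B value
                   n_kv_heads: int = 8,     # assumed Codestral-22B value
                   head_dim: int = 128,     # assumed Codestral-22B value
                   bytes_per_elem: int = 2  # f16 cache
                   ) -> int:
    # Factor of 2 accounts for the separate K and V tensors per layer.
    return 2 * n_layers * ctx_len * n_kv_heads * head_dim * bytes_per_elem

for ctx in (2048, 4096):
    print(f"ctx_len={ctx}: ~{kv_cache_bytes(ctx) / 2**20:.0f} MiB KV cache")
# ctx_len=2048: ~448 MiB KV cache
# ctx_len=4096: ~896 MiB KV cache
```

That memory is needed on top of the model weights themselves, so halving `ctx_len` or lowering `ngl` (so fewer layers sit in VRAM) may let the same GGUF load in Jan just as it does in Ollama.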