janhq / jan

Jan is an open source alternative to ChatGPT that runs 100% offline on your computer. Multiple engine support (llama.cpp, TensorRT-LLM)
https://jan.ai/
GNU Affero General Public License v3.0

bug: Model loading request timeout when uploading documents #4056

Closed imtuyethan closed 1 week ago

imtuyethan commented 1 week ago

Jan version

0.5.8-731

Describe the Bug

Model loading has failed twice with timeout errors. The request POST http://127.0.0.1:39291/v1/models/start times out consistently, not only with Llama 8B but also with the 3B model.


Model loading requests are timing out during initialization with the error:

Request timed out: POST http://127.0.0.1:39291/v1/models/start

Server logs:

20241120 12:01:07.411Z [CORTEX]:: Spawning cortex subprocess...
20241120 12:01:07.411Z [CORTEX]:: Spawn cortex at path: /Users/han/Library/Application Support/Jan-nightly/data/extensions/@janhq/inference-cortex-extension/dist/bin/cortex-server
20241120 12:01:07.412Z [CORTEX]: Engine variant: mac-arm64
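
When triaging a startup stall like this, it can help to measure the gaps between log entries. The following is a minimal sketch, not part of Jan or Cortex: the helper names `parse_line` and `elapsed_ms` are hypothetical, the line layout is inferred only from the three entries above, and the trailing `Z` is assumed to mean UTC.

```python
import re
from datetime import datetime, timedelta, timezone

# Layout inferred from the entries above:
#   "<YYYYMMDD> <HH:MM:SS.mmm>Z [<COMPONENT>]:: <message>"
# The separator appears as both "::" and ":" in the report, so accept either.
LINE_RE = re.compile(r"^(\d{8} \d{2}:\d{2}:\d{2}\.\d+)Z \[(\w+)\]:+\s*(.*)$")


def parse_line(line):
    """Return (timestamp, component, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp, component, message = match.groups()
    # Assumption: the "Z" suffix marks the timestamp as UTC.
    when = datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f").replace(tzinfo=timezone.utc)
    return when, component, message


def elapsed_ms(first_line, last_line):
    """Whole milliseconds between two log lines, or None if either fails to parse."""
    first, last = parse_line(first_line), parse_line(last_line)
    if first is None or last is None:
        return None
    return (last[0] - first[0]) // timedelta(milliseconds=1)
```

Applied to the first and last entries above, this shows the subprocess spawn itself was logged within about a millisecond, so the delay would have to come after the point where this excerpt ends.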

Steps to Reproduce

  1. Send a document in a chat, which triggers loading the Llama 3.1 8B Instruct Q4 model to answer
  2. The app sends a request to the /v1/models/start endpoint
  3. The request times out after an extended period
  4. An error about the model loading failure is displayed
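
The steps above can be approximated outside the app with a small script that posts to the same endpoint under an explicit client-side timeout. This is a sketch under stated assumptions: the URL comes from the error message in this report, but the JSON body shape (`{"model": ...}`), the `start_model` helper, and the default timeout are illustrative guesses rather than the documented Cortex API or Jan's actual client settings.

```python
import json
import socket
import urllib.error
import urllib.request

# Endpoint copied from the error message in this report.
START_URL = "http://127.0.0.1:39291/v1/models/start"


def start_model(model_id, timeout_s=30.0, url=START_URL):
    """POST a model-start request and classify how it ended.

    Returns (status, detail) where status is "ok", "timeout", or "error".
    """
    # Assumption: the server accepts a JSON body with a "model" field.
    payload = json.dumps({"model": model_id}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The target is localhost, so skip any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=timeout_s) as response:
            return "ok", response.status
    except socket.timeout:
        # Raised directly when the server accepts the connection but never replies.
        return "timeout", f"no response within {timeout_s}s"
    except urllib.error.HTTPError as exc:
        return "error", f"HTTP {exc.code}"
    except urllib.error.URLError as exc:
        # Timeouts while connecting arrive wrapped in URLError.
        if isinstance(exc.reason, socket.timeout):
            return "timeout", f"no connection within {timeout_s}s"
        return "error", str(exc.reason)
```

Calling `start_model("<model-id>", timeout_s=120)` with a longer timeout is one way to check whether the load eventually succeeds, which would point to the client giving up too early rather than the server failing.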

Screenshots / Logs

cortex.log app.log

OS: macOS (Darwin Kernel Version 23.2.0)
Hardware: Apple M2
Jan Version: v0.5.8-731
Memory: 16GB Total
Model: Llama 3.1 8B Instruct Q4
Cortex Version: v1.0.3-rc5


louis-jan commented 1 week ago

I’ve investigated and found a client configuration issue. I’ll sneak a fix into 0.5.9.