janhq / jan

Jan is an open-source alternative to ChatGPT that runs 100% offline on your computer. Multiple engine support (llama.cpp, TensorRT-LLM).
https://jan.ai/
GNU Affero General Public License v3.0
23.46k stars 1.37k forks

planning: [UX enhancement] Improve model loading UX and reduce initial response delays #3860

Open. imtuyethan opened 3 weeks ago

imtuyethan commented 3 weeks ago

Problem Statement

On Windows machines, there is a noticeable delay while the model loads after the user sends the first message. The delay is even more pronounced on GPUs such as the RTX 4070, where it can take up to 10 seconds before a response starts generating. In some cases, the “Generating Response” bar stays stuck at 80% for up to 10 seconds, giving the impression that the software has hung.

Some early proposed solutions:

Key Scenarios Affected (To consider before proposing a solution):

UX Considerations:

dan-homebrew commented 3 weeks ago

@imtuyethan I've re-labeled this as "Planning" and shifted it to Sprint 23, with a view to implementing it in Sprint 24.