guinmoon / LLMFarm

Run llama and other large language models offline on iOS and macOS using the GGML library.
https://llmfarm.site
MIT License

When I send my first prompt to the LLM after opening the app, the app freezes for about 5-10 seconds before the LLM generates a response. #15

Closed: steveshaoucsb closed this issue 8 months ago

guinmoon commented 8 months ago

The model is loaded the moment you send the first message. This is done because usually more than one model cannot fit in the device's memory, and in order not to lose context when switching chats, the model is loaded only when the first message is sent. I will add a loading animation later.
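
For readers curious about the pattern described above, here is a minimal Swift sketch of lazy model loading on the first message. All names (`Model`, `ChatSession`, `send`) are hypothetical placeholders and do not reflect LLMFarm's actual code; the simulated load delay only stands in for the real GGML model load.

```swift
import Foundation

// Stand-in for an expensive model load (several seconds on device).
struct Model {
    static func load(from path: String) async throws -> Model {
        try await Task.sleep(nanoseconds: 2_000_000_000) // simulate a slow load
        return Model()
    }

    func generate(prompt: String) -> String {
        "response to: \(prompt)"
    }
}

final class ChatSession {
    private var model: Model?
    private let modelPath: String

    init(modelPath: String) { self.modelPath = modelPath }

    // The model is loaded only when the first message is sent, so switching
    // chats never keeps more than one model resident in memory at a time.
    func send(_ prompt: String) async throws -> String {
        if model == nil {
            model = try await Model.load(from: modelPath) // source of the first-message pause
        }
        return model!.generate(prompt: prompt)
    }
}
```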

steveshaoucsb commented 8 months ago

Roger that, thanks for the explanation!