rashadphz / farfalle

🔍 AI search engine - self-host with local or cloud LLMs
https://www.farfalle.dev/
Apache License 2.0
2.21k stars 166 forks

Local Models Unloading by Default #47

Closed HaidenShober closed 6 days ago

HaidenShober commented 4 weeks ago

I'm hoping there is already a solution to this, but TTFT (time to first token) could be dramatically reduced if local models were not unloaded from memory each time a new query is entered.

rashadphz commented 6 days ago

Hey Haiden, from my understanding Ollama keeps a model loaded in memory for 5 minutes after each use, so back-to-back queries should not trigger a reload. Let me know if you had something else in mind.
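
For anyone who wants a longer window than that default, here is a minimal sketch of preloading a model through Ollama's `/api/generate` endpoint with the `keep_alive` request field. It is not farfalle's own code: the helper names `build_preload_request` and `preload`, the 30 minute value, and the assumption that Ollama is listening on its default local address are all illustrative.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_preload_request(model, keep_alive="30m", url=OLLAMA_URL):
    """Build a POST request that loads `model` and asks Ollama to keep it resident.

    No prompt is included, so Ollama only loads the model into memory.
    `keep_alive` may be a duration string such as "30m", a number of
    seconds, a negative number to keep the model loaded indefinitely,
    or 0 to unload it right after the request.
    """
    payload = {"model": model, "keep_alive": keep_alive}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def preload(model, keep_alive="30m"):
    """Send the preload request. Requires a running Ollama server."""
    with urllib.request.urlopen(build_preload_request(model, keep_alive)) as resp:
        return json.loads(resp.read())
```

Calling `preload("llama3", keep_alive=-1)` would ask the server to keep that model loaded until it is explicitly unloaded or the server restarts; the model name is only a placeholder. The same default can also be changed server-wide by setting the `OLLAMA_KEEP_ALIVE` environment variable before starting Ollama.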