Closed. HaidenShober closed this 6 days ago.
Hoping there is already a solution to this, but TTFT (time to first token) could be dramatically reduced if models were not unloaded from memory each time a new query is entered.
Hey Haiden, from my understanding, Ollama keeps a model in memory for 5 minutes after each use. Let me know if you had something else in mind.
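If 5 minutes is too short for your workflow, the `keep_alive` field on the `/api/generate` request can ask for a longer window. Here is a rough sketch of that, assuming a local server on the default port; the model name `llama3` and the helper names are just placeholders, not anything from your setup.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model, prompt, keep_alive="30m"):
    """Build a POST request that asks Ollama to keep the model loaded.

    keep_alive takes a duration string such as "30m", or a number of
    seconds; -1 keeps the model loaded indefinitely and 0 unloads it
    right after the response.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream
        "keep_alive": keep_alive,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, keep_alive="30m"):
    """Send the request and return the generated text (needs a running server)."""
    req = build_generate_request(model, prompt, keep_alive)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # "llama3" is a placeholder; use whichever model you have pulled.
    print(generate("llama3", "Say hello in one word.", keep_alive="30m"))
```

To change the default for every request instead of per call, setting the `OLLAMA_KEEP_ALIVE` environment variable on the server should have the same effect.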