I am loading two 7B models. The cache problem only occurs when I load these models with autosplit. The first prediction works fine, but every subsequent question just returns the previous response.
Issue:
When the model caches a response, the next request does not clear the cache, so the model returns the previous response to every following question.
Question:
How do you clear the cache after every request to the model?
Code:
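Not being the original poster's code, the following is only a minimal sketch of the clear-the-cache-per-request pattern the question asks about. All names here (`KVCache`, `generate`) are hypothetical stand-ins for whatever inference library is in use; with exllamav2 (which autosplit loading suggests), the equivalent is typically resetting the cache's sequence length to zero before feeding a new prompt.

```python
# Sketch of clearing a KV cache between requests. The class and function
# names are hypothetical stand-ins, not a specific library's API.

class KVCache:
    """Hypothetical key/value cache that accumulates state across calls."""

    def __init__(self):
        self.current_seq_len = 0
        self.entries = []

    def reset(self):
        # Dropping cached entries forces the next request to start fresh,
        # so stale context from the previous question cannot leak through.
        self.current_seq_len = 0
        self.entries.clear()


def generate(cache, prompt):
    """Hypothetical generation call that resets the cache up front."""
    cache.reset()                       # clear state from the last request
    cache.entries.append(prompt)        # simulate filling the cache
    cache.current_seq_len = len(prompt)
    return f"answer to: {prompt}"       # stand-in for real decoding


cache = KVCache()
first = generate(cache, "first question")
second = generate(cache, "second question")  # fresh, not the first answer
```

The key point is that the reset happens at the start of every request, not once at load time, so each question is answered against an empty cache.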