Closed: tkone2018 closed this issue 1 year ago
Hello, when I ask a question, the model seems to be loaded again every time the AI outputs a word. Is that expected?
Hi @tkone2018, the model weights are loaded into RAM once at startup, but then for each generated token the weights are passed from RAM to the GPU one by one.
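To make that behavior concrete, here is a toy simulation in plain Python. It is only a sketch, not this project's code: `OffloadedModel`, `_to_gpu`, and the counters are hypothetical names, and it assumes "one by one" means one layer at a time. It shows a single load into RAM at startup and a fresh round of RAM-to-GPU copies for every token.

```python
# Toy simulation of weight offloading (hypothetical names, not the project's API).
# Assumption: weights are moved to the GPU one layer at a time.

class OffloadedModel:
    def __init__(self, num_layers):
        # Startup: all layer weights are loaded into host RAM exactly once.
        self.ram_weights = [f"layer-{i}-weights" for i in range(num_layers)]
        self.ram_loads = 1
        self.gpu_transfers = 0

    def _to_gpu(self, weights):
        # Stand-in for a host-to-device copy; here it only counts the transfer.
        self.gpu_transfers += 1
        return weights

    def generate_token(self):
        # For every token, each layer is copied to the GPU, used, then released.
        for weights in self.ram_weights:
            on_gpu = self._to_gpu(weights)
            del on_gpu  # release the GPU copy before the next layer arrives
        return "token"


model = OffloadedModel(num_layers=4)
tokens = [model.generate_token() for _ in range(3)]
print(model.ram_loads, model.gpu_transfers)  # prints: 1 12
```

So the repeated loading you see per word would be these per-token transfers, not the model being reloaded into RAM.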
![image](https://user-images.githubusercontent.com/41476675/224594307-8dc54cf1-b397-4825-a744-8b6cf84fd6fc.png)