test - Githubissues

randaller / llama-chat

Chat with Meta's LLaMA models at home made easy

GNU General Public License v3.0

833 stars 118 forks

Closed tkone2018 closed 1 year ago

tkone2018 commented 1 year ago

Hello, when I ask a question, the model seems to be loaded again every time the AI outputs a word. Is that OK?

randaller commented 1 year ago

Hi @tkone2018, the model weights are loaded into RAM once at startup, but then for each generated token the weights are passed from RAM to the GPU one by one.
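
To make the distinction concrete, here is a toy Python sketch of that pattern. It is not the repository's code: `OffloadedModel`, `load_from_disk`, `forward_one_token`, and `generate` are hypothetical names, the "weights" are plain floats, and it assumes "one by one" means layer by layer. It only counts how often each kind of load happens.

```python
from dataclasses import dataclass, field


@dataclass
class OffloadedModel:
    """Toy stand-in for a model whose layers are kept in CPU RAM."""
    n_layers: int
    disk_loads: int = 0      # how many times weights were read from disk
    gpu_transfers: int = 0   # how many times a layer was copied RAM -> GPU
    ram_layers: list = field(default_factory=list)

    def load_from_disk(self):
        # Happens once at startup: every layer ends up in RAM.
        self.ram_layers = [float(i + 1) for i in range(self.n_layers)]
        self.disk_loads += 1

    def forward_one_token(self, x):
        # For each token, stream the layers through the GPU one at a time.
        for w in self.ram_layers:
            on_gpu = w              # stands in for a RAM -> GPU copy
            self.gpu_transfers += 1
            x = x + on_gpu          # stands in for the layer's computation
            del on_gpu              # free the GPU slot for the next layer
        return x


def generate(model, n_tokens):
    # Produce n_tokens outputs, one forward pass per token.
    x = 0.0
    outputs = []
    for _ in range(n_tokens):
        x = model.forward_one_token(x)
        outputs.append(x)
    return outputs


model = OffloadedModel(n_layers=4)
model.load_from_disk()
tokens = generate(model, n_tokens=3)
print(model.disk_loads, model.gpu_transfers)  # prints "1 12"
```

In this sketch the disk is read once, while the RAM-to-GPU copy count grows as layers times tokens. That per-token transfer is what can look like the model "loading" again for every word.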