randaller / llama-chat

Chat with Meta's LLaMA models at home made easy
GNU General Public License v3.0
833 stars 118 forks

test #14

Closed: tkone2018 closed 1 year ago

tkone2018 commented 1 year ago

Hello, when I ask a question, the model seems to be loaded again every time the AI outputs a word. Is that OK? [image]

randaller commented 1 year ago

Hi @tkone2018, the model weights are loaded into RAM once at startup, but then for each token the weights are passed from RAM to the GPU one by one.
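
To illustrate the pattern described above, here is a toy sketch in plain Python. It is not the llama-chat code: `StreamingModel`, `generate`, and the arithmetic inside `forward` are hypothetical stand-ins, and it assumes "one by one" means one layer at a time. It only shows why the transfer count grows with every generated token even though the weights are read from disk once.

```python
# Toy model of per-token weight streaming. Illustrative only:
# the names and the arithmetic are made up, not taken from llama-chat.
from dataclasses import dataclass


@dataclass
class StreamingModel:
    # Weights stay in host RAM for the life of the process (loaded once).
    ram_layers: list
    # Number of RAM -> GPU copies performed so far.
    transfers: int = 0

    def forward(self, x: float) -> float:
        for weight in self.ram_layers:
            gpu_weight = float(weight)  # stands in for the RAM -> GPU copy
            self.transfers += 1
            x = x * gpu_weight + 1.0    # stands in for the layer's computation
            del gpu_weight              # the GPU copy is released before the next layer
        return x


def generate(model: StreamingModel, n_tokens: int, x: float = 1.0) -> list:
    """Run one full forward pass per generated token."""
    outputs = []
    for _ in range(n_tokens):
        x = model.forward(x)
        outputs.append(x)
    return outputs


model = StreamingModel(ram_layers=[0.5, 0.25, 2.0])
tokens = generate(model, n_tokens=4)
print(len(tokens), model.transfers)  # 4 tokens, 3 layers * 4 tokens = 12 copies
```

In this sketch the copy count is `layers * tokens`, so the repeated activity seen for each word would come from transfers between RAM and the GPU rather than from reloading the model from disk.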