tairov / llama2.mojo

Inference Llama 2 in one file of pure 🔥
https://www.modular.com/blog/community-spotlight-how-i-built-llama2-by-aydyn-tairov
MIT License
2.09k stars 140 forks

Killed #36

Closed. Tsunami014 closed this issue 11 months ago

Tsunami014 commented 11 months ago

I downloaded the repo and was super happy to see the story model work! Then I looked further down the README and saw the chat model, so I downloaded it with the wget command provided there. But when I try to run it, this happens:

username@username:~/mojo/llama2.mojo$ mojo llama2.mojo tl-chat.bin \
    -r falcon \
    -z tok_tl-chat.bin \
    -n 256 -t 0 -s 100 -i "<|im_start|>user\nGive me a python function to generate Fibonacci sequence<|im_end|>\n<|im_start|>assistant\n"
num hardware threads:  4
SIMD vector width:  16
Killed

(sorry, I accidentally opened the issue before I finished typing it 😢 )

Actually, this even happens if I follow all the instructions again from scratch, including a fresh download, in a new folder.

tairov commented 11 months ago

Hi @Tsunami014, thanks for your question. My guess is that you don't have enough memory on the system. Mojo requires at least 8 GB of RAM. Also, were you able to run llama2 on smaller models, like stories15M.bin?

Tsunami014 commented 11 months ago

I was able to run the stories15M.bin file perfectly fine. My laptop has 40 GB worth of space (plenty), and I'm able to run those gpt-j files in Python fine (and they're 7 GB in size), so I didn't think that was the problem...

Oh, I see what you mean. Installed RAM: 8.00 GB (7.35 GB usable). That's a shame. If only there was a way to run those files.
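For anyone hitting the same bare `Killed` message: on Linux it usually means the process received SIGKILL, and running out of RAM (not disk space) is a common cause. Below is a minimal sketch of a pre-flight check that compares a model file's size with the memory the kernel reports as available. The function names and the `headroom` factor of 1.2 are assumptions made for illustration; they are not part of llama2.mojo, and the real memory footprint of a run may be higher than the file size.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of byte counts."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if not fields:
            continue
        value = int(fields[0])
        # Most entries are reported in kB; convert those to bytes.
        if len(fields) > 1 and fields[1] == "kB":
            value *= 1024
        info[key.strip()] = value
    return info


def fits_in_ram(model_bytes, available_bytes, headroom=1.2):
    """Return True if the model plus an assumed headroom fits in available RAM.

    headroom is a guessed safety factor (hypothetical), not a measured value.
    """
    return model_bytes * headroom <= available_bytes


def check_model(path, meminfo_path="/proc/meminfo"):
    """Compare a model file's size on disk with MemAvailable (Linux only)."""
    with open(meminfo_path) as f:
        info = parse_meminfo(f.read())
    available = info.get("MemAvailable", info.get("MemFree", 0))
    return fits_in_ram(os.path.getsize(path), available)
```

On a Linux machine, `check_model("tl-chat.bin")` returning `False` would suggest the model is unlikely to load, though a `True` result is no guarantee that the run will succeed.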