tairov / llama2.mojo

Inference Llama 2 in one file of pure 🔥
https://www.modular.com/blog/community-spotlight-how-i-built-llama2-by-aydyn-tairov
MIT License

Simple Gradio UI + Dockerfile #10

Closed radames closed 1 year ago

radames commented 1 year ago

Hi, here's a PR adding instructions on how to run llama2.mojo with a Dockerfile, following their Dockerfile example. I also added a simple Gradio web UI to visualize the stdout. You can see the live demo here: https://huggingface.co/spaces/radames/Gradio-llama2.mojo. PS: If you'd like, I could add the Hugging Face link to your README?
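The core of a UI like this is streaming a subprocess's stdout into the page as it's produced. A minimal sketch of that piece in Python follows; the actual llama2.mojo invocation (binary name, model file, flags) shown in the commented wiring is an assumption, so adjust it to your setup:

```python
import subprocess

def stream_stdout(cmd):
    """Run `cmd` and yield the accumulated stdout after each new line.

    Yielding the growing buffer (rather than single lines) is what lets a
    Gradio text output re-render the full transcript on every update.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    buffer = ""
    for line in proc.stdout:
        buffer += line
        yield buffer
    proc.wait()

# Hypothetical Gradio wiring (command-line flags are an assumption):
#
# import gradio as gr
# demo = gr.Interface(
#     fn=lambda prompt: stream_stdout(
#         ["mojo", "llama2.mojo", "stories15M.bin", "-i", prompt]
#     ),
#     inputs="text",
#     outputs="text",
# )
# demo.queue().launch()
```

Since the generator yields intermediate states, the page shows tokens appearing as the model emits them instead of waiting for the run to finish.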

tairov commented 1 year ago

thanks for the PR, looks cool. The HW specs seem powerful, do you know why on HF it shows only 220 tok/s?

```
num hardware threads:  16  SIMD vector width:  32
```
radames commented 1 year ago

not sure, should it be faster?

tairov commented 1 year ago

It probably also depends on the CPU clock speed (MHz). On a 6-core CPU with SIMD vector width = 16, it's showing 385 tok/s.
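A crude sanity check of that guess: if throughput scaled only with hardware threads × SIMD lanes, the HF machine (16 threads, width 32) should be well ahead of the 6-core/width-16 box, but it isn't. The implied per-lane rate (an assumed, back-of-envelope model, ignoring clock speed and memory bandwidth) makes the gap explicit:

```python
# Figures from this thread; "per-lane rate" is a rough proxy assuming
# throughput ~ threads * SIMD width. Clock speed and vCPU throttling
# (likely on shared HF Spaces hardware) are the unmodeled factors.
hf = {"threads": 16, "simd": 32, "tok_s": 220}
local = {"threads": 6, "simd": 16, "tok_s": 385}

def per_lane(m):
    return m["tok_s"] / (m["threads"] * m["simd"])

print(f"HF per-lane rate:    {per_lane(hf):.2f} tok/s")     # ~0.43
print(f"local per-lane rate: {per_lane(local):.2f} tok/s")  # ~4.01
```

The local box does roughly 9x more work per lane, which is consistent with the HF Space running on slower-clocked or shared/throttled vCPUs rather than with a flaw in the port.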

tairov commented 1 year ago

thanks for the Docker example, some folks were asking for it already 👍