tairov / llama2.mojo

Inference Llama 2 in one file of pure 🔥
https://www.modular.com/blog/community-spotlight-how-i-built-llama2-by-aydyn-tairov
MIT License

Question about models #28

Closed Ezyweb-uk closed 11 months ago

Ezyweb-uk commented 11 months ago

I found this interesting project via the 'AI Anywhere' channel on YouTube. I've installed Modular and Mojo, and successfully ran your test on an underpowered mini PC with only a 1.5GHz 4-core Intel Celeron CPU, running Ubuntu 20.04.6, and it achieved 32.5 tok/s.

I'm an LLM newbie, so my questions may appear stupid! Can this project be run with other models?

I tried the following: mojo llama2.mojo /home/ezyweb/Public/chatpdf1/models/llama-2-7b-chat.Q4_K_M.gguf -s 100 -n 256 -t 0.5 -i "What is Llama 2"

And got the result:

num hardware threads: 4
SIMD vector width: 8
checkpoint size: 4081004224 [ 3891 MB ]
Killed

Is that likely an under-resourced hardware issue, or is the project not compatible with .gguf models?
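
As a side note, one rough way to check whether a model file is GGUF is to look at its first four bytes, which for GGUF files are the ASCII magic "GGUF". This is just a Python sketch based on llama.cpp's file format, not anything from llama2.mojo, and the path is the one from my command above:

```python
# Rough format check: GGUF files written by llama.cpp start with the ASCII
# magic bytes b"GGUF". llama2.c-style .bin checkpoints have no such magic;
# they begin directly with raw int32 config fields.
path = "/home/ezyweb/Public/chatpdf1/models/llama-2-7b-chat.Q4_K_M.gguf"

with open(path, "rb") as f:
    magic = f.read(4)

if magic == b"GGUF":
    print("GGUF file (llama.cpp format)")
else:
    print("not GGUF; possibly a llama2.c-style checkpoint, first bytes:", magic)
```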

Ezyweb-uk commented 11 months ago

From your answer here, I think the answer is that it doesn't work with GGUF models.

tairov commented 11 months ago

Hi @Ezyweb-uk, thanks for your question. You're correct; at the moment llama2.mojo supports TinyLlama models based on GQA. In issue #27 we're discussing some discrepancies and changes in the tokenizer so that it can run TinyLlama-1.1B.

I think GGUF models would have to be converted to the llama2.c format somehow. I haven't had time to explore this topic yet. I saw that llama.cpp has a converter that transforms llama2.c models into GGUF; maybe it can be used for the reverse conversion.
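
For anyone who wants to attempt that conversion: the target would be the llama2.c binary layout, i.e. a header of seven int32 config values followed by the fp32 weight tensors, as in karpathy's llama2.c export script. Here is a minimal sketch of writing such a header; the function name and the TinyLlama-1.1B-like config values are just for illustration, and the weight serialization itself is omitted:

```python
import struct

def write_llama2c_header(f, dim, hidden_dim, n_layers, n_heads,
                         n_kv_heads, vocab_size, seq_len):
    # llama2.c checkpoint layout: 7 little-endian int32 config fields,
    # immediately followed by the fp32 weight tensors in a fixed order.
    f.write(struct.pack("<7i", dim, hidden_dim, n_layers, n_heads,
                        n_kv_heads, vocab_size, seq_len))

with open("converted_model.bin", "wb") as f:
    # Illustrative TinyLlama-1.1B-shaped config; a real converter would read
    # these values (and the weights) out of the source GGUF file.
    write_llama2c_header(f, dim=2048, hidden_dim=5632, n_layers=22,
                         n_heads=32, n_kv_heads=4, vocab_size=32000,
                         seq_len=2048)
    # ... then write each weight tensor as float32, in llama2.c's order
```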