pytorch / torchchat

Run PyTorch LLMs locally on servers, desktop and mobile
BSD 3-Clause "New" or "Revised" License

[LAUNCH BLOCKER] Llama3 8B Instruct model hangs on chat #565

Closed: mikekgfb closed this issue 6 months ago

mikekgfb commented 6 months ago

```
(.venv) (base) mikekg@mikekg-mbp torchchat % # Llama 3 8B Instruct
python3 torchchat.py chat llama3
zsh: command not found: #
Using device=cpu
Apple M1 Max
Loading model...
Time to load model: 10.23 seconds
Entering Chat Mode. Will continue chatting back and forth with the language model until the models max context length of 8192 tokens is hit or until the user says /bye
Do you want to enter a system prompt? Enter y for yes and anything else for no.
y
What is your system prompt?
You are a techer and you treat every interaction as a teachable moment, providing lots of unrequested extra info
User: what are the 7 continents
```

metascroy commented 6 months ago

No quantization? Is it possibly just really slow?

mikekgfb commented 6 months ago

OK, it does not hang. It's just slower than I expected it to be. That brings me back to whether we need to warn people about commands that will take a while. https://github.com/pytorch/torchchat/issues/558
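One way to handle the "warn people about commands that will take a while" idea from #558 is a small context manager that prints a heads-up before a slow step and reports the elapsed time afterwards. This is a minimal sketch, not torchchat code; `warn_if_slow` and its parameters are hypothetical names.

```python
import time
from contextlib import contextmanager


@contextmanager
def warn_if_slow(label, threshold_s=5.0, log=print):
    """Warn up front that a step may be slow, then report its duration.

    `label` names the step, `threshold_s` sets when the post-hoc timing
    message is emitted, and `log` lets callers capture output (defaults
    to print). All names here are illustrative, not torchchat API.
    """
    log(f"{label}: this may take a while on CPU...")
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if elapsed > threshold_s:
            log(f"{label}: took {elapsed:.1f}s")


# Example usage: wrap a slow step such as model loading.
with warn_if_slow("Loading model", threshold_s=0.0):
    time.sleep(0.01)  # stand-in for the actual slow work
```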

We might consider compiling and/or quantizing if those help.
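For the quantization route, a sketch of what helps on CPU: post-training dynamic quantization stores `nn.Linear` weights as int8 and quantizes activations on the fly, which often speeds up linear-dominated models like transformer decoders. The toy model below is a stand-in, not Llama 3's architecture, and this uses PyTorch's generic `torch.ao.quantization.quantize_dynamic` API rather than torchchat's own quantization path.

```python
import torch
import torch.nn as nn

# Toy stand-in for an LLM's linear-heavy layers (illustrative only).
model = nn.Sequential(
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
).eval()

# Dynamic quantization: int8 weights, activations quantized at runtime.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
with torch.no_grad():
    y = quantized(x)
```

The quantized model is a drop-in replacement for inference; the Linear submodules are swapped for dynamically quantized equivalents, so memory footprint and per-token latency on CPU typically drop.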