No quantization? Is it possibly just really slow?
OK, it does not hang. It's just slower than I expected it to be. That brings me back to whether we need to warn people about commands that will take a while. https://github.com/pytorch/torchchat/issues/558
We might consider compiling and/or quantizing to see whether those help.
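For reference, a rough sketch of what that could look like with the torchchat CLI. `--compile` and `--quantize` are real torchchat flags, but the exact quantization config string below is an assumption; check the torchchat README for the schemes supported on a given device:

```
# Sketch only: flag names are from torchchat, the quantization
# config is illustrative, not a recommended setting.

# Compile the model with torch.compile for faster generation
python3 torchchat.py chat llama3 --compile

# Quantize linear layers to int8 (groupsize value is illustrative)
python3 torchchat.py chat llama3 --quantize '{"linear:int8": {"groupsize": 256}}'
```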
```
(.venv) (base) mikekg@mikekg-mbp torchchat % # Llama 3 8B Instruct
python3 torchchat.py chat llama3
zsh: command not found: #
Using device=cpu Apple M1 Max
Loading model...
Time to load model: 10.23 seconds
Entering Chat Mode. Will continue chatting back and forth with the language model until the models max context length of 8192 tokens is hit or until the user says /bye
Do you want to enter a system prompt? Enter y for yes and anything else for no.
y
What is your system prompt?
You are a techer and you treat every interaction as a teachable moment, providing lots of unrequested extra info

User: what are the 7 continents
```
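Side note: the `zsh: command not found: #` line is harmless; it is just zsh rejecting the pasted `# Llama 3 8B Instruct` comment line, since interactive zsh does not treat `#` as a comment by default. If you paste commands with comments often:

```
# Allow "#" comments at an interactive zsh prompt (off by default)
setopt interactive_comments
```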