Closed · karpathy closed this 3 months ago
Just adding to what @karpathy said. It seems an oversight to give instructions on how to download the model locally, only to then point us at the Hugging Face downloads. This is confusing for someone unfamiliar with Llama, and it looks like something is missing. We want a sample Python script that runs the model we just downloaded locally; a sketch of one follows below.
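For illustration, a minimal sketch of what such a script might look like, assuming the `Llama.build` helper from the llama3 reference repo still loads the 3.1 checkpoints (untested; the class and argument names come from that repo, not from any 3.1 docs, and the filename `run_local.py` is made up):

```python
# run_local.py -- minimal sketch; assumes the llama3 reference repo's
# Llama.build API carries over to 3.1 checkpoints (an assumption).
# Paths match the download layout described in this thread.
from llama import Llama

generator = Llama.build(
    ckpt_dir="models/llama3_1/Meta-Llama-3.1-8B/",
    tokenizer_path="models/llama3_1/Meta-Llama-3.1-8B/tokenizer.model",
    max_seq_len=512,
    max_batch_size=1,
)

# Base (non-instruct) model, so plain text completion rather than chat.
out = generator.text_completion(["The capital of France is"], max_gen_len=32)
print(out[0]["generation"])
```

If that API carries over, you would launch it with `torchrun --nproc_per_node 1 run_local.py`, since `Llama.build` initializes a distributed process group.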
When can we expect an update on this? On this prompt format page (https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1/) they say: "Note that although prompts designed for Llama 3 should work unchanged in Llama 3.1".
Would that mean those example scripts would work the same, or not?
The "example_chat_completion.py" script calls "chat_completion" from "generation.py", which in turn calls "encode_dialog_prompt". A Dialog is a list of JSON dictionaries with "role" and "content" keys.
No such example inference code has been provided with the 3.1 models.
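For reference, the string that `encode_dialog_prompt` produces follows the Llama 3 chat template, which the page above says carries over unchanged to 3.1. The sketch below is reconstructed from that prompt-format page, not code shipped with the 3.1 models:

```python
# Sketch of the Llama 3 / 3.1 chat template, per the prompt-format page
# linked above. A Dialog is a list of {"role": ..., "content": ...} dicts,
# as in the llama3 repo.
def format_dialog(dialog: list[dict[str, str]]) -> str:
    prompt = "<|begin_of_text|>"
    for msg in dialog:
        prompt += (
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content'].strip()}<|eot_id|>"
        )
    # Open an assistant header so the model generates the reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt
```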
I just got off work; I'm going to clean up and then I'll go through your format.
Thanks @karpathy for opening this issue. We are targeting llama-stack as our preferred path for inference; please take a look at that. We would also appreciate your feedback on the RFC.
To provide alternatives, we are also updating the instructions for running our previous style of example soon. This PR should solve the issue.
Hi, I believe there are docs missing on how to actually run the model once you download it? E.g. I followed the instructions and downloaded the 3.1 8B (base) model into the `models/llama3_1/Meta-Llama-3.1-8B/` directory, but it's not clear what to do next. I'm guessing you'd want to load the `params.json`, init the `ModelArgs` with it, init the `Transformer`, load the params from `consolidated.00.pth`, and `torchrun` that? I'm guessing it would be along the lines of what exists in the llama3 repo (e.g. `example_text_completion.py`), which I am a bit hesitant to build on given the notice about it being deprecated.