meta-llama / llama-models

Utilities intended for use with Llama models.

How to run the model? #82

Closed. karpathy closed this issue 3 months ago.

karpathy commented 3 months ago

Hi, I believe there are docs missing on how to actually run the model once you download it? E.g. I followed the instructions and downloaded the 3.1 8B (base) model into the models/llama3_1/Meta-Llama-3.1-8B/ directory, but it's not clear what to do next. I'm guessing you'd want to load the params.json, init the ModelArgs with it, init the Transformer, load the params from consolidated.00.pth and torchrun that?

I'm guessing it would be along the lines of what exists in the llama3 repo (e.g. example_text_completion.py), which I am a bit hesitant to build on given the notice about it being deprecated.
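Roughly, I'd expect the flow from that repo's generation.py to look something like this (a sketch only; the import paths are guesses at this repo's layout, not a confirmed API):

```python
# A minimal sketch following the llama3 repo's Llama.build() flow.
# Launch with: torchrun --nproc_per_node 1 run_model.py
import json
from pathlib import Path

import torch
import torch.distributed as dist
from fairscale.nn.model_parallel.initialize import initialize_model_parallel

from models.llama3_1.api.model import ModelArgs, Transformer  # guessed path
from models.llama3_1.api.tokenizer import Tokenizer           # guessed path

ckpt_dir = Path("models/llama3_1/Meta-Llama-3.1-8B")

# The reference implementation uses fairscale model-parallel layers, so a
# process group is needed even on a single GPU; torchrun supplies the env.
dist.init_process_group("nccl")
initialize_model_parallel(1)
torch.cuda.set_device(0)

params = json.loads((ckpt_dir / "params.json").read_text())
model_args = ModelArgs(max_seq_len=2048, max_batch_size=1, **params)
tokenizer = Tokenizer(model_path=str(ckpt_dir / "tokenizer.model"))

checkpoint = torch.load(ckpt_dir / "consolidated.00.pth", map_location="cpu")
torch.set_default_tensor_type(torch.cuda.BFloat16Tensor)
model = Transformer(model_args)
model.load_state_dict(checkpoint, strict=False)
```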

zewpo commented 3 months ago

Just adding to what @karpathy said. It seems an oversight to provide instructions for downloading the model locally, only to then point us at the Hugging Face downloads. This is confusing for anyone unfamiliar with Llama, and it looks like something is missing: a sample Python script that runs the model we just downloaded locally.

el-hash-1 commented 3 months ago

When can we expect an update on this? On the prompt format page (https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1/) they say "Note that although prompts designed for Llama 3 should work unchanged in Llama 3.1".

Would that mean those example scripts would work the same, or not?

The "example_chat_completion.py" calls "chat_completion" from "generation.py" which in turn calls "encode_dialog_prompt". And Dialog is list of Json/dictionary with role and content.

No such example inference code has been provided with the 3.1 models.

Viasnow commented 3 months ago

I just got off work. I'm going to clean up, and then I will go through your format.

HamidShojanazeri commented 3 months ago

Thanks @karpathy for opening this issue. We are targeting llama-stack as our preferred path for inference; please take a look at that. We would also appreciate your feedback on the RFC.

As an alternative, we are also updating the instructions for running our previous style of examples soon. This PR should solve the issue.