karpathy / llm.c

LLM training in simple, raw C/CUDA
MIT License

examples for popular models #1

[Open] ehartford opened this issue 5 months ago

ehartford commented 5 months ago

I would like to request one or two examples of how to adapt this for popular open models, such as:

  • https://huggingface.co/mistralai/Mistral-7B-v0.1
  • https://huggingface.co/meta-llama/Llama-2-7b-hf
  • https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
  • https://huggingface.co/microsoft/phi-2

I think if there are one or two example adapters, the community could contribute to a list of adapters that enable different models.

ewof commented 5 months ago

from the readme

Currently, I am working on:

  • more modern architectures, e.g. Llama2, Gemma, etc.

probably gonna take some time

karpathy commented 5 months ago

GPT-2 is not far away from these SOTA models at all. The most complex new layer that is needed is probably RoPE, and even that is not too complex. My earlier project llama2.c already has the forward pass implementation; it just has to be batched and adapted. Then change around the remaining details (take out the positional encoder, swap layernorm -> rmsnorm, new sizing hyperparameters, etc.) and you have yourself a train_llama2.c.

karpathy commented 5 months ago

Oh, one thing I'll say is that this being fp32, you probably don't want to try to work with anything much larger than ~1B parameters. The smallest Llama 2 sadly is 7B, which is very large for fp32. So I'd mostly suggest using up to TinyLlama 1.1B. Maybe Gemma 2B.