ehartford opened this issue 5 months ago
From the readme:

> Currently, I am working on:
> - more modern architectures, e.g. Llama 2, Gemma, etc. (probably gonna take some time)
GPT-2 is not far from these SOTA models at all. The most complex new layer needed is probably RoPE, and even that is not too complex. My earlier project llama2.c already has the forward pass implementation; it just has to be batched and adapted. Then change around the details (take out the positional encoder, swap layernorm -> rmsnorm, new sizing hyperparameters, etc.) and you have yourself a train_llama2.c.
Oh, one thing I'll say: since this is fp32, you probably don't want to work with anything much larger than ~1B parameters. The smallest Llama 2, sadly, is 7B, which is very large for fp32. So I'd mostly suggest going up to TinyLlama 1.1B, maybe Gemma 2B.
I would like to request one or two examples of how to adapt this for popular open models, such as:
- https://huggingface.co/mistralai/Mistral-7B-v0.1
- https://huggingface.co/meta-llama/Llama-2-7b-hf
- https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
- https://huggingface.co/microsoft/phi-2
I think if there were one or two example adapters, the community could contribute to a list of adapters enabling different models.