philschmid opened 1 year ago

Hello,

Are you planning to add support for LLaMA 2, to further pretrain the models?

I know the 7B and 13B variants should have the same architecture as LLaMA 1; it would be good if you could confirm that it works. Also, are there plans for the 70B model, which uses grouped-query attention (GQA)?

That would be awesome 🥇
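For context on the 70B question: GQA lets several query heads share one key/value head, which is the main architectural difference from the 7B/13B models. A minimal JAX sketch of the idea (shapes and names are illustrative only, not this repo's API):

```python
import jax
import jax.numpy as jnp

def grouped_query_attention(q, k, v):
    """Minimal grouped-query attention (GQA) sketch.

    q: [seq, n_q_heads, head_dim]
    k, v: [seq, n_kv_heads, head_dim], with n_q_heads % n_kv_heads == 0.
    With n_kv_heads == n_q_heads this reduces to plain multi-head
    attention (the 7B/13B case); LLaMA 2 70B uses fewer KV heads.
    """
    seq, n_q_heads, head_dim = q.shape
    n_kv_heads = k.shape[1]
    group = n_q_heads // n_kv_heads
    # Repeat each K/V head so every query head in its group shares it.
    k = jnp.repeat(k, group, axis=1)
    v = jnp.repeat(v, group, axis=1)
    scores = jnp.einsum("qhd,khd->hqk", q, k) / jnp.sqrt(head_dim)
    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.einsum("hqk,khd->qhd", weights, v)

# Toy shapes: 32 query heads sharing 8 KV heads
# (the real 70B model uses 64 query heads and 8 KV heads).
key = jax.random.PRNGKey(0)
q = jax.random.normal(key, (16, 32, 64))
kv = jax.random.normal(key, (16, 8, 64))
out = grouped_query_attention(q, kv, kv)
print(out.shape)  # (16, 32, 64)
```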
+1
Indeed this would be useful. Let me look into that.
I have implemented a version of that, but I haven't verified it yet. I used the same architecture as EasyLM in some parts: https://github.com/erfanzar/EasyDeL/blob/main/EasyDel/modules/llama/modelling_llama_flax.py
Has anyone tried implementing further pre-training in Flax/JAX to run it on a TPU?
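Not a full answer, but as a first step before a long run it's worth confirming JAX actually sees the accelerator (assuming a standard TPU VM setup):

```python
import jax

# On a TPU VM this should list the TPU cores (e.g. 8 TpuDevice
# entries on a v3-8); on CPU it falls back to a single CpuDevice.
print(jax.devices())
print("local device count:", jax.local_device_count())
```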