philschmid opened 1 year ago

Hello,

Are you planning to add support for LLaMA 2, to further pretrain the models?

I know the 7B and 13B variants should have the same architecture as LLaMA 1; it would be good if you could confirm that it works. Also, are there plans for the 70B model, which uses grouped-query attention (GQA)?

That would be awesome 🥇
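For context on the 70B question: GQA lets several query heads share one key/value head, which is the main architectural difference from the 7B/13B models. A minimal JAX sketch of the idea (shapes and names are illustrative only, not this repo's API):

```python
import jax
import jax.numpy as jnp

def grouped_query_attention(q, k, v):
    """Minimal grouped-query attention (GQA) sketch.

    q: [seq, n_q_heads, head_dim]
    k, v: [seq, n_kv_heads, head_dim], with n_q_heads % n_kv_heads == 0.
    With n_kv_heads == n_q_heads this reduces to plain multi-head
    attention (the 7B/13B case); LLaMA 2 70B uses fewer KV heads.
    """
    seq, n_q_heads, head_dim = q.shape
    n_kv_heads = k.shape[1]
    group = n_q_heads // n_kv_heads
    # Repeat each K/V head so every query head in its group shares it.
    k = jnp.repeat(k, group, axis=1)
    v = jnp.repeat(v, group, axis=1)
    scores = jnp.einsum("qhd,khd->hqk", q, k) / jnp.sqrt(head_dim)
    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.einsum("hqk,khd->qhd", weights, v)

# Toy shapes: 32 query heads sharing 8 KV heads
# (the real 70B model uses 64 query heads and 8 KV heads).
key = jax.random.PRNGKey(0)
q = jax.random.normal(key, (16, 32, 64))
kv = jax.random.normal(key, (16, 8, 64))
out = grouped_query_attention(q, kv, kv)
print(out.shape)  # (16, 32, 64)
```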
+1
Indeed this would be useful. Let me look into that.
I have implemented a version of that, but I haven't verified it yet. I used the same architecture as EasyLM in some parts: https://github.com/erfanzar/EasyDeL/blob/main/EasyDel/modules/llama/modelling_llama_flax.py
Has anyone tried implementing further pre-training in Flax/JAX to run it on a TPU?
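Not a full answer, but as a first step before a long run it's worth confirming JAX actually sees the accelerator (assuming a standard TPU VM setup):

```python
import jax

# On a TPU VM this should list the TPU cores (e.g. 8 TpuDevice
# entries on a v3-8); on CPU it falls back to a single CpuDevice.
print(jax.devices())
print("local device count:", jax.local_device_count())
```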