huggingface / optimum-tpu

Google TPU optimizations for transformers models
Apache License 2.0

Support for Llama 3.1 and 3.2 fine tuning #114

Open DimensionSTP opened 3 days ago

DimensionSTP commented 3 days ago

Hello,

I am deeply interested in your Optimum-TPU project. I am planning to fine-tune the Llama 3.1 and 3.2 models in my native language and a specific domain, with a fairly large dataset (approximately 60B tokens). I am using Google TPU Pods, but I have been facing significant challenges implementing model-parallel training from scratch, saving unified checkpoints in the safetensors format, setting up appropriate logging, and configuring hyperparameters.

While exploring solutions, I came across the Optimum-TPU project, which seems incredibly useful. However, I noticed that it currently only supports up to Llama 3. Are there any plans to extend support to Llama 3.1 and 3.2 for fine-tuning? I strongly hope that future updates will include support for these versions as well.

Thank you for considering this request!

tengomucho commented 3 days ago

Hi @DimensionSTP! We do not support Llama 3.1 or 3.2 yet, but we should add that support before the end of the year. That said, if all you want is to fine-tune these models, you can probably just follow the steps in our Llama fine-tuning example and it should work (though this is still untested). For serving/inference you would still need better sharding support, but for fine-tuning it should be fine.
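
For reference, a minimal sketch of what "follow the Llama fine-tuning example" could look like, adapted to a Llama 3.1 checkpoint. The `fsdp_v2` helper names follow optimum-tpu's published fine-tuning examples; the model id, dataset, and hyperparameters below are illustrative assumptions and, as noted above, untested with 3.1/3.2 (exact `SFTTrainer` arguments also depend on your trl version):

```python
# Hedged sketch: reuse the existing optimum-tpu Llama fine-tuning flow,
# only swapping in a Llama 3.1 checkpoint. Requires a TPU host with
# torch_xla, transformers, trl, datasets, and optimum-tpu installed.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer
from optimum.tpu import fsdp_v2

# Enable PyTorch/XLA FSDPv2 (SPMD sharding) before loading the model.
fsdp_v2.use_fsdp_v2()

model_id = "meta-llama/Llama-3.1-8B"  # assumption: untested with optimum-tpu
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Placeholder dataset id: substitute your 60B-token domain corpus.
dataset = load_dataset("your-org/your-dataset", split="train")

# Derive the FSDPv2 sharding arguments for this model's decoder layers.
fsdp_args = fsdp_v2.get_fsdp_training_args(model)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",       # column name in the dataset
    max_seq_length=1024,
    args=TrainingArguments(
        output_dir="./llama-3.1-finetuned",
        per_device_train_batch_size=8,
        num_train_epochs=1,
        optim="adafactor",
        dataloader_drop_last=True,   # needed for even SPMD sharding
        **fsdp_args,
    ),
)
trainer.train()
```

If the 3.1/3.2 architectures load cleanly through `AutoModelForCausalLM` (they share the Llama architecture, with changes such as RoPE scaling handled via the model config), this path may just work; any failure would most likely surface at model loading or sharding time.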