Open DimensionSTP opened 3 days ago
Hi @DimensionSTP! We do not support Llama 3.1 or 3.2 yet, but we should add that support before the end of the year. That said, if all you want is to fine-tune these models, you can probably just follow the steps in our Llama fine-tuning example and it should work (though this is still untested). For serving/inference you would still need better sharding support, but for fine-tuning it should be fine.
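As a concrete sketch of that workaround (unverified, which is exactly the caveat above): the only change the example should need is the checkpoint id. The Hugging Face model ids below are the public checkpoint names; whether optimum-tpu's Llama modeling code accepts them is what remains untested, and the helper is purely illustrative.

```python
# Unverified sketch: reusing the existing Llama fine-tuning example by
# pointing it at a Llama 3.1 checkpoint. Only the model id changes; the
# rest of the example's training setup is assumed to carry over.
llama3_id = "meta-llama/Meta-Llama-3-8B"  # id of the kind used by the current example
llama31_id = "meta-llama/Llama-3.1-8B"    # public Llama 3.1 checkpoint id

def to_llama31(model_id: str) -> str:
    """Map a Llama 3 checkpoint id to the corresponding 3.1 id (illustrative)."""
    return model_id.replace("Meta-Llama-3", "Llama-3.1")
```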
Hello,
I am deeply interested in your Optimum-TPU project. Currently, I am planning to fine-tune the Llama 3.1 and 3.2 models for my native language and a specific domain, with a fairly large dataset (approximately 60B tokens). I am using Google TPU Pods, but I have been facing significant challenges in implementing model-parallel training from scratch, saving unified checkpoints in the safetensors format, setting up appropriate logging, and configuring hyperparameters.
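To illustrate the checkpoint-consolidation part of this: the core idea is to gather the per-device parameter shards back into one state dict before writing a single file (e.g. with safetensors' `save_file`). The sketch below is a minimal stand-in, not optimum-tpu code; it uses plain lists in place of tensor slices, and real code would concatenate jax/torch arrays along the sharded axis instead.

```python
# Minimal sketch of consolidating parameter shards (split along axis 0)
# into one unified state dict. Shard layout and helper name are assumptions.
def consolidate_shards(shards):
    """Merge a list of {param_name: slice} dicts into {param_name: full_value}.

    Lists stand in for tensor slices here; with real tensors you would
    concatenate along the sharding axis rather than extend a list.
    """
    merged = {}
    for shard in shards:
        for name, part in shard.items():
            merged.setdefault(name, []).extend(part)
    return merged

# Two hosts each hold half of every parameter:
shards = [
    {"layer0.weight": [1, 2], "layer0.bias": [10]},
    {"layer0.weight": [3, 4], "layer0.bias": [20]},
]
full = consolidate_shards(shards)
# full["layer0.weight"] == [1, 2, 3, 4]
```

The merged dict is what you would then hand to `safetensors.torch.save_file` (or equivalent) to get one unified checkpoint file.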
While exploring solutions, I came across the Optimum-TPU project, which seems incredibly useful. However, I noticed that it currently only supports up to Llama 3. Are there any plans to extend support to Llama 3.1 and 3.2 for fine-tuning? I strongly hope that future updates will include support for these versions as well.
Thank you for considering this request!