duncantech opened 1 month ago
/assigntome
@duncantech I am thinking of training the latest Gemma model with PyTorch/XLA. Is that okay?
I think the Gemma model should work out of the box. Take a look at https://github.com/google/gemma_pytorch#try-it-out-with-pytorchxla. Feel free to give it a try and see if we can improve anything.
Ok. I will look into the gemma part.
For a different model I am trying, there are a few things I need to know: can I use a free Cloud TPU provider, for example Kaggle or Colab, or is it necessary to use the v5 TPUs on Google Cloud?
That part I think @duncantech can answer.
You can work with a free TPU provider if you'd like to get things started.
We should also be able to provide a small amount of v5e capacity to try with, too.
@sitamgithub-MSIT we haven't heard an update in a bit and just wondering if you're still working on the issue?
Yes, I am working on it. I am checking this Hugging Face example for Gemma, and I am thinking about reproducing the same for CodeGemma.
@duncantech I am preparing a script to run on TPUs. Since I am using CodeGemma, which has 7B parameters, it will not fit in Colab unless we use a 4-bit version. Should I use the bitsandbytes configuration for that, or should I just train it in the cloud and see if everything works?
You can try with the 4-bit version and see what the performance is like, since that would be easier for others to run in the future!
📚 Documentation
Using PyTorch/XLA (OpenXLA), select a model, get it training and running on Cloud TPUs, and create a tutorial on how you went about doing it.
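For anyone picking this up, a minimal sketch of the training-loop skeleton such a tutorial would walk through. The model and data are toy placeholders, and the `torch_xla` lines are left as comments since they only run on a TPU host:

```python
import torch
from torch import nn

# On a Cloud TPU VM the only changes are the device and the step marker:
#   import torch_xla.core.xla_model as xm
#   device = xm.xla_device()
# plus xm.optimizer_step(optimizer) (or xm.mark_step()) to trigger
# XLA graph execution each iteration.
device = torch.device("cpu")  # placeholder so the sketch runs anywhere

model = nn.Linear(16, 1).to(device)  # stand-in for Gemma/CodeGemma
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# Toy regression data in place of a real tokenized dataset.
x = torch.randn(64, 16, device=device)
y = x.sum(dim=1, keepdim=True)

losses = []
for step in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()  # with torch_xla: xm.optimizer_step(optimizer)
    losses.append(loss.item())
```

The loop body is identical on CPU, GPU, and TPU; the tutorial's XLA-specific content is mostly device setup and the step marker.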