shub-kris closed this pull request 6 months ago
I have added TRL and PEFT, and used the Dolly-15k dataset.
With the setup mentioned in the README, I was able to run the training in 2 minutes and 30 seconds.
cd /workspace
python google-partnership/Google-Cloud-Containers/examples/google-cloud-tpu-vm/causal-language-modeling/peft_lora_trl_dolly_clm.py \
--model_id facebook/opt-350m \
--num_epochs 3 \
--train_batch_size 8 \
--num_cores 8 \
--lr 3e-4
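For readers unfamiliar with what the PEFT/LoRA flags above enable: instead of updating the full weight matrices of the base model, LoRA trains two small low-rank factors and adds their scaled product to the frozen weights. Below is a hypothetical, toy-sized sketch of that idea in pure Python; the matrix shapes, the `lora_weight` helper, and the alpha/rank values are illustrative only and not taken from the example script.

```python
def matmul(X, Y):
    # Naive matrix multiply for small lists-of-lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_weight(W, A, B, alpha, r):
    # Effective weight: W + (alpha / r) * (B @ A), with W frozen and
    # only the low-rank factors A (r x d_in) and B (d_out x r) trained.
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Toy example: d_out = d_in = 2, rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight
A = [[0.5, 0.5]]               # trainable factor, r x d_in
B = [[2.0], [0.0]]             # trainable factor, d_out x r
W_eff = lora_weight(W, A, B, alpha=1.0, r=1)
```

The point is that `A` and `B` together have far fewer parameters than `W` at realistic model sizes, which is what makes fine-tuning a 350M+ parameter model tractable on a single TPU pod slice.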
@philschmid running with Llama 7B will require a bigger machine; I am testing that currently, as it runs OOM on a TPU v5-litepod-8.
So for now, we can merge this PR along with the Dockerfile mentioned in the other PR: https://github.com/huggingface/Google-Cloud-Containers/pull/14
I will open a separate PR adding an example for Llama-7B, as it requires setting up a multi-host TPU VM (v5-litepod-16), and the steps to execute it are different.
Merged into PR #14
This PR adds an example for our PyTorch TPU container. The README will be updated later once the DLCs are released; for now it documents the steps I followed to build and test it.
The example trains BERT for emotion classification and is based on a pytorch-xla test.
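For orientation, the emotion-classification example follows the usual supervised fine-tuning loop: forward pass, loss, gradient step, repeated over epochs. The real script depends on transformers and torch-xla; the following is a hypothetical, framework-free sketch of just the loop shape, with a toy single-feature logistic regression standing in for BERT and a made-up two-class "dataset".

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=100, lr=0.5):
    # data: list of (x, label) pairs with label in {0, 1}.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(w * x + b)   # forward pass
            grad = p - y             # dLoss/dlogit for log-loss
            w -= lr * grad * x       # gradient step on the weight
            b -= lr * grad           # gradient step on the bias
    return w, b

# Toy stand-in dataset: positive x -> class 1, negative x -> class 0.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b = train(data)
```

In the actual example, the per-example update is replaced by batched steps sharded across TPU cores, but the epoch/step structure is the same.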