Closed by tginart 1 week ago
We do support gradient clipping, see the Llama 3.2 Vision config for how to set it: https://github.com/pytorch/torchtune/blob/main/recipes/configs/llama3_2_vision/11B_full.yaml#L67
This isn't very visible in our documentation or our other configs, so we should amend that. I will file an issue for this.
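For reference, enabling gradient clipping in a config is a one-line setting. This is a sketch based on the linked Llama 3.2 Vision config; the key name and placement may differ across torchtune versions, so verify against the config you are using:

```yaml
# Clip gradients to a max L2 norm of 1.0 before the optimizer step.
# Key name taken from the linked 11B_full.yaml; set to null to disable.
clip_grad_norm: 1.0
```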
For warmup, we support it in all recipes except full_finetune_distributed, where I believe support is in progress. See this Gemma config for an example of how to configure warmup: https://github.com/pytorch/torchtune/blob/main/recipes/configs/gemma/2B_lora.yaml#L60
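The warmup configuration in that Gemma config looks roughly like the following. This is a sketch: the exact `_component_` path has moved between torchtune releases, so check the linked file for the path matching your installed version:

```yaml
# Cosine LR schedule with a linear warmup phase.
# Component path is an assumption based on recent torchtune; older
# versions expose it under a different module path.
lr_scheduler:
  _component_: torchtune.training.lr_schedulers.get_cosine_schedule_with_warmup
  num_warmup_steps: 100
```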
Love the repo. Easy to use and works well with SLURM.
Is there any chance we could get grad clipping and warmup? These are really standard for training and fine-tuning.
Thanks!