pytorch / torchtune

PyTorch native finetuning library
https://pytorch.org/torchtune/main/

Grad clipping and warmup #1992

Closed tginart closed 1 week ago

tginart commented 1 week ago

Love the repo. Easy to use and works well with SLURM.

Is there any chance we could get grad clipping and warmup? These are really standard for training and fine-tuning.

Thanks!

RdoubleA commented 1 week ago

We do support gradient clipping; see the Llama 3.2 Vision config for how to set it: https://github.com/pytorch/torchtune/blob/main/recipes/configs/llama3_2_vision/11B_full.yaml#L67

This isn't very visible in our documentation or our other configs, so we should amend that. I will file an issue for this.
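For reference, a minimal sketch of what enabling clipping looks like in a recipe config, based on the linked YAML (the value 1.0 is just an illustrative max norm):

```yaml
# Enable gradient clipping by setting a maximum gradient norm in the recipe config.
# Any positive float works; 1.0 is a common default.
clip_grad_norm: 1.0
```

The same key can likely also be passed as a command-line override when launching a recipe, e.g. `tune run ... --config llama3_2_vision/11B_full clip_grad_norm=1.0` (exact recipe and config names depend on your setup).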

For warmup, we support it for all recipes except full_finetune_distributed; I believe adding support there is in progress. You can see this Gemma config as an example of how to configure warmup: https://github.com/pytorch/torchtune/blob/main/recipes/configs/gemma/2B_lora.yaml#L60
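For reference, a minimal sketch of the warmup section from a config like the linked Gemma one (the exact component path and step count may differ across torchtune versions, so treat this as illustrative):

```yaml
# Learning-rate scheduler with a linear warmup phase followed by cosine decay.
# num_warmup_steps is the number of optimizer steps spent ramping the LR up to its peak.
lr_scheduler:
  _component_: torchtune.modules.get_cosine_schedule_with_warmup
  num_warmup_steps: 100
```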