pytorch / torchtune

PyTorch native finetuning library
https://pytorch.org/torchtune/main/
BSD 3-Clause "New" or "Revised" License

torch.compile support #1187

Open youngsheen opened 4 months ago

youngsheen commented 4 months ago

Hi, I noticed a `compile` flag in the single-device training script that doesn't exist in the distributed training script. Does distributed mode support torch.compile?
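For context, in the single-device recipes the flag is read from the recipe's YAML config and essentially gates a `torch.compile` call over the model. A minimal sketch of that pattern, assuming a hypothetical `setup_model` helper and flag name rather than verbatim torchtune internals:

```python
# Minimal sketch of what a recipe-level `compile` flag typically gates.
# `setup_model` and the flag plumbing are hypothetical, not verbatim
# torchtune code; torch.compile itself is the real API.
import torch
import torch.nn as nn

def setup_model(model: nn.Module, compile_model: bool) -> nn.Module:
    if compile_model:
        # torch.compile returns an optimized wrapper; graph capture happens
        # lazily on the first forward pass, so expect some startup latency.
        model = torch.compile(model)
    return model

# Usage: in a recipe, compile_model would come from the YAML config.
model = setup_model(nn.Linear(16, 16), compile_model=True)
```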

pbontrager commented 4 months ago

Distributed training can work with compile, but there are some edge cases where the two don't play well together, so we haven't included it in the distributed recipes yet. @kartikayk might be able to add more information on the development of compile + distributed support.
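To make the pairing concrete, here is a minimal sketch of the naive FSDP1 + `torch.compile` composition the comment refers to; this is illustrative only, not torchtune recipe code, and assumes a CUDA setup launched via torchrun:

```python
# Naive FSDP1 + torch.compile pairing; illustrative only, not torchtune
# recipe code. Launch with e.g. `torchrun --nproc_per_node=2 sketch.py`.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

layer = torch.nn.TransformerEncoderLayer(d_model=256, nhead=8)
model = torch.nn.TransformerEncoder(layer, num_layers=4).cuda()

model = FSDP(model)           # FSDP1: flat-parameter sharding via a wrapper class
model = torch.compile(model)  # compiling over FSDP1's hooks and flat parameters
                              # is where the edge cases mentioned above surface

x = torch.randn(8, 16, 256, device="cuda")  # (seq, batch, d_model)
model(x).sum().backward()
dist.destroy_process_group()
```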

kartikayk commented 4 months ago

@youngsheen As @pbontrager mentioned, there are some composability challenges between FSDP1 and torch.compile. We recently landed FSDP2 support for LoRA and QLoRA and will be updating the other distributed recipes to leverage FSDP2 as well, which will make compile support much easier.

cc: @ebsmothers who's working on the FSDP2 support and @yf225 who's helping enable compile support in torchtune
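For reference, the reason FSDP2 composes more cleanly with compile is that its `fully_shard` API is applied per-module rather than as a wrapper class, so each transformer block can be compiled individually before sharding. A sketch of that pattern, assuming the pre-release import path (`torch.distributed._composable.fsdp`, later promoted into `torch.distributed.fsdp`) and a generic model rather than torchtune recipe code:

```python
# FSDP2 (fully_shard) + per-block torch.compile; a sketch of the general
# pattern, not torchtune recipe code. The import path below is the
# pre-release one and has since moved; check your PyTorch version.
# Launch with e.g. `torchrun --nproc_per_node=2 sketch.py`.
import torch
import torch.distributed as dist
from torch.distributed._composable.fsdp import fully_shard

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

layer = torch.nn.TransformerEncoderLayer(d_model=256, nhead=8)
model = torch.nn.TransformerEncoder(layer, num_layers=4).cuda()

for i, block in enumerate(model.layers):
    model.layers[i] = torch.compile(block)  # one graph per block
    fully_shard(model.layers[i])            # FSDP2: per-parameter (DTensor) sharding
fully_shard(model)                          # shard remaining root parameters

x = torch.randn(8, 16, 256, device="cuda")  # (seq, batch, d_model)
model(x).sum().backward()
dist.destroy_process_group()
```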