pytorch / torchtune

A Native-PyTorch Library for LLM Fine-tuning
https://pytorch.org/torchtune/main/
BSD 3-Clause "New" or "Revised" License

torch.compile support #1187

Open · youngsheen opened this issue 1 month ago

youngsheen commented 1 month ago

Hi, I noticed a compile flag in the single-device training script, but it doesn't exist in the distributed training script. Does distributed mode support torch.compile?
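
For context, here is a minimal sketch of what a recipe-level compile flag typically toggles; the function and flag names below are illustrative assumptions, not torchtune's actual code:

```python
import torch
import torch.nn as nn

# Sketch under assumptions: the recipe exposes a boolean `compile_model`
# option and wraps the model with torch.compile when it is set.
def setup_model(model: nn.Module, compile_model: bool = False) -> nn.Module:
    if compile_model:
        # torch.compile returns an optimized wrapper around the module;
        # graph capture and optimization happen lazily on the first forward.
        model = torch.compile(model)
    return model
```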

pbontrager commented 1 month ago

Distributed training can work with compile, but there are some edge cases where the two don't play well together, so we haven't included it in the distributed recipes yet. @kartikayk might be able to add more information on the status of compile + distributed support.

kartikayk commented 1 month ago

@youngsheen As @pbontrager mentioned, there are composability challenges between FSDP1 and torch.compile. We recently landed FSDP2 support for LoRA and QLoRA, and we will be updating the other distributed recipes to use FSDP2 as well. That will make compile support much easier.

cc: @ebsmothers who's working on the FSDP2 support and @yf225 who's helping enable compile support in torchtune
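
For anyone following along, here is a rough sketch of the FSDP2-style composition that tends to make compile easier: shard each transformer block with fully_shard, then apply compile on top. The import path, the per-block-then-root ordering, and the model.layers attribute are assumptions about the current FSDP2 API and a typical decoder model, not torchtune's actual recipe code:

```python
import torch
from torch.distributed._composable.fsdp import fully_shard

# Sketch only; assumes torch.distributed is already initialized (e.g. via
# init_process_group) and that the model exposes its blocks as `model.layers`.
def shard_and_compile(model: torch.nn.Module) -> torch.nn.Module:
    for layer in model.layers:
        fully_shard(layer)       # each block becomes its own FSDP2 unit
    fully_shard(model)           # shard any remaining root-level parameters
    return torch.compile(model)  # compile composes with FSDP2's module hooks
```

Per-block sharding keeps communication overlapped with compute and gives compile smaller, more stable graphs to trace, which is one reason this composition tends to work better than compile over FSDP1's flat wrappers.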