Open youngsheen opened 4 months ago
Hi, I notice a `compile` flag in the single-device training script, but it does not exist in the distributed training script. Does distributed mode support it?

Distributed can work with compile, but there are some edge cases where the two don't play well together, so we haven't included it in the distributed recipes yet. @kartikayk might be able to add more information on the status of compile + distributed support.

@youngsheen As @pbontrager mentioned, there are some composability challenges between FSDP1 and torch compile. We recently landed FSDP2 support for LoRA and QLoRA and will be updating the other distributed recipes to use FSDP2 as well. This should make compile support much easier.

cc @ebsmothers, who's working on the FSDP2 support, and @yf225, who's helping enable compile support in torchtune.
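For context, the flag in question is set in the single-device recipe's YAML config rather than passed on the command line. A minimal sketch of what that looks like (field names are illustrative; the exact config layout varies by recipe and torchtune version):

```yaml
# Sketch of a torchtune single-device recipe config (illustrative only;
# check the actual recipe YAML shipped with your torchtune version).
model:
  _component_: torchtune.models.llama2.lora_llama2_7b  # hypothetical model builder

# When True, the recipe wraps the model with torch.compile before training.
# This flag is present in the single-device recipes but, as discussed above,
# is not yet exposed in the distributed (FSDP) recipes.
compile: True
```

You can also override the flag at launch time without editing the file, e.g. `tune run lora_finetune_single_device --config <your_config> compile=True` (command shape is an assumption based on torchtune's CLI override convention).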