microsoft / mup

maximal update parametrization (µP)
https://arxiv.org/abs/2203.03466
MIT License

Support FSDP usage #72

Open janEbert opened 5 months ago

janEbert commented 5 months ago

After setting base shapes, we now manually copy the infshape onto the readout module: mu_readout.weight_infshape = mu_readout.weight.infshape. This keeps the infshape accessible after FSDP wrapping. Because the approach also requires FSDP(..., use_orig_params=True), the README has been updated to mention this caveat.
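A minimal sketch of the copy pattern, not the code from this PR. It uses a plain nn.Linear as a stand-in for a MuReadout, a placeholder string in place of a real infshape, and a hypothetical helper named copy_infshape_to_module. The parameter swap at the end is only an illustrative stand-in for a wrapper that replaces parameter objects; it is not FSDP's actual mechanism.

```python
import torch
import torch.nn as nn


def copy_infshape_to_module(module: nn.Module) -> None:
    """Hypothetical helper: mirror weight.infshape onto the module itself."""
    module.weight_infshape = module.weight.infshape


# Stand-in for a readout layer. In real use this would be a MuReadout whose
# weight.infshape was populated when the base shapes were set.
readout = nn.Linear(8, 4)
readout.weight.infshape = "placeholder-infshape"  # assumption: fake value for illustration

# Copy the attribute from the parameter to the module.
copy_infshape_to_module(readout)

# Stand-in for a wrapper that swaps out the parameter object (NOT FSDP itself).
readout.weight = nn.Parameter(torch.zeros(4, 8))

# The attribute on the old parameter object is no longer reachable,
# while the module-level copy still is.
print(hasattr(readout.weight, "infshape"))
print(readout.weight_infshape)
```

In an actual setup, the copy would run after setting base shapes and before wrapping the model with FSDP(..., use_orig_params=True), as described above.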

Fix #59. @edwardjhu is the review offer still up? :)

janEbert commented 5 months ago

@microsoft-github-policy-service agree