Closed: anshulverma closed this pull request 4 days ago.
Note: Links to docs will display an error until the docs builds have been completed.
As of commit 304da4aa27d832c1c418556288bbe88a3e00b72f with merge base 4389b4d81398da0890aa686ef38cc15c898e2036, the following jobs failed:

* [GPU tests / gpu_test (3.9, stable)](https://hud.pytorch.org/pr/pytorch/torchtune/1956#32611741458) ([gh](https://github.com/pytorch/torchtune/actions/runs/11699400748/job/32611741458)) `tests/torchtune/training/test_distributed.py::TestLoRAFSDP::test_lora_fsdp_wrap`
* [GPU tests / gpu_test (3.10, stable)](https://hud.pytorch.org/pr/pytorch/torchtune/1956#32611742077) ([gh](https://github.com/pytorch/torchtune/actions/runs/11699400748/job/32611742077)) `tests/torchtune/training/test_distributed.py::TestLoRAFSDP::test_lora_fsdp_wrap`
* [GPU tests / gpu_test (3.11, stable)](https://hud.pytorch.org/pr/pytorch/torchtune/1956#32611742488) ([gh](https://github.com/pytorch/torchtune/actions/runs/11699400748/job/32611742488)) `tests/torchtune/training/test_distributed.py::TestLoRAFSDP::test_lora_fsdp_wrap`
* [Lint / lint (3.10)](https://hud.pytorch.org/pr/pytorch/torchtune/1956#32611742542) ([gh](https://github.com/pytorch/torchtune/actions/runs/11699400753/job/32611742542)) `##[error]Process completed with exit code 1.`
* [Unit Test / unit_tests (3.9)](https://hud.pytorch.org/pr/pytorch/torchtune/1956#32611741791) ([gh](https://github.com/pytorch/torchtune/actions/runs/11699400779/job/32611741791)) `##[error]The operation was canceled.`
* [Unit Test / unit_tests (3.10)](https://hud.pytorch.org/pr/pytorch/torchtune/1956#32611742285) ([gh](https://github.com/pytorch/torchtune/actions/runs/11699400779/job/32611742285)) `##[error]The operation was canceled.`
* [Unit Test / unit_tests (3.11)](https://hud.pytorch.org/pr/pytorch/torchtune/1956#32611742859) ([gh](https://github.com/pytorch/torchtune/actions/runs/11699400779/job/32611742859)) `tests/torchtune/training/test_distributed.py::TestLoRAFSDP::test_lora_fsdp_wrap`
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D65496443
hey @anshulverma, we used to have FSDP1, and now we use FSDP2. I don't think we have ever seen a warning complaining about `reset_parameters`. Can you help me understand the context for this PR? Did something break, or did you find an error when running it?
@anshulverma I am going to close this PR. I may be missing the point (if so, feel free to reopen), but it seems to me that we should never have a problem here, because we always load the RMSNorm scales as part of the pretrained checkpoint. So we will never run into garbage initialization via, e.g., usage of `to_empty` without proper initialization.
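To make that concrete, here is a minimal sketch (not torchtune's actual recipe code; `nn.RMSNorm` requires PyTorch >= 2.4, and the checkpoint dict below is a stand-in for a real checkpoint) of why the uninitialized memory from `to_empty` is harmless when every parameter is subsequently overwritten by the pretrained weights:

```python
import torch
import torch.nn as nn

# Build the module on the meta device: parameters have shape/dtype but no storage.
with torch.device("meta"):
    norm = nn.RMSNorm(4096)

# to_empty() allocates real but *uninitialized* (garbage) memory on the target device.
norm.to_empty(device="cpu")

# Loading the pretrained weights with assign=True replaces every tensor,
# so the uninitialized values are never actually read.
pretrained_state = {"weight": torch.ones(4096)}  # stand-in for checkpoint contents
norm.load_state_dict(pretrained_state, assign=True)
assert torch.equal(norm.weight, torch.ones(4096))
```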
Summary: When parameters are initialized on the meta device, FSDP calls the `reset_parameters` function automatically if `param_init_fn` is not specified. For more details, see the wiki.

Differential Revision: D65496443
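For reference, a hedged sketch of the contract the summary describes (`MyRMSNorm` is a hypothetical module written for illustration, not torchtune's implementation): when a module is built on the meta device and no `param_init_fn` is supplied, FSDP1 materializes the parameters and then calls the module's `reset_parameters()`, so defining one keeps the scale from remaining uninitialized memory:

```python
import torch
import torch.nn as nn

class MyRMSNorm(nn.Module):
    """Hypothetical RMSNorm used to illustrate the reset_parameters contract."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.scale = nn.Parameter(torch.empty(dim))
        self.reset_parameters()

    def reset_parameters(self) -> None:
        # FSDP1 calls this after materializing meta-device parameters
        # (when no param_init_fn is given), so `scale` never stays as
        # uninitialized memory.
        nn.init.ones_(self.scale)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.scale

# Typical meta-device construction; FSDP1 would invoke reset_parameters()
# during wrapping if param_init_fn is not specified.
with torch.device("meta"):
    norm = MyRMSNorm(4096)
```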