Add a warning when FSDP full shard is used together with the gradient_checkpointing training argument, encouraging users to enable activation_checkpointing in the FSDP config instead.
Fixes #30404
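For reviewers, the intended behavior can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this PR: the helper name check_fsdp_checkpointing, its parameters, and the warning text are all hypothetical, and it assumes the FSDP options are given as a list of strings and the FSDP config as a plain dict.

```python
import warnings


def check_fsdp_checkpointing(fsdp_options, gradient_checkpointing, fsdp_config=None):
    """Hypothetical helper (not part of the library).

    Warns and returns True when full shard is combined with the
    gradient_checkpointing training arg while activation_checkpointing
    is not enabled in the FSDP config. Returns False otherwise.
    """
    fsdp_config = fsdp_config or {}
    uses_full_shard = "full_shard" in fsdp_options
    uses_fsdp_checkpointing = bool(fsdp_config.get("activation_checkpointing", False))

    if uses_full_shard and gradient_checkpointing and not uses_fsdp_checkpointing:
        # Assumed wording; the actual message in the PR may differ.
        warnings.warn(
            "FSDP full shard is enabled together with gradient_checkpointing. "
            "Consider setting activation_checkpointing in the FSDP config instead.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

Under these assumptions, check_fsdp_checkpointing(["full_shard"], True) emits the warning, while passing {"activation_checkpointing": True} as the config, or leaving gradient_checkpointing off, stays silent.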
Before submitting
[ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@SunMarc @muellerzr