huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

[RFC] Support FSDP2 #3231

Open kmehant opened 2 weeks ago

kmehant commented 2 weeks ago

What does this PR do?

Prototype implementation for porting from FSDP V1 to FSDP V2. There are a couple of open questions in this PR that need comments and discussion:

  1. Do we want to keep FSDP V1 as is and add an experimental FSDP V2 path in parallel?
  2. If we maintain both versions, should we keep separate FSDP plugins and distributed types for each version?
  3. For HF/transformers users who configure FSDP through `fsdp_config`, how do we want to let them choose between the two versions?
  4. How do we want to prepare the 2D device mesh for HSDP? Should it be an input from the user? (One option is sketched right after this list.)
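
For context, here is a minimal sketch of the API gap this port has to bridge, and one way the 2D HSDP mesh from question 4 could be built. This is not code from this PR: `replicate_size`, `shard_size`, and the `model.layers` attribute are placeholders, and the FSDP2 import assumes torch >= 2.4, where `fully_shard` still lives under the private `_composable` namespace.

```python
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh

# FSDP V1: a wrapper module; parameters are flattened into FlatParameters.
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

def shard_v1(model: nn.Module) -> nn.Module:
    return FSDP(model, sharding_strategy=ShardingStrategy.FULL_SHARD)

# FSDP V2: composable fully_shard; the model keeps its original class and
# its parameters become DTensors sharded over the given mesh.
from torch.distributed._composable.fsdp import fully_shard

def shard_v2(model: nn.Module, replicate_size: int, shard_size: int) -> nn.Module:
    # A 2D mesh makes fully_shard run HSDP: replicate across the first
    # dim, shard across the second. Whether accelerate builds this mesh
    # itself or takes it as user input is exactly question 4 above.
    mesh = init_device_mesh(
        "cuda",
        (replicate_size, shard_size),
        mesh_dim_names=("replicate", "shard"),
    )
    # Apply to each transformer block first, then to the root module.
    # `model.layers` is a placeholder for the real block container.
    for block in model.layers:
        fully_shard(block, mesh=mesh)
    fully_shard(model, mesh=mesh)
    return model
```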

Preliminary run of this PR and results

The current version of the PR has been tested for basic functionality (full shard) and compared with the previous FSDP V1 implementation.

| Key | Value |
| --- | --- |
| Model | Maykeye/TinyLLama-v0 |
| Mesh size | 2 GPUs |
| Sharding | full shard |
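
For reference, a hedged sketch of how the memory side of such a comparison might be instrumented; this is not the script behind the numbers below, and the sharding and training steps are elided:

```python
import torch
from transformers import AutoModelForCausalLM

# Model from the table above.
model = AutoModelForCausalLM.from_pretrained("Maykeye/TinyLLama-v0")

# ... shard with FSDP V1 or V2 and run identical training steps on 2 GPUs ...

# Peak allocated CUDA memory on this rank, comparable across the two runs.
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```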

Memory

(screenshot: memory usage, FSDP V1 vs FSDP V2)

Loss Parity

(screenshot: loss curves, FSDP V1 vs FSDP V2)

Throughput

TODO

Fixes #2873


Who can review?

@muellerzr

raghukiran1224 commented 1 week ago

@ByronHsu FYI - thoughts?