Adding FSDP Memory Tracking and Estimation

Previous conversation:

The change itself looks good to me. I do wonder what the plan is going forward if train.py keeps changing, though.

I think the best way would be to incorporate this into train.py directly and use the estimate config options to enable and disable the right parts of the code in the main workflow. That way we don't have to maintain two copies.
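A minimal sketch of that idea, assuming a hypothetical `estimate.memory` flag on the job config. The names `JobConfig`, `EstimateConfig`, `build_model`, `estimate_memory`, and `train_step` are illustrative placeholders, not torchtitan's actual API. Shared setup runs once, and the flag decides whether the main workflow trains or only runs the estimation pass.

```python
from dataclasses import dataclass, field


@dataclass
class EstimateConfig:
    # Hypothetical option: when True, run only the memory-estimation path.
    memory: bool = False


@dataclass
class JobConfig:
    estimate: EstimateConfig = field(default_factory=EstimateConfig)
    steps: int = 3


def build_model(cfg: JobConfig) -> dict:
    # Placeholder for the shared setup (model init, FSDP wrapping, optimizer).
    return {"params": 1_000}


def estimate_memory(model: dict) -> int:
    # Placeholder for the memory-tracking pass: 4 bytes per fp32 parameter.
    return model["params"] * 4


def train_step(model: dict, step: int) -> str:
    # Placeholder for one real training step.
    return f"step {step}"


def main(cfg: JobConfig) -> dict:
    # Setup is shared by both paths, so there is only one copy to maintain.
    model = build_model(cfg)
    if cfg.estimate.memory:
        # Estimation mode: skip training and report the estimate.
        return {"mode": "estimate", "bytes": estimate_memory(model)}
    logs = [train_step(model, s) for s in range(cfg.steps)]
    return {"mode": "train", "logs": logs}
```

Because `build_model` is shared, any change to the setup code in the training path automatically applies to the estimation path. Avoiding that drift is the point of not maintaining two copies.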
On another note, @gnadathur suggested, and it seems pretty reasonable, that estimate.py should evolve into an option that auto-configures the job and outputs a configuration to run.
For now I duplicated some of the code because I haven't gotten user feedback on what people want. @lessw2020 is going to advertise this tool to partner teams, who may give us feedback about how they want to use it.

I am open to other suggestions as well.

cc: @awgu @tianyu-l

pytorch / torchtitan