jeromeku opened this issue 2 months ago
@jeromeku, you can start here: https://www.deepspeed.ai/tutorials/zeropp/
@tjruwase
Is it possible to partition parameters using the secondary partition for both the forward and backward passes? That is, to shard only intra-node for both forward and backward, instead of only for backward?
Can this be accomplished given hpZ, and if so, what would be the appropriate config?
Thanks!
No, this is not possible in hpZ.
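To see why, here is a minimal sketch of the hpZ gather pattern as the ZeRO++ tutorial describes it: the forward all-gather still reads from the primary partition, which is sharded across every rank, while hpZ keeps a secondary, node-local shard that is used only by the backward all-gather. The helpers `node_of`, `gather_peers`, and `crosses_nodes` are hypothetical and only model which ranks take part in a parameter all-gather; ranks are assumed to be numbered node by node with the same GPU count per node.

```python
def node_of(rank: int, gpus_per_node: int) -> int:
    """Node index of a global rank, assuming ranks are laid out node by node."""
    return rank // gpus_per_node


def gather_peers(rank: int, world_size: int, gpus_per_node: int, phase: str) -> list:
    """Ranks taking part in a parameter all-gather for `rank` (simplified model of hpZ)."""
    if phase == "forward":
        # Primary partition: parameters are sharded across all ranks,
        # so the forward all-gather spans every node.
        return list(range(world_size))
    if phase == "backward":
        # Secondary (hpZ) partition: a node-local shard of the parameters,
        # so the backward all-gather stays inside the node.
        start = node_of(rank, gpus_per_node) * gpus_per_node
        return list(range(start, start + gpus_per_node))
    raise ValueError(f"unknown phase: {phase!r}")


def crosses_nodes(peers: list, gpus_per_node: int) -> bool:
    """True if the group of ranks spans more than one node."""
    return len({node_of(r, gpus_per_node) for r in peers}) > 1


if __name__ == "__main__":
    world, per_node = 16, 8  # assumed topology: 2 nodes x 8 GPUs
    for phase in ("forward", "backward"):
        peers = gather_peers(rank=9, world_size=world, gpus_per_node=per_node, phase=phase)
        scope = "inter-node" if crosses_nodes(peers, per_node) else "intra-node"
        print(phase, peers, scope)
```

In this model only the backward phase is confined to a node, which matches the answer above: hpZ does not offer an intra-node-only forward gather.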
@tjruwase Are there any benchmarks comparing ZeRO++ hpZ with MiCS? Are there specific use cases for one over the other, given the different partitioning schemes used by hpZ and MiCS?
@jeromeku, please see the attached performance comparison of hpZ versus MiCS. Generally, hpZ is more memory efficient because, unlike MiCS, it does not replicate the entire model state. However, MiCS might be competitive in scenarios where memory is not a bottleneck.
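The memory side of that trade-off can be made concrete with back-of-the-envelope arithmetic. This is an assumption-laden estimate, not a measurement: it uses the ZeRO paper's accounting of 16 bytes per parameter for mixed-precision Adam, assumes the hpZ secondary copy holds only the 16-bit parameters, sets both the hpZ and MiCS group size to one node, and ignores activations and communication buffers. The 7B-parameter, 8 nodes x 8 GPUs setup is hypothetical.

```python
# Bytes per parameter for mixed-precision Adam, following the ZeRO paper's accounting:
# 2 (16-bit params) + 2 (16-bit grads) + 12 (fp32 master params, momentum, variance).
PARAM_16BIT = 2
TOTAL_STATE = 16


def zero3_bytes(num_params: float, world_size: int) -> float:
    # Every model state is sharded across all ranks.
    return TOTAL_STATE * num_params / world_size


def hpz_bytes(num_params: float, world_size: int, hpz_partition_size: int) -> float:
    # ZeRO-3 sharding plus a secondary copy of the 16-bit parameters,
    # sharded only within the hpZ group (assumed here to be one node).
    return zero3_bytes(num_params, world_size) + PARAM_16BIT * num_params / hpz_partition_size


def mics_bytes(num_params: float, shard_size: int) -> float:
    # All model states sharded within a shard group and replicated across groups.
    return TOTAL_STATE * num_params / shard_size


if __name__ == "__main__":
    psi, world, k = 7e9, 64, 8  # assumed: 7B params, 8 nodes x 8 GPUs, group = one node
    for name, b in [("ZeRO-3", zero3_bytes(psi, world)),
                    ("hpZ", hpz_bytes(psi, world, k)),
                    ("MiCS", mics_bytes(psi, k))]:
        print(f"{name:7s} {b / 1e9:.2f} GB per GPU")
```

Under these assumptions ZeRO-3 needs about 1.75 GB of model state per GPU, hpZ about 3.50 GB, and MiCS about 14.00 GB: hpZ only adds a node-sharded copy of the 16-bit parameters, while MiCS replicates every model state across nodes. That extra memory is what MiCS spends to keep its parameter gathers inside the shard group.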
Is your feature request related to a problem? Please describe.
I'm interested in hybrid FSDP-style sharding, where the model is replicated across nodes and sharded within each node.
My understanding is that this can be achieved through MiCS and/or ZeRO++ hpZ.
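For reference, the layout described above (replicate across nodes, shard within a node) can be sketched as two families of rank groups. `hybrid_groups` is a hypothetical helper that assumes ranks are numbered node by node and every node has the same GPU count; it illustrates the grouping only, not how DeepSpeed or PyTorch FSDP construct their process groups internally.

```python
def hybrid_groups(world_size: int, gpus_per_node: int):
    """Split ranks into shard groups (one per node) and replica groups (same local rank on every node)."""
    if world_size % gpus_per_node:
        raise ValueError("world_size must be a multiple of gpus_per_node")
    num_nodes = world_size // gpus_per_node
    # Model states are sharded among the GPUs of one node.
    shard_groups = [list(range(n * gpus_per_node, (n + 1) * gpus_per_node))
                    for n in range(num_nodes)]
    # Ranks holding the same shard on different nodes form a replica group;
    # gradients for that shard must be reduced across it.
    replica_groups = [list(range(local, world_size, gpus_per_node))
                      for local in range(gpus_per_node)]
    return shard_groups, replica_groups


if __name__ == "__main__":
    shards, replicas = hybrid_groups(world_size=8, gpus_per_node=4)  # assumed: 2 nodes x 4 GPUs
    print("shard groups:  ", shards)
    print("replica groups:", replicas)
```

For 2 nodes with 4 GPUs each, this yields shard groups [[0, 1, 2, 3], [4, 5, 6, 7]] and replica groups [[0, 4], [1, 5], [2, 6], [3, 7]].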
Describe the solution you'd like
Better documentation, examples, or tutorials on how these solutions differ and how best to compose them with ZeRO-3 for a given network topology.
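As a starting point, here is a sketch that builds a ZeRO-3 config sharding within one node using either scheme. The keys `zero_hpz_partition_size` (ZeRO++ tutorial) and `mics_shard_size` / `mics_hierarchical_params_gather` (MiCS tutorial) are the knobs documented for each feature; the helper name `zero3_config`, the placeholder micro-batch size, and enabling bf16 are illustrative assumptions, and all keys should be checked against the DeepSpeed version in use.

```python
import json


def zero3_config(gpus_per_node: int, scheme: str) -> dict:
    """Hypothetical helper: a ZeRO-3 config that shards within a node via hpZ or MiCS."""
    zero = {"stage": 3}
    if scheme == "hpz":
        # Secondary parameter partition within each node (used for the backward all-gather).
        zero["zero_hpz_partition_size"] = gpus_per_node
    elif scheme == "mics":
        # All model states sharded within groups of this size, replicated across groups.
        zero["mics_shard_size"] = gpus_per_node
        zero["mics_hierarchical_params_gather"] = False
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return {
        "train_micro_batch_size_per_gpu": 1,  # placeholder value
        "bf16": {"enabled": True},
        "zero_optimization": zero,
    }


if __name__ == "__main__":
    print(json.dumps(zero3_config(8, "mics"), indent=2))
```

The resulting dict can be passed as the DeepSpeed config. MiCS is also documented with its own model-construction context (`deepspeed.zero.MiCS_Init` in recent releases); confirm its availability and arguments in the release you use.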