Open RobotSail opened 5 months ago
If we do this, all of the checkpointing flows will need to be retested. I'm also not sure what the impact on #25 would be.
Yes, we shouldn't do this until after the 15th. But it's probably something we should eventually support if we want NVMe offloading.
Today we hardcode options specific to ZeRO stage 2. We should update our implementation to allow support for ZeRO stage 1 and 3 as well.
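As a rough illustration of what a stage-aware setup could look like, here is a minimal sketch that builds the `zero_optimization` section of a DeepSpeed config from a stage number instead of hardcoding stage 2. The helper name `build_zero_config` and its parameters are hypothetical and not part of our codebase. The config keys are taken from the DeepSpeed configuration docs as I understand them, and the chosen defaults are assumptions that would need to be checked against our current stage 2 settings.

```python
from typing import Any, Dict, Optional


def build_zero_config(
    stage: int = 2,
    offload_device: Optional[str] = None,
    nvme_path: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a `zero_optimization` config for ZeRO stage 1, 2, or 3."""
    if stage not in (1, 2, 3):
        raise ValueError(f"unsupported ZeRO stage: {stage}")
    if offload_device not in (None, "cpu", "nvme"):
        raise ValueError(f"unsupported offload device: {offload_device}")
    if offload_device == "nvme":
        # Per the DeepSpeed docs, NVMe offloading is only available with stage 3.
        if stage != 3:
            raise ValueError("NVMe offloading requires ZeRO stage 3")
        if nvme_path is None:
            raise ValueError("nvme_path is required for NVMe offloading")

    zero: Dict[str, Any] = {"stage": stage}

    if stage >= 2:
        # Assumed defaults for gradient partitioning; verify against the
        # values we hardcode today before relying on them.
        zero["overlap_comm"] = True
        zero["contiguous_gradients"] = True

    if stage == 3:
        # Gather the partitioned 16-bit weights when saving the model, so a
        # checkpoint can be written as a single set of weights.
        zero["stage3_gather_16bit_weights_on_model_save"] = True

    if offload_device is not None:
        offload: Dict[str, Any] = {"device": offload_device}
        if offload_device == "nvme":
            offload["nvme_path"] = nvme_path
        zero["offload_optimizer"] = dict(offload)
        if stage == 3:
            # Parameter offloading only applies to stage 3.
            zero["offload_param"] = dict(offload)

    return {"zero_optimization": zero}
```

Something like `build_zero_config(3, offload_device="nvme", nvme_path="/local_nvme")` would then cover the NVMe case, where the path is only a placeholder. Since stage 3 partitions the parameters themselves, the checkpointing flows would still need to be retested per stage.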