Open RobotSail opened 1 month ago
If we do this, all the checkpointing flows will need to be retested. I'm also not sure what the impact on #25 would be.
Yes, we shouldn't do this until after the 15th. But it's probably something we should eventually support if we want NVMe offloading.
Today we hardcode options specific to ZeRO stage 2. We should update our implementation to support ZeRO stages 1 and 3 as well.
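As a rough illustration of what "not hardcoding stage 2" could look like, here is a minimal sketch that builds the `zero_optimization` part of a DeepSpeed config from a stage argument. The function name `build_zero_config` and its parameters are hypothetical and not taken from our codebase. The config keys (`stage`, `offload_optimizer`, `offload_param`, `nvme_path`, `stage3_gather_16bit_weights_on_model_save`) are the ones I understand DeepSpeed to document, and the sketch assumes NVMe offload is only valid with stage 3. It would still need to be checked against the DeepSpeed version we pin.

```python
from typing import Any, Dict, Optional


def build_zero_config(
    stage: int,
    offload_device: Optional[str] = None,
    nvme_path: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: return a DeepSpeed-style config dict for a ZeRO stage."""
    if stage not in (1, 2, 3):
        raise ValueError(f"unsupported ZeRO stage: {stage}")
    if offload_device not in (None, "cpu", "nvme"):
        raise ValueError(f"unsupported offload device: {offload_device}")
    if offload_device == "nvme":
        # Assumption: NVMe offload is only available with ZeRO stage 3.
        if stage != 3:
            raise ValueError("NVMe offload requires ZeRO stage 3")
        if nvme_path is None:
            raise ValueError("nvme_path is required for NVMe offload")

    zero: Dict[str, Any] = {"stage": stage}

    if stage == 3:
        # Stage 3 shards parameters, so saving full 16-bit weights needs a
        # gather step; this is one reason checkpointing must be retested.
        zero["stage3_gather_16bit_weights_on_model_save"] = True

    if offload_device is not None:
        offload: Dict[str, Any] = {"device": offload_device}
        if offload_device == "nvme":
            offload["nvme_path"] = nvme_path
        zero["offload_optimizer"] = dict(offload)
        if stage == 3:
            # Parameter offload only applies when parameters are sharded.
            zero["offload_param"] = dict(offload)

    return {"zero_optimization": zero}
```

Calling `build_zero_config(2)` would reproduce a plain stage 2 setup, while `build_zero_config(3, "nvme", "/some/nvme/path")` (path is a placeholder) would enable optimizer and parameter offload. Any stage-specific tuning options we currently hardcode would need to be moved behind the same kind of switch.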