stas00 opened this issue 3 years ago
I think the things you mentioned make sense. We also need to make sure that freeing (or never allocating) the memory for those training-related parts happens automatically when we are in inference mode. I mean that the user shouldn't need to explicitly call a function like `free_optimizer_and_scheduler` to free that memory, but should have an easy way of switching modes, like `eval()` mode in PyTorch.
I agree! That would be nice indeed.
But torch's `model.eval()`/`train()` just turns some flags on/off. How would you deal with the user switching back from eval to train in deepspeed? Do you save the config and simply re-init the parts that were freed for eval?
Yes, that can be a viable option, since we eventually have to control the checkpointing through deepspeed anyway if we want to switch seamlessly between the two modes. I think we can hide all the operations needed to switch between inference and training modes, so to the user it still feels like flipping a flag on and off.
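Something like this is what I have in mind (a pure sketch: none of these names exist in DeepSpeed today, and the engine internals are heavily simplified):

```python
import gc


class ModeSwitchingEngineSketch:
    """Toy stand-in for a DeepSpeed engine that can drop and rebuild its
    training-only state when switching between train and eval modes."""

    def __init__(self, module, config):
        self.module = module        # the wrapped torch.nn.Module
        self.config = config        # kept around so train() can re-create things
        self.optimizer = None
        self.lr_scheduler = None
        self._build_training_state()

    def _build_training_state(self):
        # A real engine would build the optimizer/scheduler described by
        # self.config; placeholders are enough to show the flow.
        self.optimizer = object()
        self.lr_scheduler = object()

    def eval(self):
        """Flip to inference mode and free the training-only state."""
        self.module.eval()
        self.optimizer = None
        self.lr_scheduler = None
        gc.collect()                # actually release the memory

    def train(self):
        """Flip back to training mode, rebuilding what eval() freed."""
        if self.optimizer is None:
            self._build_training_state()
        self.module.train()
```

The key point is that the engine keeps its config around, so `train()` can silently rebuild whatever `eval()` freed.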
yes, please!
While https://github.com/microsoft/DeepSpeed/pull/896 solves the leak problem, ideally we should also have a new method to free all optimizer/scheduler-related parts, to pave the way for inference. In some environments, like Google Colab, general RAM is very scarce, so every bit counts.
Here is one way to approach this, with a new deepspeed method:
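Something along these lines (just a sketch: the method name and the exact attributes it drops are placeholders, not an existing API):

```python
import gc


def free_optimizer_and_scheduler(engine):
    """Release training-only state held by the deepspeed engine so the
    general RAM it occupies can be reused for inference.

    Placeholder implementation: a real version would live on the engine
    itself and also clean up any internal references it holds.
    """
    engine.optimizer = None
    engine.lr_scheduler = None
    gc.collect()  # make sure the dropped objects are actually collected
```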
That way, after training is done, the lion's share of the general RAM used by deepspeed is reclaimed. There are probably other bits to clean up manually to reclaim even more.
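One of those bits, I suspect, is the caller-side references: `deepspeed.initialize` also hands the optimizer and lr scheduler back to the caller, so those have to be dropped too, otherwise the objects stay alive no matter what the engine releases. Roughly like this (the model, config values and training loop are placeholders, and a real run needs the usual DeepSpeed launcher/environment):

```python
import gc
import torch
import deepspeed

# Placeholder model and config, just to make the flow concrete.
model = torch.nn.Linear(8, 8)
ds_config = {
    "train_batch_size": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}

engine, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

# ... training loop: engine.backward(loss), engine.step(), etc. ...

# Training is done: free the engine-side state (the proposed method above)
# and also drop the references deepspeed.initialize() handed back to us,
# otherwise those objects stay alive regardless of what the engine frees.
free_optimizer_and_scheduler(engine)
del optimizer, lr_scheduler
gc.collect()

engine.module.eval()  # and proceed with inference
```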
Let me know if this sounds good to you and I will make another PR with this feature. If need be, we can extend it in the future to support other things that benefit inference.
Thank you.
@jeffra, @RezaYazdaniAminabadi