Closed: kfertakis closed this issue 3 days ago
Is there another API available for releasing the allocated GPU memory without having to kill the process?
Hi @kfertakis, yes, there is an explicit API for freeing engine resources: call model_engine.destroy() to reclaim the allocated memory.
Thank you for the reference. This does the job.
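For anyone landing here later, the destroy() call suggested above can be wrapped in a small teardown helper. This is only a sketch: release_engine is a hypothetical name, not a DeepSpeed function, and the extra gc.collect() and torch.cuda.empty_cache() steps are an assumption about what is commonly needed for the freed memory to show up in nvidia-smi, not something confirmed in this thread.

```python
import gc

try:
    import torch  # optional, so the helper also runs on machines without PyTorch
except ImportError:
    torch = None


def release_engine(model_engine):
    """Tear down a DeepSpeed engine and release cached GPU memory.

    Hypothetical helper for illustration; only model_engine.destroy()
    comes from the reply above.
    """
    # Explicit DeepSpeed API for freeing engine resources.
    model_engine.destroy()
    # Collect reference cycles that may still be holding tensors.
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        # Hand PyTorch's cached blocks back to the driver.
        torch.cuda.empty_cache()
        # Bytes still held by live tensors on the current device.
        return torch.cuda.memory_allocated()
    return None
```

The caller still has to drop its own references afterwards (for example, del model_engine and any optimizer returned alongside it), since the helper cannot remove names from the calling scope.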
Describe the bug
Initialising a trainable model with deepspeed and then deleting the engine leaves GPU memory still allocated.
To Reproduce
Running the following simple test script shows that GPU memory remains allocated even after all references to the deepspeed engine are deleted.
Expected behavior
GPU memory should be freed when a deepspeed engine gets deleted. Is there another API available for releasing the allocated GPU memory without having to kill the process? Thanks
System info:
Launcher context
deepspeed --num_gpus=1 --master_port 12346 test_deepspeed.py