Closed echatzikyriakidis closed 1 year ago
Hi @echatzikyriakidis, managing CUDA memory is indeed challenging. You can try deleting the model itself (and all references to it); for example:
import gc
import torch

trainer = rtf_model.fit(...)

# Drop every Python reference to the underlying model so it can be
# garbage-collected; the trainer may hold additional references.
del rtf_model.model
try:
    del trainer.model
    del trainer.model_wrapped
except Exception:
    pass

gc.collect()              # reclaim the now-unreferenced objects
torch.cuda.empty_cache()  # return cached blocks to the GPU driver
Still, I am unsure if this will solve your problem. If this does not work, you may need to just save each previous model, restart the session, and rerun with a new model.
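The deletion pattern above works because CUDA tensors are only freed once no Python reference to them remains; `torch.cuda.empty_cache()` then returns PyTorch's cached blocks to the driver. A minimal, framework-free sketch of the same reference-counting idea, using `weakref` to observe when the object is actually collected (the `Model` class here is a hypothetical stand-in for a large model holding GPU buffers):

```python
import gc
import weakref

class Model:
    """Hypothetical stand-in for a model object holding large buffers."""
    def __init__(self):
        self.buffer = bytearray(1024)

model = Model()
alias = model                # a second reference, like trainer.model
probe = weakref.ref(model)   # lets us observe when the object is freed

del model                    # one reference gone; object still alive
assert probe() is not None

del alias                    # last reference gone
gc.collect()                 # also collects any reference cycles
assert probe() is None       # the object (and its buffers) is freed
```

This is why deleting only `rtf_model.model` is not enough: as long as the trainer still points at the same object, the memory stays allocated.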
Hi @avsolatorio !
Thank you for your help!
I understand, and I will surely try it in Colab.
However, what if I use a Python script locally that eventually trains the models and then terminates, releasing all the memory it occupied back to the operating system? In that scenario I don't have a runtime session like Colab that needs to be terminated or restarted. Doesn't that mean the GPU memory will be released automatically? When the script ends, any rtf_model instance and even the realtabformer module will be unloaded, so I suppose the GPU memory will also be released. Right?
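One way to get that "fresh process per model" behavior even inside a long-lived notebook or script is to run each training job in a child process, so all of its GPU memory is released when the process exits. A minimal sketch using `multiprocessing` (`train_one_model` is a hypothetical placeholder for the actual REalTabFormer fit-and-save code):

```python
import multiprocessing as mp

def train_one_model(dataset_name, out_path):
    # Hypothetical: build the rtf_model here, call .fit(df),
    # then .save(out_path). Every CUDA allocation made in this
    # function dies with the child process.
    return f"trained on {dataset_name} -> {out_path}"

def run_isolated(fn, *args):
    # 'spawn' starts a clean interpreter, so each child gets a
    # fresh CUDA context instead of inheriting the parent's.
    ctx = mp.get_context("spawn")
    with ctx.Pool(1) as pool:
        return pool.apply(fn, args)

if __name__ == "__main__":
    for name in ["customers", "orders"]:
        print(run_isolated(train_one_model, name, f"/tmp/{name}.ckpt"))
```

Only the saved checkpoint path (or other picklable results) crosses the process boundary; the model object itself never enters the parent's memory.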
Hello @echatzikyriakidis, yes, that is the expected behavior. The GPU memory should be freed up as soon as the script terminates.
@avsolatorio Thank you!
Hi @avsolatorio,
I am training multiple tabular and relational models sequentially in a single Colab notebook with a GPU runtime (I have Google Colab Pro+), and after some time I hit a CUDA out of memory error on one of the models. How can I dispose of/release a REalTabFormer model after its training to free the GPU memory it occupies?
Thanks!