Closed echatzikyriakidis closed 1 year ago
Hi @echatzikyriakidis, managing CUDA memory is indeed challenging. You can try deleting the model itself (and all references to it); for example:
import gc
import torch

trainer = rtf_model.fit(...)

# Drop every Python reference to the underlying model so it can be
# garbage-collected; the trainer may hold additional references.
del rtf_model.model
try:
    del trainer.model
    del trainer.model_wrapped
except Exception:
    pass

gc.collect()              # reclaim the now-unreferenced objects
torch.cuda.empty_cache()  # return cached blocks to the GPU driver
Still, I am unsure if this will solve your problem. If this does not work, you may need to just save each previous model, restart the session, and rerun with a new model.
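The deletion pattern above works because CUDA tensors are only freed once no Python reference to them remains; `torch.cuda.empty_cache()` then returns PyTorch's cached blocks to the driver. A minimal, framework-free sketch of the same reference-counting idea, using `weakref` to observe when the object is actually collected (the `Model` class here is a hypothetical stand-in for a large model holding GPU buffers):

```python
import gc
import weakref

class Model:
    """Hypothetical stand-in for a model object holding large buffers."""
    def __init__(self):
        self.buffer = bytearray(1024)

model = Model()
alias = model                # a second reference, like trainer.model
probe = weakref.ref(model)   # lets us observe when the object is freed

del model                    # one reference gone; object still alive
assert probe() is not None

del alias                    # last reference gone
gc.collect()                 # also collects any reference cycles
assert probe() is None       # the object (and its buffers) is freed
```

This is why deleting only `rtf_model.model` is not enough: as long as the trainer still points at the same object, the memory stays allocated.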
Hi @avsolatorio !
Thank you for your help!
I understand, and I will surely try it in Colab.
However, what if I use a Python script locally that eventually trains the models and then terminates, releasing all the memory it occupied back to the operating system? In that scenario I don't have a runtime session like Colab that needs to be terminated or restarted. Doesn't that mean the GPU memory will be released automatically? When the script ends, any rtf_model instance and even the realtabformer module will be unloaded, so I suppose the GPU memory will also be released. Right?
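One way to get that "fresh process per model" behavior even inside a long-lived notebook or script is to run each training job in a child process, so all of its GPU memory is released when the process exits. A minimal sketch using `multiprocessing` (`train_one_model` is a hypothetical placeholder for the actual REalTabFormer fit-and-save code):

```python
import multiprocessing as mp

def train_one_model(dataset_name, out_path):
    # Hypothetical: build the rtf_model here, call .fit(df),
    # then .save(out_path). Every CUDA allocation made in this
    # function dies with the child process.
    return f"trained on {dataset_name} -> {out_path}"

def run_isolated(fn, *args):
    # 'spawn' starts a clean interpreter, so each child gets a
    # fresh CUDA context instead of inheriting the parent's.
    ctx = mp.get_context("spawn")
    with ctx.Pool(1) as pool:
        return pool.apply(fn, args)

if __name__ == "__main__":
    for name in ["customers", "orders"]:
        print(run_isolated(train_one_model, name, f"/tmp/{name}.ckpt"))
```

Only the saved checkpoint path (or other picklable results) crosses the process boundary; the model object itself never enters the parent's memory.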
Hello @echatzikyriakidis, yes, that is the expected behavior. The GPU memory should be freed up as soon as the script terminates.
@avsolatorio Thank you!
Hi @avsolatorio,
I am training multiple tabular and relational models sequentially in a single Colab notebook with a GPU runtime (I have Google Colab Pro+), and after some time I hit a CUDA out of memory error on one of the models. How can I dispose of/release a REalTabFormer model after its training to free the GPU memory it occupies?
Thanks!