xuebinqin / U-2-Net

The code for our newly accepted paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection."
Apache License 2.0

A CUDA memory leak on models defined in u2net_refactor.py #336

Open chmendoza opened 2 years ago

chmendoza commented 2 years ago

Hi,

I decided to start with u2net_refactor.py because its more general definition allows for a more flexible design, which is appealing. However, I keep seeing a gradual increase in CUDA memory usage after each epoch, which is appalling. The memory leak happens after calling model.train(). I don't have time to debug this further right now, but if I had to guess, the offending line is probably https://github.com/xuebinqin/U-2-Net/blob/53dc9da026650663fc8d8043f3681de76e91cfde/model/u2net_refactor.py#L106, or something in the recursive nature of the unet() function inside the forward method. Something in the model definition is accumulating gradient history epoch after epoch.
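For anyone trying to reproduce or narrow this down: a minimal sketch of the classic pattern that produces exactly this symptom, growing memory because loss tensors (and their autograd graphs) are kept alive across iterations. This is a hypothetical illustration with a stand-in model, not the actual code in u2net_refactor.py, and it may or may not be the cause here.

```python
import torch

# Stand-in network; the real U-2-Net model would go here.
model = torch.nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

running_loss = 0.0
for step in range(3):
    x = torch.randn(4, 8)
    loss = model(x).pow(2).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()

    # Leaky pattern: `running_loss += loss` would keep every step's
    # autograd graph reachable, so memory grows epoch after epoch.
    # Safe pattern: convert to a plain Python float so the graph is freed.
    running_loss += loss.item()

print(type(running_loss))  # plain float, no graph attached
```

Logging `torch.cuda.memory_allocated()` at the end of each epoch should show whether the refactored model really retains memory that the original u2net.py releases.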

I switched back to the original code in u2net.py and the CUDA memory leak disappeared.