Hi,

I decided to start with u2net_refactor because its more general definition allows for a more flexible design, which is appealing. However, I keep seeing a gradual increase in CUDA memory usage after each epoch, which is appalling. The leak appears after calling model.train(). I don't have time to debug this further right now, but if I had to guess, I would say the offending line is probably this one: https://github.com/xuebinqin/U-2-Net/blob/53dc9da026650663fc8d8043f3681de76e91cfde/model/u2net_refactor.py#L106 ... or something in the recursive nature of the unet() function inside the forward method. Something in the model definition is accumulating gradient history epoch after epoch.
I switched back to the original code in u2net.py and the CUDA memory leak disappeared.
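For anyone who wants to poke at this, here is a minimal, self-contained sketch of the kind of pattern that would produce this symptom. The LeakyNet module below is hypothetical (it is not the actual u2net_refactor.py code): it just shows that a model which stores forward outputs in a persistent attribute keeps those tensors, and their autograd state, referenced after each step, so allocated CUDA memory grows every epoch instead of plateauing.

```python
import torch
import torch.nn as nn

class LeakyNet(nn.Module):
    """Toy model mimicking the suspected bug: outputs are stashed on
    `self`, so every forward pass leaves GPU tensors referenced long
    after the optimizer step is done."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.maps = []  # persistent container: this is the leak

    def forward(self, x):
        y = self.conv(x)
        # Appending the live tensor keeps it allocated across iterations
        # and epochs. A safe version would append y.detach().cpu(), or
        # not store it at all.
        self.maps.append(y)
        return y

device = "cuda" if torch.cuda.is_available() else "cpu"
model = LeakyNet().to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

model.train()
for epoch in range(3):
    for _ in range(20):
        x = torch.randn(4, 3, 64, 64, device=device)
        loss = model(x).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    if device == "cuda":
        # Allocated memory climbs every epoch instead of plateauing.
        print(f"epoch {epoch}: {torch.cuda.memory_allocated() / 2**20:.1f} MiB")
```

If u2net_refactor.py is doing something equivalent, for example if the recursive unet() closure captures tensors that outlive the forward call, that would explain why the original u2net.py trains with flat memory while the refactored version climbs.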