rdinnager / styleganr

R implementation of StyleGAN3 (https://github.com/NVlabs/stylegan3) using the torch package

massive CUDA memory leak #1

Closed: rdinnager closed this issue 2 years ago

rdinnager commented 2 years ago

The Stylegan3 Generator can only be used twice in an R session when using CUDA, because the memory it uses is not released from the GPU (and it uses a lot of memory on each call). The CPU version does not have this issue.
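A minimal sketch of the failure pattern. The constructor name `stylegan3_generator()`, the latent dimension of 512, and the single-argument call are all hypothetical stand-ins for however styleganr actually builds and calls the Generator:

```r
library(torch)
library(styleganr)

# Hypothetical constructor: stand-in for the package's Generator setup
G <- stylegan3_generator()
G <- G$cuda()

for (i in 1:4) {
  z <- torch_randn(1, 512, device = "cuda")
  with_no_grad({
    img <- G(z)
  })
  rm(img, z)
  gc()                # R-side garbage collection
  cuda_empty_cache()  # ask the CUDA caching allocator to release unused blocks
  # GPU memory keeps growing anyway; by the third iteration it is exhausted
}
```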

This may be related to https://github.com/mlverse/torch/issues/403, because the CUDA version uses custom CUDA ops wrapped in an autograd_function(), whereas the CPU version uses only standard torch operations. This is currently pretty frustrating, because the CUDA version is much faster than the CPU version (and I spent a fair amount of time getting those CUDA extensions to work!).

pinging @dfalbel. An example autograd_function that wraps a custom CUDA operation is here: https://github.com/rdinnager/styleganr/blob/a0c108c3c5dc67a85c08e1300f0a3e9336194e23/R/upfirdn2d.R#L207-L264
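For context, the general shape of such a wrapper looks roughly like the sketch below. This is a simplified illustration of the torch autograd_function() pattern, not the actual upfirdn2d code; `cpp_custom_op_forward()` and `cpp_custom_op_backward()` are placeholders for the compiled CUDA kernels the package calls:

```r
library(torch)

# Simplified pattern only: cpp_custom_op_forward()/cpp_custom_op_backward()
# are placeholders for the package's compiled CUDA kernels.
custom_op <- autograd_function(
  forward = function(ctx, input, weight) {
    out <- cpp_custom_op_forward(input, weight)
    # Tensors saved here are held by the autograd graph until backward runs
    # (or the graph is freed); if they are never released, GPU memory leaks.
    ctx$save_for_backward(input = input, weight = weight)
    out
  },
  backward = function(ctx, grad_output) {
    saved <- ctx$saved_variables
    grads <- cpp_custom_op_backward(grad_output, saved$input, saved$weight)
    # Gradients are returned as a named list matching forward()'s arguments
    list(input = grads[[1]], weight = grads[[2]])
  }
)
```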

rdinnager commented 2 years ago

Another important point: after I run the Generator a few times and the GPU memory is almost full, running it again crashes the R session immediately rather than throwing a CUDA out-of-memory error (which is what usually happens when torch runs out of GPU memory).
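One way to watch the memory climb between calls is to query nvidia-smi from R. This reports process-wide GPU usage rather than torch's own allocator stats, and `G` is the hypothetical generator from the earlier sketch:

```r
library(torch)

# Print GPU memory currently in use, as reported by nvidia-smi
report_gpu_mem <- function(label) {
  used <- system(
    "nvidia-smi --query-gpu=memory.used --format=csv,noheader",
    intern = TRUE
  )
  cat(label, ":", used, "\n")
}

report_gpu_mem("before")
img <- G(torch_randn(1, 512, device = "cuda"))  # G as in the earlier sketch
rm(img); gc(); cuda_empty_cache()
report_gpu_mem("after 1st call")  # usage does not return to the baseline
```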

dfalbel commented 2 years ago

This is definitely related to #422! Sorry, I'll try to work on it as soon as possible!

One possibility would be to port that autograd function to C++, similar to what we do here: https://github.com/mlverse/torch/blob/master/lantern/src/Contrib/Sparsemax.cpp

That way you avoid the R wrapper that is currently buggy, and gain a few CPU cycles :)

rdinnager commented 2 years ago

Hmm, yes, this is worth considering. Another opportunity to learn more C++, I suppose!

rdinnager commented 2 years ago

Just confirmed that the fix for https://github.com/mlverse/torch/issues/403 fixed this issue. Closing.