justadudewhohacks closed this issue 5 years ago.
Thanks for investigating this. Is there a chance you could share your repo with us? In the meantime, I'll try to reproduce this based on your pointers.
Not sure if it helps, but here is the tiny-yolov2-seperable-conv2d branch I am currently working on, which is where I am facing the issue.
The first issue occurs when backpropagating through the tiny yolov2 implementation with separable convolutions; the training code is under /tools/train/tinyYolov2.
The second issue occurs when backpropagating through the face landmark net (/tools/train/faceLandmarks).
I will try to set up a repo with some simpler example code to reproduce the issue as soon as I have time, which should make this easier to debug.
Okay, after setting up an example repo to reproduce the issue, I figured out that the issue is not related to tf.separableConv2d, but is caused by tf.maximum.
Example repo is here.
Okay, after spending some more time on this problem, I figured out that using tf.maximum only messes with the numBytes counter, as shown in the screenshot. Apparently it doesn't cause the memory leak I am facing.
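To illustrate how a counter can go wrong without any real leak: if an op's bookkeeping reports a different byte count on disposal than it did on allocation, the tracked total drifts (even negative) while actual memory stays fine. A standalone toy sketch of that idea, with made-up names and logic that only mimics the spirit of tfjs's numBytes accounting, not its internals:

```javascript
// Toy byte counter, loosely modeled on the idea behind tf.memory().numBytes.
// ByteCounter, register, and unregister are illustrative names, not tfjs APIs.
class ByteCounter {
  constructor() {
    this.numBytes = 0;
  }
  // Called when a tensor is allocated.
  register(bytes) {
    this.numBytes += bytes;
  }
  // Called when a tensor is disposed.
  unregister(bytes) {
    this.numBytes -= bytes;
  }
}

const counter = new ByteCounter();
counter.register(4 * 100);   // allocate a 100-float tensor: +400 bytes

// If the op mistakenly reports a larger size at disposal time,
// the counter drifts negative although no memory actually leaks:
counter.unregister(4 * 150); // -600 bytes

console.log(counter.numBytes); // -200
```

A counter drifting like this would explain strange numBytes readings that aren't accompanied by growing process memory.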
I found what's causing the memory leak and opened a separate issue for that: #604
Closing since this got fixed.
TensorFlow.js version
Browser version
Describe the problem or feature request
There seems to be some memory leak in optimizer.minimize, no matter if I use adam or sgd. After some time, Chrome's memory usage grows to multiple GB. Logging tf.memory() furthermore reveals some strange decrementing of numBytes (the tracked memory of tensors in RAM, I guess?).

Just to point out: the leak isn't occurring due to net.forwardInput or tf.sum, since the following code runs without any leaks.

Edit:
Some more clarification: the net is a combination of separableConv2d's + max pooling ops, with a single 1x1 convolution at the end. The output of net.forwardInput in the example is a 1x13x13x25 tensor.
It might be that the issue is due to backpropagation through separableConv2d's.
I also ran the above-mentioned example with a different net, which consists of conv2d's + max pooling ops and produces a 1x136 output tensor. Running the exact same code yields different results for tfjs-core 0.11.9 and tfjs-core 0.12.9:
- tfjs-core 0.11.9: works fine, without any leaks
- tfjs-core 0.12.9: crashes in the first iteration, causing Chrome's memory to quickly rise above 3GB
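When chasing leaks like the ones above, the usual tfjs pattern is to wrap each training step in tf.tidy, which disposes every intermediate tensor created in the scope except the returned one. Since tfjs itself isn't runnable here, this is a standalone toy sketch of that scoping idea using plain objects; makeTensor, dispose, and tidy are hypothetical stand-ins, not the tfjs implementation:

```javascript
// Set of all currently tracked "tensors" (plain objects in this toy).
const live = new Set();

// Allocate a fake tensor and track it.
function makeTensor(name) {
  const t = { name, disposed: false };
  live.add(t);
  return t;
}

// Dispose a fake tensor and stop tracking it.
function dispose(t) {
  t.disposed = true;
  live.delete(t);
}

// tidy(fn): run fn, keep only its return value, and dispose every
// other tensor allocated inside the scope -- the core idea behind
// tf.tidy's leak prevention.
function tidy(fn) {
  const before = new Set(live);
  const result = fn();
  for (const t of [...live]) {
    if (!before.has(t) && t !== result) dispose(t);
  }
  return result;
}

const kept = tidy(() => {
  const a = makeTensor('activation'); // intermediate, cleaned up
  const b = makeTensor('gradient');   // intermediate, cleaned up
  return makeTensor('loss');          // survives the scope
});

console.log(live.size); // 1 (only 'loss' remains tracked)
console.log(kept.name); // 'loss'
```

If a leak persists even with every step inside such a scope (as reported here), the intermediates being kept alive are likely held by engine-internal bookkeeping rather than by user code.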