msc-acse / acse-9-independent-research-project-Garethlomax

acse-9-independent-research-project-Garethlomax created by GitHub Classroom

Issue with large tensors being stored in memory - unsure of origins #24

Open Garethlomax opened 5 years ago

Garethlomax commented 5 years ago

Running a snippet to inspect the garbage collector shows a number of large (4096, 4096) tensors with no good explanation of their origin. These may be temporaries that have not been deleted and are too costly to keep around during training, or they may be due to a memory leak.
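A snippet along these lines (a minimal sketch using gc to list live tensors, in the spirit of the forum thread linked below):

```python
import gc
import torch

# Walk the garbage collector's tracked objects and report any live tensors.
# Large unexplained entries (e.g. (4096, 4096)) point at graphs or
# temporaries that something is still holding a reference to.
for obj in gc.get_objects():
    try:
        if torch.is_tensor(obj) or (hasattr(obj, 'data') and torch.is_tensor(obj.data)):
            print(type(obj), obj.size())
    except Exception:
        pass
```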

Useful: https://discuss.pytorch.org/t/how-to-debug-causes-of-gpu-memory-leaks/6741/12

From the above, Update 2: "Finally I solved the memory problem! I realized that in each iteration I put the input data in a new tensor, and pytorch generates a new computation graph. That causes the used RAM to grow forever. Then I use a placeholder tensor and copy the data to this tensor, and the RAM always stays at a low level :smile:"
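A minimal sketch of that placeholder pattern (the model, sizes, and names here are illustrative, not the project's):

```python
import torch
from torch import nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = nn.Linear(64, 1).to(device)        # stand-in for the real model
criterion = nn.MSELoss()
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)

# Preallocate one placeholder tensor and copy each batch into it in-place,
# instead of creating a fresh input tensor every iteration.
input_buffer = torch.empty(32, 64, device=device)

for _ in range(100):
    data = torch.randn(32, 64)             # stand-in for a loader batch
    target = torch.randn(32, 1, device=device)
    input_buffer.copy_(data)               # reuse the same storage
    loss = criterion(model(input_buffer), target)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```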

Garethlomax commented 5 years ago

Relevant: https://github.com/pytorch/pytorch/issues/2198

Garethlomax commented 5 years ago

The issue appears to be a result of PyTorch's computational graph structure for backpropagation. The hidden state tensors need to be detached so the graph from previous iterations is not retained in memory.
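A minimal sketch of the detach pattern (the model and names are illustrative):

```python
import torch
from torch import nn

lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
optimiser = torch.optim.SGD(lstm.parameters(), lr=0.01)
hidden = None

for _ in range(100):
    x = torch.randn(8, 5, 10)   # stand-in batch
    out, hidden = lstm(x, hidden)
    # Detach so the next backward pass stops here, instead of retaining
    # the graph (and memory) of every previous batch.
    hidden = tuple(h.detach() for h in hidden)
    loss = out.pow(2).mean()    # stand-in loss
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```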

Garethlomax commented 5 years ago

For debugging, tools to visualize the computational graph of the LSTM:

https://github.com/szagoruyko/functional-zoo/blob/master/visualize.py

https://github.com/szagoruyko/pytorchviz
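Usage is roughly as follows (a sketch assuming torchviz and graphviz are installed):

```python
import torch
from torch import nn
from torchviz import make_dot

lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
x = torch.randn(1, 5, 10)
out, (h, c) = lstm(x)

# Render the autograd graph of the output; unexpected extra branches here
# often correspond to hidden state carried over without detaching.
dot = make_dot(out, params=dict(lstm.named_parameters()))
dot.render('lstm_graph', format='pdf')
```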

Garethlomax commented 5 years ago

https://discuss.pytorch.org/t/solved-why-we-need-to-detach-variable-which-contains-hidden-representation/1426

Garethlomax commented 5 years ago

http://www.wildml.com/2015/10/recurrent-neural-networks-tutorial-part-3-backpropagation-through-time-and-vanishing-gradients/

Garethlomax commented 5 years ago

Useful on cuDNN and memory debugging: https://blog.paperspace.com/pytorch-memory-multi-gpu-debugging/
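For pinning down where allocations happen, PyTorch's built-in CUDA memory counters can be wrapped in a small helper (a sketch; `log_gpu_memory` is an illustrative name):

```python
import torch

def log_gpu_memory(tag):
    # Current and peak memory held by tensors on the GPU, in MB.
    print(f"{tag}: allocated={torch.cuda.memory_allocated() / 1e6:.1f} MB, "
          f"peak={torch.cuda.max_memory_allocated() / 1e6:.1f} MB")

# Call around suspect sections of the training loop, e.g.:
# log_gpu_memory('before forward')
# out = model(batch)
# log_gpu_memory('after forward')
```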

Garethlomax commented 5 years ago

Progress made with the cuDNN memory issue - will still explore detach approaches at a later date. Will also look into garbage collection inefficiencies.

Relevant: https://discuss.pytorch.org/t/the-pack-sequence-recurrent-network-unpack-sequence-pattern-in-a-lstm-training-with-nn-dataparallel/23260
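The pack → run → unpack pattern from that thread, as a minimal sketch (shapes and names illustrative):

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)

x = torch.randn(4, 7, 10)             # padded batch: (batch, max_len, features)
lengths = torch.tensor([7, 5, 3, 2])  # true sequence lengths, descending

# pack -> recurrent network -> unpack: the LSTM skips the padded steps,
# avoiding wasted compute and spurious graph nodes for the padding.
packed = pack_padded_sequence(x, lengths, batch_first=True)
packed_out, _ = lstm(packed)
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
```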

Garethlomax commented 5 years ago

Gradient averaging is another potential remedy: https://gchlebus.github.io/2018/06/05/gradient-averaging.html
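The idea is to accumulate gradients over several micro-batches before stepping the optimizer, giving a larger effective batch without holding all the activations at once. A minimal sketch (names illustrative):

```python
import torch
from torch import nn

model = nn.Linear(64, 1)
criterion = nn.MSELoss()
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)
accumulation_steps = 4   # effective batch = 4 x micro-batch

optimiser.zero_grad()
for step in range(100):
    x, y = torch.randn(8, 64), torch.randn(8, 1)        # micro-batch
    loss = criterion(model(x), y) / accumulation_steps  # scale so grads average
    loss.backward()                                     # grads accumulate in .grad
    if (step + 1) % accumulation_steps == 0:
        optimiser.step()
        optimiser.zero_grad()
```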

Garethlomax commented 5 years ago

Truncated backprop:

https://docs.chainer.org/en/stable/examples/rnn.html
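The Chainer docs describe the idea; in PyTorch it amounts to splitting the long sequence into chunks and detaching the hidden state between them, so gradients flow at most one chunk back. A minimal sketch:

```python
import torch
from torch import nn

lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
optimiser = torch.optim.SGD(lstm.parameters(), lr=0.01)

long_seq = torch.randn(1, 1000, 10)   # one long sequence
tbptt_len = 50                        # truncation window
hidden = None

for chunk in long_seq.split(tbptt_len, dim=1):
    out, hidden = lstm(chunk, hidden)
    # Truncate the graph: backprop reaches at most tbptt_len steps back.
    hidden = tuple(h.detach() for h in hidden)
    loss = out.pow(2).mean()          # stand-in loss
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```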