silvandeleemput / memcnn

PyTorch Framework for Developing Memory Efficient Deep Invertible Networks

Memory demand not independent of depth because ctx.saved_tensors not freed? #53


cetmann commented 4 years ago

Hi! I did some benchmarking recently and found that the memory demand is not quite independent of the depth; see Table 3 on the last page of https://arxiv.org/abs/2005.05220.

My suspicion is that operations in the initial forward pass still save tensors via ctx.save_for_backward(..) that would normally be needed for backpropagation, and that these are then kept alive in ctx.saved_tensors. Can you confirm that this is what's happening, and if so, is there a way of freeing this memory as well? These saved tensors should never be needed when training with memory-efficient invertible backpropagation, because they are re-created when the activations are reconstructed in the backward pass.

Best, Christian
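To illustrate the mechanism I mean, here is a minimal toy sketch (not memcnn code): any custom autograd.Function that calls ctx.save_for_backward keeps those tensors alive until its backward runs or the graph is freed.

```python
import torch

class Square(torch.autograd.Function):
    """Toy example: ctx.save_for_backward keeps `x` alive until
    backward() runs (or until the autograd graph is freed)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # x stays referenced via the graph
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors  # retrieved here; released after backward
        return 2 * x * grad_output
```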

silvandeleemput commented 4 years ago

@cetmann Hi, thanks for letting me know. I am currently on vacation, but I'll have a look at it once I am back.

silvandeleemput commented 4 years ago

Hi @cetmann, I have had a look at the table in your paper. Memory consumption during training is a bit complicated, both to explain and to measure, so it requires some elaboration.

As you have already identified, the memory savings achieved by the MemCNN couplings do not apply to the model parameters (there are no memory savings when increasing model depth, unless you use weight sharing or something similar), but only to the activations of the feature maps during training. For the activations, only the last one has to be stored, so their memory footprint becomes independent of depth, with memory complexity O(1). As you noted, the activations still account for the majority of the memory used during training, so this should result in significant memory savings. In conclusion, the memory consumption of the couplings during training is independent of model depth only for the activations (since you need not store them) and not (necessarily) for the parameters.
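To make the distinction concrete, here is a minimal sketch following the AdditiveCoupling / InvertibleModuleWrapper usage from the memcnn README (the SubNet module and channel sizes are arbitrary placeholders):

```python
import torch.nn as nn
import memcnn

class SubNet(nn.Module):
    """Small sub-network used for the F and G functions of a coupling."""
    def __init__(self, channels):
        super().__init__()
        self.seq = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.seq(x)

# an additive coupling splits the 10 input channels into two halves of 5
coupling = memcnn.AdditiveCoupling(Fm=SubNet(5), Gm=SubNet(5))

# keep_input=False lets memcnn free the input activation after the forward
# pass; it is reconstructed from the output during the backward pass, which
# is what makes the stored-activation memory O(1) in depth
block = memcnn.InvertibleModuleWrapper(fn=coupling, keep_input=False)
```

Stacking many such blocks in an nn.Sequential makes the parameter memory grow linearly with depth, while the stored-activation memory stays roughly constant.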

Measuring the memory usage of the activations and of the model parameters can be done as follows:
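A minimal sketch of such a measurement (the names parameter_bytes, activation_bytes, model, and x are placeholders, not memcnn API; assumes a CUDA device):

```python
import torch

def parameter_bytes(model: torch.nn.Module) -> int:
    # memory held by the parameters themselves; grows linearly with depth
    return sum(p.numel() * p.element_size() for p in model.parameters())

def activation_bytes(model: torch.nn.Module, x: torch.Tensor) -> int:
    # additional memory still allocated after a forward pass, which is
    # dominated by the activations stored for the backward pass
    torch.cuda.synchronize()
    before = torch.cuda.memory_allocated()
    y = model(x)  # keep y referenced while measuring
    torch.cuda.synchronize()
    return torch.cuda.memory_allocated() - before
```

With invertible couplings and keep_input=False, activation_bytes should stay roughly flat as depth grows, while parameter_bytes keeps growing.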

Also, measuring memory consumption in PyTorch is tricky. Don't rely on nvidia-smi for the statistics: PyTorch uses a caching allocator, so nvidia-smi will often show more memory in use than is actually allocated. Instead, use torch.cuda.memory_allocated in your code to get the memory actually allocated on a GPU device. See also https://pytorch.org/docs/stable/notes/cuda.html#cuda-memory-management
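A small sketch of the caching behaviour (nothing memcnn-specific; torch.cuda.memory_reserved reports the cache the allocator holds on to):

```python
import torch

x = torch.randn(1024, 1024, device="cuda")
del x  # returned to PyTorch's caching allocator, not to the driver

print(torch.cuda.memory_allocated())  # actual tensor memory: back to ~0
print(torch.cuda.memory_reserved())   # cached memory nvidia-smi still sees
```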

I hope this clarifies your findings. If you still have questions, please let me know.