mental2008 / awesome-papers

Here are my personal paper reading notes (covering cloud computing, resource management, systems, machine learning, deep learning, and other interesting stuff).
https://paper.lingyunyang.com/

arXiv '16 | Training Deep Nets with Sublinear Memory Cost #68

Closed by mental2008 2 years ago

mental2008 commented 2 years ago

URL: https://arxiv.org/abs/1604.06174

Authors: Tianqi Chen, Bing Xu, Chiyuan Zhang, Carlos Guestrin (University of Washington, Dato Inc., Massachusetts Institute of Technology)

Code: memonger ("memory monger")

mental2008 commented 2 years ago

Problem

How to reduce the memory consumption of DNN training (to enable bigger models or larger batch size)?

Solution

  1. Mainly focus on reducing the memory cost to store intermediate results (feature maps) and gradients.
  2. Design an algorithm to trade computation for memory: O(√n) memory cost for an n-layer network, at the price of one extra forward computation per mini-batch.
    • In-place operation: store the output directly in the memory of an input value that is no longer needed.
    • Memory sharing: memory used by intermediate results that are no longer needed is recycled and reused by other nodes.
    • Re-computation: drop the results of low-cost operations during the forward pass and re-compute them when the backward pass needs them (see the sketch after this list).
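
The √n figure comes from checkpointing: split an n-layer chain into k segments, keep only the k segment-boundary activations during the forward pass, and re-compute a segment's internal activations from its boundary when the backward pass reaches it. Peak memory is then roughly k + n/k, minimized at k ≈ √n, at the cost of one extra forward pass. Below is a minimal sketch of this trade-off using PyTorch's `torch.utils.checkpoint.checkpoint_sequential`; it is not the paper's MXNet-based memonger implementation, and the layer sizes and segment count are illustrative only.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Toy n-layer chain; sizes are illustrative, not from the paper.
n = 16
model = nn.Sequential(*[nn.Sequential(nn.Linear(256, 256), nn.ReLU())
                        for _ in range(n)])

x = torch.randn(32, 256, requires_grad=True)

# Split the chain into ~sqrt(n) segments. Only the segment-boundary
# activations are kept during the forward pass; activations inside a
# segment are re-computed during backward (one extra forward overall).
segments = 4  # ~ sqrt(16)
out = checkpoint_sequential(model, segments, x, use_reentrant=False)
out.sum().backward()
```

Fewer segments mean fewer stored boundaries but larger segments to re-compute (and to hold in memory during their backward pass), so the peak sits near k ≈ √n.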

Guidelines for DL frameworks

  1. Enable an option to drop the results of low-cost operations.
  2. Provide planning algorithms that produce an efficient memory plan.
  3. Enable the user to set the mirror attribute (how many times a result can be re-computed) in the computation graph for memory optimization (a toy planner sketch follows).
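
A toy illustration of guideline 3: the planner below is purely hypothetical (the `Node` class, the `cheap` flag standing in for the mirror attribute, and `memory_plan` are not from memonger or any real framework). It only shows how a user hint that an op is cheap to re-compute can feed a store-vs-recompute decision.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cheap: bool = False  # stand-in for the "mirror" hint: low-cost op, OK to re-compute

def memory_plan(nodes, segment_len):
    """Keep one checkpoint per segment of `segment_len` nodes; everything else
    (and any op flagged as cheap) is dropped and re-computed during backward."""
    decisions = {}
    for i, node in enumerate(nodes):
        if i % segment_len == 0 and not node.cheap:
            decisions[node.name] = "store"
        else:
            decisions[node.name] = "recompute"
    return decisions

graph = [
    Node("conv1"), Node("bn1", cheap=True), Node("relu1", cheap=True),
    Node("conv2"), Node("bn2", cheap=True), Node("relu2", cheap=True),
]
print(memory_plan(graph, segment_len=3))
# {'conv1': 'store', 'bn1': 'recompute', 'relu1': 'recompute',
#  'conv2': 'store', 'bn2': 'recompute', 'relu2': 'recompute'}
```

In practice such hints are attached to nodes of the computation graph, and the framework's memory planner (guideline 2) combines them with memory sharing to produce the final allocation.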