nnstreamer / nntrainer

NNtrainer is a software framework for training neural network models on devices.
Apache License 2.0

Optimized topological sorting for the graph #1126

Open kparichay opened 3 years ago

kparichay commented 3 years ago

Given a graph with multiple independent paths between two nodes, a topological sort can return multiple solutions. All of these solutions are valid by themselves; however, they can result in different peak and average memory consumption.

Consider the example graph below (each character represents a node in the graph): resnet_bottleneck_block (the image has been borrowed from https://arxiv.org/pdf/1812.01187v2.pdf).

Assume the Input requires 100 units of memory (T1 = 100).

Now, consider two topological sorts:

  1. L -> M -> N -> P
  2. L -> M -> P -> N

Both of the above sorts are valid. However, their peak memory requirements differ: Sort 1 peaks at 300 units, while Sort 2 peaks at only 225. Note: this analysis covers inference only; training can have very different memory requirements.

Note that this explains only one case in the ResNet architecture; ResNet itself contains more such cases, and many more scenarios exist in other models.

Solution

Given a model graph and the mode of execution (inference or training), we need to find the topological sort that minimizes peak memory consumption. We can start by optimizing for specific models and later generalize to arbitrary graphs.
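For small graphs, the proposed search can be sketched as an exhaustive enumeration of all topological orders, picking the one with the lowest peak. This is a minimal Python sketch, not nntrainer code: the graph, tensor sizes, and the liveness model (a tensor is held from the step its producer runs until its last consumer has run) are assumptions matching the bottleneck example in this issue. Enumeration is exponential in general, so a real implementation would need pruning or heuristics.

```python
# Hypothetical tensor sizes and graph from the bottleneck example above.
sizes = {"T1": 100, "T2": 100, "T3": 25, "T4": 100, "T5": 100, "T6": 0}
# producer of each tensor; None means it exists before execution (network input)
producer = {"T1": None, "T2": "L", "T3": "M", "T4": "N", "T5": "P", "T6": "Add"}
consumers = {"T1": ["L", "P"], "T2": ["M"], "T3": ["N"],
             "T4": ["Add"], "T5": ["Add"], "T6": []}
# predecessors of each node in the graph
preds = {"L": set(), "M": {"L"}, "N": {"M"}, "P": set(), "Add": {"N", "P"}}

def peak_memory(order):
    """Peak over all steps of the summed sizes of live tensors."""
    pos = {n: i for i, n in enumerate(order)}
    peak = 0
    for step in range(len(order)):
        live = 0
        for t, sz in sizes.items():
            born = -1 if producer[t] is None else pos[producer[t]]
            last = max([pos[c] for c in consumers[t]] + [born])
            if born <= step <= last:
                live += sz
        peak = max(peak, live)
    return peak

def topo_sorts(remaining, done=()):
    """Yield every topological ordering of the node set `remaining`."""
    if not remaining:
        yield list(done)
        return
    for n in sorted(remaining):
        if preds[n] <= set(done):  # all predecessors already scheduled
            yield from topo_sorts(remaining - {n}, done + (n,))

best = min(topo_sorts(frozenset(preds)), key=peak_memory)
print(best, peak_memory(best))  # ['L', 'M', 'P', 'N', 'Add'] 225
```

On this example the search recovers Sort 2 from the issue (peak 225 instead of 300 for Sort 1).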

Calculation Notes:

Calculating peak memory requirements for Sort 1:

| Node | Tensors to store | Memory requirement | Peak mem |
|------|------------------|--------------------|----------|
| L    | T1, T2           | 100 + 100          | 200      |
| M    | T1, T2, T3       | 100 + 100 + 25     | 225      |
| N    | T1, T3, T4       | 100 + 25 + 100     | 225      |
| P    | T1, T5, T4       | 100 + 100 + 100    | 300      |
| Add  | T4, T5, T6       | 100 + 100 + 0      | 300      |

Calculating peak memory requirements for Sort 2:

| Node | Tensors to store | Memory requirement | Peak mem |
|------|------------------|--------------------|----------|
| L    | T1, T2           | 100 + 100          | 200      |
| M    | T1, T2, T3       | 100 + 100 + 25     | 225      |
| P    | T1, T5, T3       | 100 + 100 + 25     | 225      |
| N    | T5, T3, T4       | 100 + 25 + 100     | 225      |
| Add  | T4, T5, T6       | 100 + 100 + 0      | 225      |
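The per-step accounting in the tables above can be reproduced with a short liveness simulation. This is a Python sketch under the same assumptions as the tables: the node/tensor names are from the example, and a tensor is considered live from the step its producer executes (or from the start, for the network input) until its last consumer has executed.

```python
# Tensor sizes, producers, and consumers from the example graph above.
sizes = {"T1": 100, "T2": 100, "T3": 25, "T4": 100, "T5": 100, "T6": 0}
producer = {"T1": None, "T2": "L", "T3": "M", "T4": "N", "T5": "P", "T6": "Add"}
consumers = {"T1": ["L", "P"], "T2": ["M"], "T3": ["N"],
             "T4": ["Add"], "T5": ["Add"], "T6": []}

def per_step_memory(order):
    """Memory needed while executing each node: the sum of all live tensors."""
    pos = {n: i for i, n in enumerate(order)}
    steps = []
    for step in range(len(order)):
        live = 0
        for t, sz in sizes.items():
            born = -1 if producer[t] is None else pos[producer[t]]
            last = max([pos[c] for c in consumers[t]] + [born])
            if born <= step <= last:
                live += sz
        steps.append(live)
    return steps

print(per_step_memory(["L", "M", "N", "P", "Add"]))  # [200, 225, 225, 300, 200]
print(per_step_memory(["L", "M", "P", "N", "Add"]))  # [200, 225, 225, 225, 200]
```

The maxima of the two sequences (300 and 225) match the "Peak mem" columns of the two tables.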
taos-ci commented 3 years ago

:octocat: cibot: Thank you for posting issue #1126. The person in charge will reply soon.

lhs8928 commented 3 years ago

[Report] Peak memory of resnet50

Analysis of the torchvision resnet50 model's peak memory consumption (refer: https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py).

In ResNet there are two types of bottleneck blocks: those that contain a downsample layer and those that do not. To reduce peak memory by reordering layers in the forward pass, only the bottlenecks that contain a downsample layer are of concern. In resnet50, there are 4 such bottlenecks.

By reordering, we can reduce memory consumption during the forward pass in every such bottleneck except the first. However, the first bottleneck consumes more memory than the rest, so it seems the overall peak memory will not decrease even if the layers in the forward pass are reordered.

jijoongmoon commented 3 years ago

I think even if we cannot reduce the peak memory for resnet50, this kind of optimization is always required; it definitely has merit. We can still reduce memory consumption at certain points during inference. We also have to do the same calculation for training.

kparichay commented 3 years ago

@lhs8928 Thanks for your help and insights. @lhs8928 @jijoongmoon Let's consider whether similar optimizations can be applied to training.