Closed: yxchng closed this issue 3 years ago.
Hi @yxchng, thank you for the good questions!
For Q1, ONE BL block consists of TWO kxk layers, i.e., 1x1 + kxk + 1x1 + 1x1 + kxk + 1x1. Please check the source code for more details. This design ensures that the XX and BL blocks share the same receptive field.
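For reference, a minimal sketch of the layer ordering described above (the bottleneck width `mid` and the omission of normalization/activation layers are assumptions here; the actual source code may differ):

```python
import torch.nn as nn

def bl_block(d, mid, k=3):
    """Hypothetical BL block: 1x1 + kxk + 1x1, repeated twice.

    `d` is the block's channel width and `mid` the bottleneck width;
    both names are illustrative, not taken from the repo. Stacking two
    kxk convs gives the same receptive field as the two kxk layers of
    an XX block.
    """
    p = k // 2  # same-padding so the spatial size is preserved
    return nn.Sequential(
        nn.Conv2d(d, mid, 1),                # 1x1: reduce channels
        nn.Conv2d(mid, mid, k, padding=p),   # first kxk
        nn.Conv2d(mid, d, 1),                # 1x1: restore channels
        nn.Conv2d(d, mid, 1),                # 1x1: reduce again
        nn.Conv2d(mid, mid, k, padding=p),   # second kxk
        nn.Conv2d(mid, d, 1),                # 1x1: restore
    )
```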
For Q2, "approximation loss" refers to the loss of low-rank approximation when you run SVD on convolutional kernels of XX blocks and then discard the small singular values. If XX blocks are of low-rank before SVD, the low-rank approximation loss should be zero, meaning that the network is not changed at all after low-rank approximation. Therefore when you replace a low-rank XX block with BL block, there should be no information loss. In practice, since XX block is not perfectly low-rank, replacing XX with BL (or DW) will always induce information loss, degrade the network power.
Do you mind elaborating on the conclusions you drew from the experimental results in the paragraph above?
So the BL block has a computational cost of dxdx1x1 + dxdx3x3 + dxdx1x1, while the XX block costs dxdx3x3 + dxdx3x3, so the BL block should still be faster? What am I misunderstanding here?
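For concreteness, a quick sketch of the per-position multiply counts being compared in the question above, with all layers at full width `d` (i.e., ignoring any bottleneck width reduction, to match the counts as written):

```python
def xx_flops(d, k=3):
    # Two kxk convs, d -> d channels each.
    return 2 * d * d * k * k

def bl_flops_as_stated(d, k=3):
    # The 1x1 + kxk + 1x1 cost written in the question.
    return d * d * 1 * 1 + d * d * k * k + d * d * 1 * 1

print(xx_flops(64))            # 73728
print(bl_flops_as_stated(64))  # 45056
```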