junaidmalik09 / fastonn

FastONN - Python based open-source GPU implementation for Operational Neural Networks
GNU General Public License v3.0

Order of powers of x makes depthwise separable convolution behave poorly #8

Open danyk98 opened 2 years ago

danyk98 commented 2 years ago

Since `x = cat([(x ** i) for i in range(1, self.q + 1)], dim=1)` stacks the newly created powers of x along the channel dimension, depthwise separable convolution behaves poorly: each filter passes over a mixture of channels and powers. For example, with C_in=2 and q=3, the channel axis looks like: x_1, x_2, x_1^2, x_2^2, x_1^3, x_2^3.

With groups=2 in the depthwise convolution, the first filter would pass over x_1, x_2, x_1^2, and the second filter would pass over x_2^2, x_1^3, x_2^3.
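To make the grouping concrete, here is a minimal sketch that uses string labels in place of tensor channels, so it runs without PyTorch. The helper names `stacked_labels` and `split_into_groups` are hypothetical and not part of fastonn. The sketch assumes only that a grouped convolution assigns contiguous blocks of input channels to each group, which is how the `groups` argument behaves in PyTorch. A label such as `x1^2` stands for x_1^2.

```python
def stacked_labels(in_channels, q):
    """Labels in the order produced by cat([x ** i for i in 1..q], dim=1):
    all channels at power 1, then all channels at power 2, and so on."""
    return [
        f"x{c}^{p}"
        for p in range(1, q + 1)
        for c in range(1, in_channels + 1)
    ]


def split_into_groups(labels, groups):
    """Split the channel axis into contiguous, equally sized blocks,
    mirroring how a grouped convolution partitions its input channels."""
    size = len(labels) // groups
    return [labels[g * size:(g + 1) * size] for g in range(groups)]


labels = stacked_labels(in_channels=2, q=3)
groups = split_into_groups(labels, groups=2)

print(labels)  # ['x1^1', 'x2^1', 'x1^2', 'x2^2', 'x1^3', 'x2^3']
print(groups)  # [['x1^1', 'x2^1', 'x1^2'], ['x2^2', 'x1^3', 'x2^3']]
```

Each group ends up with powers of both input channels, which is the mixing described above.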

I fixed this issue by rearranging the output of this line using fancy indexing:

    permutation = [(i * self.in_channels) % (self.in_channels * self.q) + i // self.q for i in range(self.in_channels * self.q)]
    x = cat([(x ** i) for i in range(1, self.q + 1)], dim=1)[:, permutation]

The channel axis now looks like x_1, x_1^2, x_1^3, x_2, x_2^2, x_2^3, so the filters pass over the powers of x_1 and x_2 individually.
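The index arithmetic can be checked without tensors. The following is a rough sketch that applies the same permutation formula to a list of string labels; the function name `channel_major_permutation` and the label format (`x1^2` for x_1^2) are made up for illustration and do not appear in fastonn.

```python
def channel_major_permutation(in_channels, q):
    """Same index formula as in the fix above: output position i reads from
    stacked index (i % q) * in_channels + i // q."""
    total = in_channels * q
    return [(i * in_channels) % total + i // q for i in range(total)]


in_channels, q = 2, 3

# Stacked order from cat(..., dim=1): every channel at power 1, then power 2, ...
stacked = [
    f"x{c}^{p}"
    for p in range(1, q + 1)
    for c in range(1, in_channels + 1)
]

permutation = channel_major_permutation(in_channels, q)
reordered = [stacked[j] for j in permutation]

# With groups=in_channels, each group now covers q consecutive entries.
per_group = [reordered[g * q:(g + 1) * q] for g in range(in_channels)]

print(permutation)  # [0, 2, 4, 1, 3, 5]
print(reordered)    # ['x1^1', 'x1^2', 'x1^3', 'x2^1', 'x2^2', 'x2^3']
print(per_group)    # [['x1^1', 'x1^2', 'x1^3'], ['x2^1', 'x2^2', 'x2^3']]
```

Every group now holds the powers of exactly one input channel. If the permutation is kept, it is probably worth building the index list once (for example at construction time) rather than on every forward pass, since it depends only on `in_channels` and `q`.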

https://github.com/junaidmalik09/fastonn/blob/9591a31f3960927e1e715d10df46b54b385cd94d/fastonn/SelfONN.py#L276