This line of work starts from Local Binary Convolutional Neural Networks (LBCNN) by the same authors, which propose a non-learnable spatial conv layer as an alternative to the standard conv layer.
This paper goes further and asks whether spatial conv layers are needed at all. It turns out that simply adding fixed random noise followed by learnable channel pooling (PNN) works.
Motivation: one can perhaps remove conv layers entirely for image classification, based on observations about what still makes CNNs work:
Very small receptive fields (3x3 conv filter)
Sparse and/or binary conv weights.
How PNN works: given an input, the perturbation layer first perturbs the input additively with a set of random but fixed noise masks. Each perturbed input is fed through a non-linear activation function. A weighted combination (channel pooling) of these non-linear activations is then learned to produce the final feature map for classification.
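The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: all names are mine, and for brevity it uses one noise mask per input channel rather than the paper's multiple masks per channel.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturbation_layer(x, noise, pool_w):
    """Toy PNN perturbation layer (hypothetical sketch).

    x:      input feature map, shape (C_in, H, W)
    noise:  random but FIXED noise masks, same shape as x (never trained)
    pool_w: learnable channel-pooling weights, shape (C_out, C_in),
            i.e. a 1x1 convolution over channels
    """
    z = np.maximum(x + noise, 0.0)                 # additive perturbation, then ReLU
    return np.einsum('oc,chw->ohw', pool_w, z)     # learned weighted combination

x = rng.standard_normal((4, 8, 8))
noise = rng.standard_normal((4, 8, 8))             # drawn once, then frozen
pool_w = rng.standard_normal((16, 4))              # the only trainable part
y = perturbation_layer(x, noise, pool_w)
print(y.shape)  # (16, 8, 8)
```

Note that the only trainable parameters are the 1x1 pooling weights; the spatial "filtering" is replaced by the frozen additive noise.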
Theoretical analysis on relating PNN and CNN:
A macro view: the paper shows that a PNN layer can be a good approximation of any CNN layer.
A micro view: the paper shows that the convolution operation behaves like additive noise under mild assumptions.
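In symbols (my notation, a hedged restatement rather than the paper's exact formulation), the macro view says that a PNN response built from $m$ fixed noise masks $n_i$ and learned weights $v_i$ can approximate a conv response with filters $w_c$:

```latex
y_{\text{PNN}} = \sum_{i=1}^{m} v_i \,\sigma(x + n_i)
\;\approx\;
y_{\text{CNN}} = \sigma\!\Big(\sum_{c} w_c * x_c\Big),
```

where $\sigma$ is the non-linearity and $*$ denotes spatial convolution. The micro view correspondingly treats the deviation of $w_c * x_c$ from $x_c$ itself as an additive noise term.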
On-par performance with standard CNNs on ImageNet, CIFAR-10, MNIST, and Pascal VOC.
My comment: this line of work may help improve robustness to adversarial examples and prevent overfitting, since the feature extractor is non-learnable.
According to here, LBCNN shows that random feature extraction via random convolution, together with learnable channel pooling, lets deep neural networks learn effective image features; the additive random noise in PNN can be viewed as the simplest form of such random feature extraction.