ml-research / pau

Padé Activation Units: End-to-end Learning of Activation Functions in Deep Neural Networks

Per-iteration speed of PAU compared with other activation functions #1

Closed AranKomat closed 5 years ago

AranKomat commented 5 years ago

First of all, thank you for your interesting work. I'm going to try PAU on the Transformer.

For both training and inference, I believe the per-iteration time with PAU is larger than with other activation functions such as ReLU. But the difference is pretty much negligible in practice, right?

You said that it takes too much time without weight sharing. Couldn't the non-weight-shared version be made efficient with some implementation trick? I don't know, but I'd be very interested in its performance.

You said rational function approximation is superior to polynomial approximation. Have you tried any other function family for approximation (e.g., Fourier series)?

PatrickSchrML commented 5 years ago

Hey, we are glad that you want to try it! Let us know if you need any help installing it and getting it to run. We are excited to see how it works for you!

To your questions: the per-iteration time of our CUDA implementation is not much larger than that of other activation functions; it depends on your hardware. In our experiments on a GTX 1080 Ti and a V100 we observed around 1–5% overhead, but your mileage may vary.
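For anyone who wants to check the overhead on their own hardware, here is a minimal timing sketch in PyTorch. It is not from this repo: `PAU` in the commented-out line is a placeholder for whatever activation module the package actually exports (the exact import path may differ), and the model, batch, and iteration counts are arbitrary.

```python
# Rough per-iteration timing sketch (PyTorch). `PAU` below is a placeholder
# for the activation module this repo provides; swap in the real import.
import time
import torch
import torch.nn as nn

def make_mlp(act_factory, width=1024, depth=4):
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width), act_factory()]
    return nn.Sequential(*layers)

def ms_per_iter(model, iters=100, batch=256, width=1024):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    x = torch.randn(batch, width, device=device)
    for _ in range(10):  # warm-up so kernels/allocator are initialized before timing
        opt.zero_grad(); model(x).sum().backward(); opt.step()
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        opt.zero_grad(); model(x).sum().backward(); opt.step()
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.time() - start) / iters * 1e3

print("ReLU:", ms_per_iter(make_mlp(nn.ReLU)), "ms/iter")
# print("PAU: ", ms_per_iter(make_mlp(PAU)), "ms/iter")  # plug in the PAU module here
```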

Currently we share weights (one PAU per layer) to keep the parameter space small, but we are also very interested in a non-weight-shared version. The forward pass time should not differ much, as we already evaluate the PAU per neuron; the backward pass will be slower because there are many more parameters.
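To make the weight-sharing distinction concrete, here is a small plain-PyTorch sketch of a rational activation with either one coefficient set per layer (shared, as described above) or one per neuron. This is a reference illustration, not the repo's CUDA kernel, and it assumes the "safe" denominator form 1 + |b_1 x + ... + b_n x^n| described in the paper.

```python
# Sketch only: a rational activation P(x)/Q(x) with numerator degree m and
# denominator degree n, with coefficients either shared per layer or per neuron.
import torch
import torch.nn as nn

class RationalActivation(nn.Module):
    def __init__(self, num_features=None, m=5, n=4):
        super().__init__()
        shape = (1,) if num_features is None else (num_features,)
        # coefficients of P(x) = a_0 + a_1 x + ... + a_m x^m
        self.a = nn.Parameter(0.1 * torch.randn(*shape, m + 1))
        # coefficients of the denominator terms b_1 x + ... + b_n x^n
        self.b = nn.Parameter(0.1 * torch.randn(*shape, n))

    def forward(self, x):
        # x has shape (batch, features); build powers along a trailing dim
        num_powers = torch.stack([x ** k for k in range(self.a.shape[-1])], dim=-1)
        num = (self.a * num_powers).sum(dim=-1)
        den_powers = torch.stack([x ** k for k in range(1, self.b.shape[-1] + 1)], dim=-1)
        # "safe" form: 1 + |b_1 x + ... + b_n x^n| cannot reach zero
        den = 1 + (self.b * den_powers).sum(dim=-1).abs()
        return num / den

shared = RationalActivation()         # one coefficient set for the whole layer
per_neuron = RationalActivation(512)  # one coefficient set per neuron (more parameters)
```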

Our claim regarding polynomial approximations is based on [1]: a feedforward network whose activation functions are polynomials is not a universal approximator, whereas a non-polynomial activation such as PAU keeps the network a universal approximator. However, we did not try other function families.

[1] M. Leshno, V. Y. Lin, A. Pinkus, and S. Schocken. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6(6):861–867, 1993. (http://www2.math.technion.ac.il/~pinkus/papers/neural.pdf)
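Purely as a toy illustration of the rational-vs-polynomial question (not an experiment from the paper or this repo): the snippet below fits a degree-(5,4) rational function and a degree-9 polynomial, both with 10 coefficients, to ReLU on a grid by least squares. The degrees, grid, and optimizer settings are arbitrary choices.

```python
# Toy least-squares fit of ReLU on [-3, 3]: rational vs. polynomial with the
# same number of coefficients. Arbitrary hyperparameters; not from the paper.
import torch

x = torch.linspace(-3, 3, 2000)
target = torch.relu(x)

def fit(params, predict, steps=3000, lr=1e-2):
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((predict(x) - target) ** 2).mean()
        loss.backward()
        opt.step()
    return loss.item()

# rational P_5(x) / (1 + |Q_4(x)|): 6 + 4 = 10 coefficients
a = (0.1 * torch.randn(6)).requires_grad_()
b = (0.1 * torch.randn(4)).requires_grad_()
def rational(x):
    num = sum(a[k] * x ** k for k in range(6))
    den = 1 + abs(sum(b[k] * x ** (k + 1) for k in range(4)))
    return num / den

# plain degree-9 polynomial: 10 coefficients
c = (0.1 * torch.randn(10)).requires_grad_()
def poly(x):
    return sum(c[k] * x ** k for k in range(10))

print("rational   MSE:", fit([a, b], rational))
print("polynomial MSE:", fit([c], poly))
```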