google / neural-tangents

Fast and Easy Infinite Neural Networks in Python
https://iclr.cc/virtual_2020/poster_SklD9yrFPS.html
Apache License 2.0

Using stax.Cos(a=1.0, b=1.0, c=0.0) to get a kernel from a conv layer gives an error #192

Closed: bayesfourrier closed this issue 12 months ago

bayesfourrier commented 1 year ago

Hi, I am trying to generate a conv kernel using the infinite network example given in the documentation: https://neural-tangents.readthedocs.io/en/latest/stax.html

I am trying to add the activation stax.Cos(a=1.0, b=1.0, c=0.0), but it gives an error:

ValueError: The input to the activation function must be Gaussian, i.e. a random affine transform is required before the activation function.

I do not understand how to add a random affine transform. What should I do? Thank you.

romanngg commented 1 year ago

Do you have a code snippet? Usually this error arises when you stack two nonlinearities one after another (e.g. Cos after ReLU), in which case the exact analytic infinite-width limit is not known. But when the inputs to a nonlinearity are i.i.d. Gaussian (i.e. after Conv, Dense, ConvLocal, ConvTranspose, etc. layers with i.i.d. random weights), the limiting covariance matrix of the nonlinearity's outputs can be computed.
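
For concreteness, here is a minimal sketch of both situations (layer widths, filter shapes, and input sizes are arbitrary placeholders):

```python
import jax.numpy as jnp
from neural_tangents import stax

x = jnp.ones((4, 8, 8, 3))  # (batch, height, width, channels), placeholder data

# Triggers the ValueError: Cos consumes the output of Relu directly,
# so its inputs are not Gaussian and no closed-form kernel is known.
_, _, bad_kernel_fn = stax.serial(
    stax.Conv(128, (3, 3), padding='SAME'),
    stax.Relu(),
    stax.Cos(a=1.0, b=1.0, c=0.0),  # nonlinearity stacked on a nonlinearity
)

# Works: each nonlinearity is preceded by an affine layer (Conv here) with
# i.i.d. Gaussian weights, so its inputs are Gaussian in the infinite-width limit.
_, _, good_kernel_fn = stax.serial(
    stax.Conv(128, (3, 3), padding='SAME'),
    stax.Relu(),
    stax.Conv(128, (3, 3), padding='SAME'),  # the "random affine transform" before Cos
    stax.Cos(a=1.0, b=1.0, c=0.0),
    stax.Flatten(),
)

print(good_kernel_fn(x, None, 'nngp').shape)  # (4, 4)
```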

romanngg commented 1 year ago

If there's no flattening layer, the covariance of activations, which have shapes (n1, h, w, c) and (n2, h, w, c), will tend to a dense 6D matrix of shape (n1, n2, h, h, w, w) as c is taken to infinity. This is why the covariance is 6D. When a flattening layer is used, the network output covariance has shape (n1, n2) (2D), and to compute it, only the diagonal entries of intermediary layer covariances are needed, hence it only computes a 4D (n1, n2, h, w) matrix in intermediary layers (and not full 6D covariances).
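
As a small sketch of the resulting kernel shapes (all sizes are placeholders):

```python
import jax.numpy as jnp
from neural_tangents import stax

x1 = jnp.ones((3, 8, 8, 3))  # (n1, h, w, c)
x2 = jnp.ones((5, 8, 8, 3))  # (n2, h, w, c)

# No flattening layer: the full 6D spatial covariance is returned.
_, _, kernel_fn = stax.serial(
    stax.Conv(128, (3, 3), padding='SAME'),
    stax.Relu(),
)
print(kernel_fn(x1, x2, 'nngp').shape)  # (3, 5, 8, 8, 8, 8) = (n1, n2, h, h, w, w)

# Terminating with Flatten: only the 2D (n1, n2) output covariance.
_, _, kernel_fn_flat = stax.serial(
    stax.Conv(128, (3, 3), padding='SAME'),
    stax.Relu(),
    stax.Flatten(),
)
print(kernel_fn_flat(x1, x2, 'nngp').shape)  # (3, 5)
```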

> How can I reduce it to 3 or 4 dimensions?

You can extract the diagonal(s) from the 6D matrix, or have your network terminate with a flattening layer. It depends on why exactly you want the shapes to be 3 or 4D.
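
For instance, a sketch of extracting the spatial diagonals (reusing the placeholder network from above):

```python
import jax.numpy as jnp
from neural_tangents import stax

x1 = jnp.ones((3, 8, 8, 3))
x2 = jnp.ones((5, 8, 8, 3))
_, _, kernel_fn = stax.serial(stax.Conv(128, (3, 3), padding='SAME'), stax.Relu())

k6 = kernel_fn(x1, x2, 'nngp')       # (n1, n2, h, h, w, w) = (3, 5, 8, 8, 8, 8)

# Take the diagonal along the two h axes and the two w axes to get the
# 4D (n1, n2, h, w) matrix that intermediary layers track when Flatten is used.
k4 = jnp.einsum('nmhhww->nmhw', k6)
print(k4.shape)  # (3, 5, 8, 8)
```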