The full kernel shape is indeed $N \times N \times O \times O$; but the WRN example computes the infinite-width kernel. In this limit, the kernel becomes constant block-diagonal, with the structure $K \otimes I_{O \times O}$, where $K$ is the $N \times N$ matrix computed in the WRN example. To avoid wasting space, only this non-trivial tile of size $N \times N$ is returned.
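For example (a minimal sketch with a toy `stax` network rather than the actual WRN, just to show the returned shape):

```python
from jax import random
from neural_tangents import stax

# Toy network standing in for the WRN; O = 10 outputs.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(), stax.Dense(10)
)

k1, k2 = random.split(random.PRNGKey(0))
x1 = random.normal(k1, (3, 8))  # N1 = 3 inputs of dimension 8
x2 = random.normal(k2, (4, 8))  # N2 = 4 inputs

# Infinite-width NTK: only the N1 x N2 block K is returned,
# since the full kernel is K ⊗ I_{O x O} in this limit.
k = kernel_fn(x1, x2, 'ntk')
print(k.shape)  # (3, 4)
```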
In the other example, the finite-width kernel is returned; this kernel is not constant block-diagonal, but a dense $N \times N \times O \times O$ matrix, so the entire matrix is returned. There are, however, parameters (`trace_axes`, `diagonal_axes`) that can change this to only the diagonal or the mean trace of this matrix along some axes; see the docs at https://neural-tangents.readthedocs.io/en/latest/_autosummary/neural_tangents.empirical_ntk_fn.html#neural_tangents.empirical_ntk_fn.
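For instance, a small sketch of those parameters (toy network and shapes assumed, not the ones from the examples):

```python
from jax import random
import neural_tangents as nt
from neural_tangents import stax

init_fn, apply_fn, _ = stax.serial(
    stax.Dense(512), stax.Relu(), stax.Dense(10)  # O = 10 outputs
)

key = random.PRNGKey(0)
_, params = init_fn(key, (-1, 8))
k1, k2 = random.split(key)
x1 = random.normal(k1, (3, 8))  # N1 = 3
x2 = random.normal(k2, (4, 8))  # N2 = 4

# Full finite-width NTK: keep all output axes.
ntk_fn = nt.empirical_ntk_fn(apply_fn, trace_axes=(), diagonal_axes=())
print(ntk_fn(x1, x2, params).shape)  # (3, 4, 10, 10)

# Default trace_axes=(-1,) mean-traces over the output axis,
# returning just the N1 x N2 matrix.
ntk_fn_traced = nt.empirical_ntk_fn(apply_fn)
print(ntk_fn_traced(x1, x2, params).shape)  # (3, 4)
```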
Sometimes, even at finite width, you may want to approximate the $N \times N \times O \times O$ matrix with $K \otimes I_{O \times O}$ to save time/memory; this is well described in https://arxiv.org/pdf/2206.12543.pdf by @mohamad-amin.
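As a quick illustration of that block-diagonal structure (plain `jnp.kron`, not the algorithm from the paper):

```python
import jax.numpy as jnp

N, O = 3, 10
K = jnp.arange(N * N, dtype=jnp.float32).reshape(N, N)  # stand-in N x N kernel

# K ⊗ I_{O x O}: an (N*O) x (N*O) matrix whose (i, j) block is K[i, j] * I.
full = jnp.kron(K, jnp.eye(O))
print(full.shape)  # (30, 30)

# The same tensor laid out as N x N x O x O.
full_4d = full.reshape(N, O, N, O).transpose(0, 2, 1, 3)
print(full_4d.shape)  # (3, 3, 10, 10)
```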
Hope this helps!
Edit: https://github.com/google/neural-tangents/discussions/161 is related
Thank you so much for your quick reply!!
Hi, I'm new to neural tangents. I don't quite understand the output dimensions in the examples.
To my understanding, the kernel computed by `kernel_fn` should be $N \times N$, and the WRN example validates my guess. However, the examples here show that the kernel dimension is $N \times N \times O \times O$, where $O$ is the number of outputs/classes (10 for MNIST and 1000 for ImageNet). I believe both of these examples are solving classification problems, but the kernels they generate have different dimensions. May I ask what the difference between these two examples is? And where can I find more docs/papers to help me understand this?
Thank you so much.