The full kernel shape is indeed $N \times N \times O \times O$; but the WRN example computes the infinite-width kernel. In this limit, the kernel becomes constant block-diagonal, with the structure $K \otimes I_{O \times O}$, where $K$ is the $N \times N$ matrix computed in the WRN example. To avoid wasting space, only this non-trivial tile of size $N \times N$ is returned.
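For example (a minimal sketch with a toy `stax` network rather than the actual WRN, just to show the returned shape):

```python
from jax import random
from neural_tangents import stax

# Toy network standing in for the WRN; O = 10 outputs.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(), stax.Dense(10)
)

k1, k2 = random.split(random.PRNGKey(0))
x1 = random.normal(k1, (3, 8))  # N1 = 3 inputs of dimension 8
x2 = random.normal(k2, (4, 8))  # N2 = 4 inputs

# Infinite-width NTK: only the N1 x N2 block K is returned,
# since the full kernel is K ⊗ I_{O x O} in this limit.
k = kernel_fn(x1, x2, 'ntk')
print(k.shape)  # (3, 4)
```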
In the other example, the finite-width kernel is returned; this kernel is not constant block-diagonal, but a dense $N \times N \times O \times O$ matrix, so the entire matrix is returned. There are, however, parameters (`trace_axes`, `diagonal_axes`) that can change this to only the diagonal or the mean trace of this matrix along some axes; see the docs at https://neural-tangents.readthedocs.io/en/latest/_autosummary/neural_tangents.empirical_ntk_fn.html#neural_tangents.empirical_ntk_fn.
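For instance, a small sketch of those parameters (toy network and shapes assumed, not the ones from the examples):

```python
from jax import random
import neural_tangents as nt
from neural_tangents import stax

init_fn, apply_fn, _ = stax.serial(
    stax.Dense(512), stax.Relu(), stax.Dense(10)  # O = 10 outputs
)

key = random.PRNGKey(0)
_, params = init_fn(key, (-1, 8))
k1, k2 = random.split(key)
x1 = random.normal(k1, (3, 8))  # N1 = 3
x2 = random.normal(k2, (4, 8))  # N2 = 4

# Full finite-width NTK: keep all output axes.
ntk_fn = nt.empirical_ntk_fn(apply_fn, trace_axes=(), diagonal_axes=())
print(ntk_fn(x1, x2, params).shape)  # (3, 4, 10, 10)

# Default trace_axes=(-1,) mean-traces over the output axis,
# returning just the N1 x N2 matrix.
ntk_fn_traced = nt.empirical_ntk_fn(apply_fn)
print(ntk_fn_traced(x1, x2, params).shape)  # (3, 4)
```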
Sometimes, even at finite width, you may want to approximate the $N \times N \times O \times O$ matrix with $K \otimes I_{O \times O}$ to save time/memory; this is well described in https://arxiv.org/pdf/2206.12543.pdf by @mohamad-amin.
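As a quick illustration of that block-diagonal structure (plain `jnp.kron`, not the algorithm from the paper):

```python
import jax.numpy as jnp

N, O = 3, 10
K = jnp.arange(N * N, dtype=jnp.float32).reshape(N, N)  # stand-in N x N kernel

# K ⊗ I_{O x O}: an (N*O) x (N*O) matrix whose (i, j) block is K[i, j] * I.
full = jnp.kron(K, jnp.eye(O))
print(full.shape)  # (30, 30)

# The same tensor laid out as N x N x O x O.
full_4d = full.reshape(N, O, N, O).transpose(0, 2, 1, 3)
print(full_4d.shape)  # (3, 3, 10, 10)
```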
Hope this helps!
Edit: https://github.com/google/neural-tangents/discussions/161 is related
Thank you so much for your quick reply!!
Hi, I'm new to neural tangents. I don't quite understand the output dimensions in the examples.
To my understanding, the kernel computed by `kernel_fn` should be $N \times N$, and the WRN example validates my guess. However, the examples here show that the kernel dimension is $N \times N \times O \times O$, where $O$ is the number of outputs/classes (10 for MNIST and 1000 for ImageNet). I believe both of these examples are solving classification problems, but the kernels they generate have different dimensions. May I ask what the difference between these two examples is? And where can I find more docs/papers to help me understand this?
Thank you so much.