Closed · bwnjnOEI closed this issue 2 years ago
Linearized training indeed refers to updating the parameters of the linearized model (i.e., the first-order Taylor expansion of the network around its initialization). This is in general not equivalent to training the full neural network with gradient flow, which would also update the higher-order terms in the expansion.
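To make this concrete, here is a minimal sketch in JAX of what "updating the parameters of the linearized model" means. The linearized model is the first-order Taylor expansion f_lin(θ, x) = f(θ₀, x) + ∇_θ f(θ₀, x) · (θ − θ₀), and gradient descent is run on f_lin rather than on f itself. Everything below (the toy MLP `f`, the data, the hyperparameters) is illustrative, not taken from the paper:

```python
import jax
import jax.numpy as jnp

# Toy two-layer MLP; model, data, and hyperparameters are illustrative only.
def f(params, x):
    w1, b1, w2, b2 = params
    h = jnp.tanh(x @ w1 + b1)
    return h @ w2 + b2

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params0 = (jax.random.normal(k1, (2, 64)) / jnp.sqrt(2.0),
           jnp.zeros(64),
           jax.random.normal(k2, (64, 1)) / jnp.sqrt(64.0),
           jnp.zeros(1))

# Linearized model: first-order Taylor expansion of f around params0,
#   f_lin(params, x) = f(params0, x) + J_f(params0, x) @ (params - params0),
# computed with a Jacobian-vector product instead of materializing J.
def f_lin(params, x):
    dparams = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params0)
    out0, jvp_out = jax.jvp(lambda p: f(p, x), (params0,), (dparams,))
    return out0 + jvp_out

def loss(params, x, y, model):
    return jnp.mean((model(params, x) - y) ** 2)

# "Linearized training": plain gradient descent, but on f_lin, so the
# higher-order Taylor terms of f are never updated.
x = jax.random.normal(k3, (32, 2))
y = jnp.sin(x[:, :1])
lr, params = 1e-2, params0
for _ in range(500):
    grads = jax.grad(loss)(params, x, y, f_lin)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
```

(The `neural-tangents` library provides `nt.linearize`, which performs the same expansion; the sketch above just spells it out with `jax.jvp`. Training f_lin coincides with training f only in the infinite-width/NTK limit, where the contribution of the higher-order terms vanishes.)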
Hi, nice work! You expanded on and delved into the work "Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel", but one thing puzzles me:

What is "linearized training"? It seems different from training NNs with a tiny learning rate (gradient flow). My guess is that it just updates the parameters by SGD (or with a tiny learning rate)? This question does not come directly from your paper, but I would like to clear up my confusion.