Closed · bwnjnOEI closed this issue 2 years ago
Linearized training indeed refers to updating the parameters of the linearized model (i.e., the first-order Taylor expansion of the network around its initialization). This is in general not equivalent to training the full neural network with gradient flow, which would also update the higher-order terms in the expansion.
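To make this concrete, here is a minimal sketch in JAX of what "updating the parameters of the linearized model" means. The linearized model is the first-order Taylor expansion f_lin(θ, x) = f(θ₀, x) + ∇_θ f(θ₀, x) · (θ − θ₀), and gradient descent is run on f_lin rather than on f itself. Everything below (the toy MLP `f`, the data, the hyperparameters) is illustrative, not taken from the paper:

```python
import jax
import jax.numpy as jnp

# Toy two-layer MLP; model, data, and hyperparameters are illustrative only.
def f(params, x):
    w1, b1, w2, b2 = params
    h = jnp.tanh(x @ w1 + b1)
    return h @ w2 + b2

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params0 = (jax.random.normal(k1, (2, 64)) / jnp.sqrt(2.0),
           jnp.zeros(64),
           jax.random.normal(k2, (64, 1)) / jnp.sqrt(64.0),
           jnp.zeros(1))

# Linearized model: first-order Taylor expansion of f around params0,
#   f_lin(params, x) = f(params0, x) + J_f(params0, x) @ (params - params0),
# computed with a Jacobian-vector product instead of materializing J.
def f_lin(params, x):
    dparams = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params0)
    out0, jvp_out = jax.jvp(lambda p: f(p, x), (params0,), (dparams,))
    return out0 + jvp_out

def loss(params, x, y, model):
    return jnp.mean((model(params, x) - y) ** 2)

# "Linearized training": plain gradient descent, but on f_lin, so the
# higher-order Taylor terms of f are never updated.
x = jax.random.normal(k3, (32, 2))
y = jnp.sin(x[:, :1])
lr, params = 1e-2, params0
for _ in range(500):
    grads = jax.grad(loss)(params, x, y, f_lin)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
```

(The `neural-tangents` library provides `nt.linearize`, which performs the same expansion; the sketch above just spells it out with `jax.jvp`. Training f_lin coincides with training f only in the infinite-width/NTK limit, where the contribution of the higher-order terms vanishes.)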
Hi, nice work! You expanded on and delved into the work "Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel", but one thing puzzles me:

What is "linearized training"? It seems different from training NNs with a tiny learning rate (gradient flow). My guess is that it just updates the parameters by SGD (or with a tiny learning rate)? This question does not come directly from your paper, but I would like to clear up my confusion.