Thanks for your kind reply. It seems that the only difference between them is the covariance matrix. Does this mean that if I set `compute_cov=False` in the `predict_fn` returned by `gradient_descent_mse_ensemble`, then `gradient_descent_mse_ensemble` will be the same as `gradient_descent_mse`? If so, which API has better performance, i.e. a shorter computing time?
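For concreteness, here is a minimal sketch of the comparison I have in mind (this is my own illustration, not code from the library's docs; the toy data, the architecture, and the exact argument choices are assumptions on my part):

```python
import neural_tangents as nt
from neural_tangents import stax
from jax import random

# Toy data, purely for illustration.
key = random.PRNGKey(0)
x_train = random.normal(key, (8, 3))
y_train = random.normal(key, (8, 1))
x_test = random.normal(key, (4, 3))

# Infinite-width kernel of a small fully-connected network.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(), stax.Dense(1)
)

# Ensemble predictor: with compute_cov=False only the mean is returned.
predict_ensemble = nt.predict.gradient_descent_mse_ensemble(
    kernel_fn, x_train, y_train
)
mean_test = predict_ensemble(t=None, x_test=x_test, get='ntk',
                             compute_cov=False)

# Plain MSE predictor on precomputed kernel matrices.
k_train_train = kernel_fn(x_train, x_train, 'ntk')
k_test_train = kernel_fn(x_test, x_train, 'ntk')
predict_mse = nt.predict.gradient_descent_mse(k_train_train, y_train)
fx_train_t, fx_test_t = predict_mse(t=None, fx_train_0=0., fx_test_0=0.,
                                    k_test_train=k_test_train)
```

In my experiments the two means come out very close, which is what prompts the question above about which API to prefer.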
I noticed other arguments, `fx_train_0` and `fx_test_0`, in `gradient_descent_mse`, representing the output of the network at t = 0 on the training and test data, respectively. Based on my understanding, in linearized neural networks both the tangent kernel and the outputs at initialization, `fx_train_0` and `fx_test_0`, are required to get a precise approximation of the original network. However, in the infinite-width limit the tangent kernel converges to a deterministic kernel, so there is no need to provide those values at initialization. I notice that in the infinite-width limit you set the default value of both the training and test outputs to 0., as follows:
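(What I'm referring to is roughly the signature of the returned predictor; this is my paraphrase from reading the code, so the exact form and defaults may differ between versions:)

```python
# Rough paraphrase of the returned predictor's signature as I read it;
# both initial outputs default to scalar zeros (hypothetical exact form).
def predict_fn(t=None, fx_train_0=0., fx_test_0=0., k_test_train=None):
    ...
```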
These settings also appear in another work, *Disentangling Trainability and Generalization in Deep Neural Networks*. However, if I instead set them to the outputs of the network at initialization, the predictions I get are different.
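Concretely, by "the values at initialization" I mean something like the following (continuing the sketch above; obtaining these values from a finite-width forward pass via `init_fn`/`apply_fn` is my assumption of the usual way to do it):

```python
# Outputs of a randomly initialized finite-width network on train/test data
# (reusing key, init_fn, apply_fn, x_train, x_test, predict_mse,
#  and k_test_train from the sketch above).
_, params = init_fn(key, x_train.shape)
fx_train_0 = apply_fn(params, x_train)
fx_test_0 = apply_fn(params, x_test)

# Linearized-dynamics predictions starting from the actual initial outputs.
fx_train_t, fx_test_t = predict_mse(t=None, fx_train_0=fx_train_0,
                                    fx_test_0=fx_test_0,
                                    k_test_train=k_test_train)
```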
Why is this the case? Moreover, the outputs (mean) of `gradient_descent_mse_ensemble` are very close to those of `gradient_descent_mse` with `fx_train_0=0.` and `fx_test_0=0.`. Does this imply that in the infinite-width case we should set those values to 0, while for a finite-width network we should provide the values at initialization?

_Originally posted by @lionelmessi6410 in https://github.com/google/neural-tangents/issue_comments/706645520_