google / neural-tangents

Fast and Easy Infinite Neural Networks in Python
https://iclr.cc/virtual_2020/poster_SklD9yrFPS.html
Apache License 2.0

The analytical output of GP cannot match the result of NNGP generated by nt.predict.gp_inference #178

Open Gengfu-He opened 1 year ago

Gengfu-He commented 1 year ago

kernel_train = kernel_fn(train_xs, train_xs, 'nngp')
kernel_cov = kernel_fn(train_xs, test_xs, 'nngp')

# Explicit inverse of the regularized train-train kernel.
Kff_inv = np.linalg.inv(kernel_train + noise_scale * noise_scale * np.mean(np.trace(kernel_train)) * np.eye(len(train_xs)))

mean_predict_analytical = kernel_cov.T.dot(Kff_inv).dot(train_ys)
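
For reference, the closed-form expression I am computing above is the standard GP posterior mean, writing \sigma^2 for the effective diagonal regularizer in the code:

\mu_* = K_{f*}^\top \left( K_{ff} + \sigma^2 I \right)^{-1} y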

I have recently found that the analytical result above does not agree well with the prediction of nt.predict.gp_inference below:

predict_fn = nt.predict.gp_inference(kernel_train, train_ys, diag_reg=noise_scale * noise_scale)
k_test_test = kernel_fn(test_xs, None, 'nngp')
mean_predict_nngp, covariance = predict_fn('nngp', kernel_cov.T, k_test_test)
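
For concreteness, here is a minimal self-contained version of the comparison I am running; the small stax network, the random data, and the noise_scale value are made-up stand-ins for my actual setup:

import jax.numpy as np
from jax import random
import neural_tangents as nt
from neural_tangents import stax

# Hypothetical stand-ins for the real kernel_fn and data.
_, _, kernel_fn = stax.serial(stax.Dense(512), stax.Relu(), stax.Dense(1))
key1, key2, key3 = random.split(random.PRNGKey(0), 3)
train_xs = random.normal(key1, (10, 3))
train_ys = random.normal(key2, (10, 1))
test_xs = random.normal(key3, (5, 3))
noise_scale = 1e-1

kernel_train = kernel_fn(train_xs, train_xs, 'nngp')
kernel_cov = kernel_fn(train_xs, test_xs, 'nngp')

# Analytical posterior mean, using the same regularizer as above.
reg = noise_scale * noise_scale * np.mean(np.trace(kernel_train))
mean_analytical = kernel_cov.T.dot(
    np.linalg.inv(kernel_train + reg * np.eye(len(train_xs)))).dot(train_ys)

# Library prediction.
predict_fn = nt.predict.gp_inference(kernel_train, train_ys,
                                     diag_reg=noise_scale * noise_scale)
k_test_test = kernel_fn(test_xs, None, 'nngp')
mean_nngp, _ = predict_fn('nngp', kernel_cov.T, k_test_test)

# Print how far apart the two posterior means are.
print(np.max(np.abs(mean_analytical - mean_nngp)))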

What is the problem? I am not sure whether the trace_axes argument has some influence here.

romanngg commented 1 year ago

Mathematically I think we're doing what you wrote, but we implement it with a Cholesky factorization, so instead of

mean_predict_analytical = kernel_cov.T.dot(Kff_inv).dot(train_ys)

we do something like

import jax.numpy as np
import jax.scipy as sp

# Cholesky-factor the regularized train-train kernel instead of inverting it.
c, lower = sp.linalg.cho_factor(kernel_train + noise_scale * noise_scale * np.mean(np.trace(kernel_train)) * np.eye(len(train_xs)))
# Apply the inverse to train_ys via two triangular solves.
Kff_inv_dot_train_ys = sp.linalg.cho_solve((c, lower), train_ys)
mean_predict_analytical = kernel_cov.T.dot(Kff_inv_dot_train_ys)

This could give slightly different results from np.linalg.inv, but is faster. Could this explain the difference, or do you see a huge discrepancy?
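
FWIW, a quick way to check how much the factorization alone can matter is to compare the two solvers on a synthetic well-conditioned PSD matrix (made-up data, just a sanity check):

import jax.numpy as np
import jax.scipy as sp
from jax import random

key1, key2 = random.split(random.PRNGKey(0))
a = random.normal(key1, (10, 10))
k = a @ a.T + 10. * np.eye(10)  # random well-conditioned PSD matrix
y = random.normal(key2, (10, 1))

x_inv = np.linalg.inv(k) @ y
x_cho = sp.linalg.cho_solve(sp.linalg.cho_factor(k), y)
# Expect a difference on the order of float32 machine epsilon,
# so this alone should not produce a large discrepancy.
print(np.max(np.abs(x_inv - x_cho)))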