google / neural-tangents

Fast and Easy Infinite Neural Networks in Python
https://iclr.cc/virtual_2020/poster_SklD9yrFPS.html
Apache License 2.0
2.28k stars · 226 forks

Questions about wide networks #131

Open JinraeKim opened 2 years ago

JinraeKim commented 2 years ago

Hi, team! I'm interested in this library, but I'm finding it hard to understand. I have some questions about wide networks.

  1. What is the prediction of a wide network, e.g., an NNGP? The evaluated mean of the GP? Is it deterministic? If so, how does it differ from a standard GP?
  2. How is a wide network trained? For example, an NNGP seems to be trained very similarly to a standard GP, i.e., with matrix inversion. Is that right?
  3. Are the training and inference mechanisms the same for both finite and infinite-width networks in this package?

SiuMath commented 2 years ago

Hi Jinrae, glad that you are interested in NT; we are more than happy to help. There are a couple of tutorials on GitHub that could be very useful: https://github.com/google/neural-tangents/tree/main/notebooks.

  1. For the NNGP, it is basically kernel regression / Bayesian inference. We need to pass either an infinite-width NNGP kernel, which is deterministic, or a finite-width empirical NNGP kernel, which is stochastic (like a random-feature model); see the first sketch after this list.

  2. Yes. The NNGP is trained, or more precisely "does inference", using matrix inversion; see the second sketch after this list.

  3. There are a couple of "training approaches":

    • finite-width SGD training, which is the same as for standard neural networks;
    • NTK/NNGP-related "inference", in which we use Bayesian inference / matrix inversion. Finite- and infinite-width networks are handled almost identically; the major difference is how the kernel is computed.
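
To make item 1 concrete, here is a rough sketch with arbitrary toy data and an arbitrary width of 512 (see the notebooks linked above for the exact API); it constructs both kinds of NNGP kernel:

```python
import jax.random as random
import neural_tangents as nt
from neural_tangents import stax

key = random.PRNGKey(0)
x_train = random.normal(key, (20, 10))  # toy data: 20 points, 10 features
x_test = random.normal(key, (5, 10))

# The width (512) only affects the finite-width functions below;
# kernel_fn describes the corresponding infinite-width network.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(), stax.Dense(1))

# (a) Infinite-width NNGP kernel: deterministic, no parameters involved.
k_dd = kernel_fn(x_train, x_train, 'nngp')

# (b) Finite-width empirical NNGP kernel: computed from randomly drawn
#     parameters, hence stochastic (like a random-feature model).
_, params = init_fn(key, x_train.shape)
nngp_fn = nt.empirical_nngp_fn(apply_fn)
k_dd_empirical = nngp_fn(x_train, x_train, params)
```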
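
For item 2, the "matrix inversion" is just the usual GP posterior mean. A hand-rolled sketch, reusing the toy data and kernel_fn above, with a small arbitrary ridge term for numerical stability (nt.predict has utilities that do this for you):

```python
import jax.numpy as jnp

y_train = random.normal(key, (20, 1))        # hypothetical labels

k_dd = kernel_fn(x_train, x_train, 'nngp')   # train-train kernel
k_td = kernel_fn(x_test, x_train, 'nngp')    # test-train kernel
diag_reg = 1e-4                              # small ridge / noise term

# Posterior mean at the test points: K_td (K_dd + reg * I)^{-1} y.
# No gradient descent -- just a linear solve ("matrix inversion").
mean = k_td @ jnp.linalg.solve(k_dd + diag_reg * jnp.eye(k_dd.shape[0]), y_train)
```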

This paper may help clarify some concepts related to finite- and infinite-width networks: https://arxiv.org/abs/1902.06720. Roughly, as the width approaches infinity, the (SGD) training dynamics of a finite-width network converge to something very similar to kernel regression / Bayesian inference.
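
In code, that convergence is what the closed-form prediction utilities expose; a sketch reusing the toy data, kernel_fn, and y_train from the snippets above (again with an arbitrary ridge term):

```python
from neural_tangents import predict

# Closed-form predictions for an ensemble of infinitely wide networks:
# 'ntk'  -> the outcome of (infinite-time) gradient-descent training,
# 'nngp' -> exact Bayesian inference with the NNGP kernel.
predict_fn = predict.gradient_descent_mse_ensemble(
    kernel_fn, x_train, y_train, diag_reg=1e-4)

y_test_ntk = predict_fn(x_test=x_test, get='ntk')
y_test_nngp = predict_fn(x_test=x_test, get='nngp')
```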

Let us know if you have any other questions.

JinraeKim commented 2 years ago

Thank you so much for your detailed answer. If you don't mind, please answer the following questions.

  • So, let's begin with the infinitely wide NNGP. As I understand it, from the fact that an infinitely wide NN gives us a GP, Bayesian inference is performed by computing the kernel recursively and then inverting a matrix to obtain the mean and covariance at the given test points, right?
  • For a finitely wide NNGP, the kernel recursion seems not to be deterministic. Is that why you pointed out the difference between deterministic and stochastic kernel calculation for infinite- and finite-width NNGPs?
  • How does a finite-width NNGP have the same training procedure? I thought it has stochastic network parameters, while an ordinary NN has deterministic network parameters. I supposed that all we need to infer is the kernel, not the network parameters. Is that right?

Sorry for my poor questions; I lack background in GPs and NTK.

EDIT: I read the tutorial notebooks. For an infinitely wide NNGP, only the network architecture matters for Bayesian inference. In this regard, I don't understand why the tutorials usually construct a finitely wide NN (e.g., width 512) even for the kernel calculation. Also, I'm not sure which is preferred: an ensemble of finite NNs with randomized parameters, or a simple NNGP.

SiuMath commented 2 years ago

Hey Jinrae,

We are indeed very excited and eager to see Neural Tangents being used beyond the GP/NTK community. If you have further questions, please let us know!

Best,
