kasparmartens / NeuralProcesses

Neural Processes implementation for 1D regression

Having trouble replicating your results #3

Open chrisorm opened 6 years ago

chrisorm commented 6 years ago

Hi Kaspar,

I tried replicating the results from your post in PyTorch, and I'm unable to get even close to the kind of results you display on your blog. I'm sure there is an error on my end somewhere, but I have pored over the paper, your code, and your blog post, and I'm unable to see anything that could be behind it. I had a friend look over my implementation too, and they were unable to spot any substantive differences.

I have tried as best as possible to follow the architecture and setup of your experiment 1, and I see very different behavior. The code is very simple; it is, after all, only a handful of simple NNs (see the sketch below):

https://github.com/chrisorm/Machine-Learning/blob/ngp/Neural%20GP.ipynb
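In outline, the pieces are something like this (a sketch with illustrative layer sizes and activations, not the exact code in the notebook):

```python
import torch
import torch.nn as nn

# Sketch of the three small networks in a latent-variable Neural Process.
# Layer sizes and activations are illustrative, not the exact ones I used.

class Encoder(nn.Module):
    """Maps each context pair (x_i, y_i) to a representation r_i."""
    def __init__(self, r_dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, 32), nn.Sigmoid(), nn.Linear(32, r_dim))

    def forward(self, x, y):
        # x, y: (n, 1) -> r: (n, r_dim)
        return self.net(torch.cat([x, y], dim=-1))


class ZEncoder(nn.Module):
    """Maps the aggregated representation r to the parameters of q(z)."""
    def __init__(self, r_dim=8, z_dim=2):
        super().__init__()
        self.mu = nn.Linear(r_dim, z_dim)
        self.log_sigma = nn.Linear(r_dim, z_dim)

    def forward(self, r):
        return self.mu(r), torch.exp(self.log_sigma(r))


class Decoder(nn.Module):
    """Maps (x*, z) to a predictive mean for y*."""
    def __init__(self, z_dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1 + z_dim, 32), nn.Sigmoid(), nn.Linear(32, 1))

    def forward(self, x_target, z):
        # x_target: (m, 1), z: (z_dim,) -> y: (m, 1)
        z_rep = z.unsqueeze(0).expand(x_target.shape[0], -1)
        return self.net(torch.cat([x_target, z_rep], dim=-1))


# Aggregation is just a mean over the per-point representations:
# r = Encoder()(x_context, y_context).mean(dim=0)
```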

Some things I witness that you don't seem to see (shown in the notebook):

- My q distribution concentrates (i.e. its std goes to 0).
- The samples I draw with z ~ N(0,1) look quite different from the ones in your post.

The first led me to suspect an error in my KLD term, but that does not seem to be the case - I unit tested my implementation and I think it is correct. The loss looks good and the network clearly converges.
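For reference, the kind of check I mean is comparing a closed-form KL between diagonal Gaussians against torch.distributions, along these lines (a sketch, not my exact test):

```python
import torch
from torch.distributions import Normal, kl_divergence

def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ), summed over dims."""
    var_ratio = (sigma_q / sigma_p) ** 2
    mean_term = ((mu_q - mu_p) / sigma_p) ** 2
    return 0.5 * (var_ratio + mean_term - 1.0 - torch.log(var_ratio)).sum()

# Compare against the torch.distributions reference implementation
mu_q, sigma_q = torch.randn(4), torch.rand(4) + 0.1
mu_p, sigma_p = torch.randn(4), torch.rand(4) + 0.1
reference = kl_divergence(Normal(mu_q, sigma_q), Normal(mu_p, sigma_p)).sum()
assert torch.allclose(kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p), reference)
```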

The second is a bit stranger - do you perhaps use some particular initialization of the weights to draw these samples, over and above setting z ~ N(0,1)?
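For clarity, this is how I draw them (a sketch, reusing the Decoder class from the sketch above):

```python
import torch

# With freshly initialized weights, what these curves look like depends
# entirely on the weight initialization -- which is what I am asking about.
decoder = Decoder(z_dim=2)                          # Decoder as sketched above
x_grid = torch.linspace(-4, 4, 100).unsqueeze(-1)   # (100, 1)

with torch.no_grad():
    curves = [decoder(x_grid, torch.randn(2)) for _ in range(10)]  # z ~ N(0, 1)
```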

Would you happen to have any insights as to what may be behind this difference?

Thanks for taking the time to do your post, it has some really great insights into the method!

Chris

kasparmartens commented 6 years ago

Hi Chris,

Sorry for my slow response.

You say that you see very different behaviour from what I saw, but I don't think you can conclude much from the example where we are learning a single fixed function.

I agree that initialisation can sometimes have an effect, and that in this particular case, out-of-sample uncertainty can disappear when training for a large number of iterations (after all, we are learning a single function on a grid, and I think it is non-obvious what sort of behaviour to expect from the model outside this set of points).
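One way to see this concretely is to decode a number of z draws on a wider grid than the one you trained on and look at the pointwise spread. A sketch, assuming the network and data objects from the sketches above (the grid limits are arbitrary):

```python
import torch

# Decode many draws from q(z) over a wider x range than the training grid;
# a near-zero spread outside the training points means the out-of-sample
# uncertainty has effectively disappeared.
x_wide = torch.linspace(-8, 8, 200).unsqueeze(-1)   # training grid was narrower
with torch.no_grad():
    r = encoder(x_context, y_context).mean(dim=0)   # networks as sketched above
    mu, sigma = z_encoder(r)
    draws = torch.stack([decoder(x_wide, mu + sigma * torch.randn_like(sigma))
                         for _ in range(50)])
spread = draws.std(dim=0)   # pointwise std across the sampled functions
```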

The q collapse you have experienced seems plausible given the model formulation, but I didn't experience it myself in my experiments.

Kaspar

kasparmartens commented 6 years ago

Sorry, I accidentally clicked "close"