kasparmartens / NeuralProcesses

Neural Processes implementation for 1D regression

Relaxing constant sigma assumption #7

Open kevinykuo opened 5 years ago

kevinykuo commented 5 years ago

If we wanted to make this bit https://github.com/kasparmartens/NeuralProcesses/blob/5119ac0e6f0fec0dff6fa37977ab60139212464c/NP_architecture2.R#L68-L71 more general, what would be the correct way to do it? Would we try to estimate it from the `n_draws` draws of each of the y* predictions?

kasparmartens commented 5 years ago

You could make noise_sd a parameter and try to learn it (for numerical stability, you probably want to lower- and upper-bound it).
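For example (just a sketch, not what's currently in the repo), with the `tensorflow` R package you could learn an unconstrained variable and squash it into an interval; the bounds below are placeholders:

```r
library(tensorflow)

# Hypothetical bounds; pick whatever range is sensible for the data scale
sd_min <- 0.01
sd_max <- 1.0

# Unconstrained parameter, mapped into [sd_min, sd_max] via a sigmoid
noise_sd_raw <- tf$Variable(0.0, dtype = tf$float32, name = "noise_sd_raw")
noise_sd <- sd_min + (sd_max - sd_min) * tf$sigmoid(noise_sd_raw)
```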

kevinykuo commented 5 years ago

Thanks for the reply! Do you mean e.g. outputting another quantity connected to `hidden`?

https://github.com/kasparmartens/NeuralProcesses/blob/5119ac0e6f0fec0dff6fa37977ab60139212464c/NP_architecture2.R#L59-L66

That seems straightforward, but I wasn't sure how to justify it, since the decoder looked like it should only predict the target y's and the variance would need to come from elsewhere. But I guess since the samples of z are part of the input, that source of randomness is already accounted for.
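Concretely, I was picturing something like this (a rough sketch, not the repo's actual code; `hidden` stands for the last shared decoder layer, and the layer names and the lower bound are made up):

```r
# Mean head for the predicted targets
mu_star <- tf$layers$dense(hidden, units = 1L, name = "hidden_to_mu")

# Extra head for a per-point noise sd; softplus keeps it positive,
# and the added constant bounds it away from zero
sd_star <- 0.01 + tf$nn$softplus(tf$layers$dense(hidden, units = 1L, name = "hidden_to_sd"))
```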

kasparmartens commented 5 years ago

It depends on what kind of noise model you want to assume. The most natural one would probably be constant noise. E.g. in GP regression, a typical choice for p(y|f, x) would be a Normal distribution with mean f(x) and variance \sigma^2, i.e. the latter does not depend on the input x. In this case, \sigma^2 would be a single variable (not parameterised by a network).
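In that case the likelihood term would use the same scalar sd for every point, e.g. (sketch only, with placeholder names for the decoder mean and the observed targets, and `noise_sd` the learned scalar from above):

```r
# Gaussian log-likelihood with a single shared noise sd
loglik <- tf$reduce_sum(
  tf$distributions$Normal(loc = y_star_mu, scale = noise_sd)$log_prob(y_star)
)
```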

If we are interested in scenarios where the noise level varies with x, then we could indeed consider parameterising \sigma^2 along the lines you described.