kevinykuo opened this issue 5 years ago
You could make `noise_sd` a parameter and try to learn it (for numerical stability, you probably want to lower- and upper-bound it).
Thanks for the reply! Do you mean e.g. outputting another quantity connected to `hidden`?
That seems straightforward, but I wasn't sure how to justify it, since the decoder looked like it should only predict the target y's, and we needed to obtain the variance elsewhere. But I guess we would be taking the samples of z as input, so that randomness is accounted for.
It depends on what kind of noise model you want to assume. The most natural one would probably be the one that assumes constant noise. E.g. in the GP regression model, the typical choice for p(y|f, x) would be a Normal distribution with mean f(x) and variance \sigma^2, i.e. the latter would not depend on the input x. In this case, \sigma^2 would be a single variable (not parameterised by a network).
If we are interested in scenarios where the noise level varies with x, then we could indeed consider parameterising \sigma^2 along the lines you described.
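The two parameterisations above can be contrasted in a short NumPy sketch (Python rather than the repo's R; the linear head `w_sigma` is a hypothetical stand-in for an extra decoder output):

```python
import numpy as np

rng = np.random.default_rng(0)

# Homoscedastic: a single unconstrained scalar, shared across all inputs x.
log_sigma = 0.0                      # the one learnable noise variable
sigma_const = np.exp(log_sigma)      # exp keeps sigma > 0

# Heteroscedastic: sigma is an extra decoder output, i.e. a function of the
# hidden representation, so it can vary with x.
hidden = rng.normal(size=(5, 16))            # (n_points, hidden_dim)
w_sigma = 0.1 * rng.normal(size=(16, 1))     # hypothetical linear head
sigma_x = np.log1p(np.exp(hidden @ w_sigma)) # softplus keeps sigma > 0
```

In the constant-noise case only `log_sigma` is learned; in the input-dependent case the head producing `sigma_x` is trained jointly with the rest of the decoder.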
If we wanted to make this bit https://github.com/kasparmartens/NeuralProcesses/blob/5119ac0e6f0fec0dff6fa37977ab60139212464c/NP_architecture2.R#L68-L71 more general, what would be the correct way to do it? Would we try to estimate it from the n_draws draws of each of the y* predictions?
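For the Monte Carlo route, one hedged sketch of estimating the predictive moments from the `n_draws` samples (Python, with `mu_draws` standing in for the decoder means produced under each sampled z):

```python
import numpy as np

rng = np.random.default_rng(1)

n_draws, n_star = 50, 10
# Hypothetical decoder means: one row per sampled z, one column per target x*.
mu_draws = rng.normal(size=(n_draws, n_star))
noise_sd = 0.1  # assumed (constant) observation-noise standard deviation

pred_mean = mu_draws.mean(axis=0)
# Law of total variance: spread of the decoder means across z-draws
# (uncertainty from z) plus the observation-noise variance.
pred_var = mu_draws.var(axis=0) + noise_sd**2
```

Note that the variance across draws alone only captures the uncertainty coming from z; the observation-noise term still has to be added (or learned) separately, which is why it cannot simply be read off the `n_draws` predictions.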