Closed grmaier closed 3 years ago
`SingleTaskGP` in the canned version that we have in BoTorch assumes an unknown homoskedastic noise level that we infer together with the other model parameters.

Thanks very much for your detailed answer concerning my first question!
Regarding 2.: I was referring to what is said here https://botorch.org/docs/models under "Single-Task GPs". I guess what I don't understand is when to use `SingleTaskGP` and when `FixedNoiseGP`. If I know that my data is corrupted with some homoskedastic (known) noise \epsilon, I would use `FixedNoiseGP`, as I can only make corrupted observations y = f(x) + \epsilon. If I know that I can make exact observations, i.e. measure f(x) exactly, then I would use `SingleTaskGP`, but without a prior homoskedastic noise level. So I don't understand why the provided version of `SingleTaskGP` seems to be a hybrid between these two approaches.
Ah I see. So `SingleTaskGP` as it is checked in always estimates a homoskedastic noise level. You usually use that in the case where you either know that you have noise (but don't know what the noise is), or you don't know whether you have noise or not.

`FixedNoiseGP` requires noise observations. If you know the noise and don't need to infer it, you typically get a better model.

For the noiseless case, where you do know that there is no observation noise, we typically still use a very small noise level for numerical stability reasons. To achieve this you have two options (well, you have more if you want to write your own model, but let's stick with the canned models):

1. Use `SingleTaskGP`, but either choose a prior that forces the noise level to very small values, or just remove the noise as a parameter to optimize over and initialize it to a small value. This is kind of a hack.
2. Use `FixedNoiseGP`, but use a very small value for the variance (e.g. 10^-6).

Got it! Thanks for your quick and detailed response!
@Balandat I have one further question: Can you explain to me when to use the model `HeteroskedasticSingleTaskGP`? If I know the noise which is corrupting my data, I would use the model `FixedNoiseGP`.

In `HeteroskedasticSingleTaskGP` the noise is modeled by another `SingleTaskGP` model. In the documentation (https://botorch.org/api/_modules/botorch/models/gp_regression.html#HeteroskedasticSingleTaskGP) it is said that "this allows the likelihood to make out-of-sample predictions for the observation noise levels." I am not entirely sure what is meant by that. I guess that `HeteroskedasticSingleTaskGP` should be used when I know that my observations are corrupted by heteroskedastic noise, but the noise is unknown? But on the other hand, `HeteroskedasticSingleTaskGP` requires an additional argument `train_Yvar` of observed measurement noise, so, as far as I understand, the noise has in fact to be known?
`HeteroskedasticSingleTaskGP` employs another GP model to model the noise. As you observed, it relies on noise observations (`train_Yvar`) to fit this model. There are two benefits to this: the noise model can make out-of-sample predictions of the observation noise levels (this is what the docs you quoted refer to), and it does not take the observed noise levels at face value, smoothing out errors in the noise observations themselves.
If you don't need either of these, it's perfectly fine to use `FixedNoiseGP`. In fact, it's preferable, since the model has fewer parameters to estimate, which simplifies inference.
I am trying to understand how the `SingleTaskGP` model works. If I read the doc and the code correctly, the assumptions on the priors of the kernel in `SingleTaskGP` are as follows: Let f be the black-box function, which is to be optimized, and x be a point in the domain. Then f|x has normal distribution with mean 0 and variance k(x,x), where k is the Matern(5/2) kernel, i.e.

k(x, x') = \theta_0^2 \exp(-\sqrt{5} r) \left(1 + \sqrt{5} r + \frac{5}{3} r^2\right) with r = \frac{|x - x'|}{\theta_1},

and with output scale parameter \theta_0^2 with distribution \Gamma(2.0, 0.15) and length scale parameter \theta_1 with distribution \Gamma(3.0, 6.0). Moreover, we assume a homoskedastic noise level to obtain observations y = f(x) + \epsilon, with \epsilon being normally distributed with mean 0 and variance \sigma^2 with distribution \Gamma(1.1, 0.05).

This leads me to two questions: (1) Why do we assume that the above hyperparameters are gamma distributed, and what explains the choice of the parameters in the gamma distributions? (2) In the documentation it's said that `SingleTaskGP` assumes noiseless observations. I guess that means that we measure f(x) directly instead of the corrupted observation y. But what is the reason then why we assume a homoskedastic noise \epsilon in the first place? Wouldn't it make more sense to assume no noise at all in this case? Maybe I misunderstand how the homoskedastic noise comes into play.
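For what it's worth, the kernel formula quoted above is easy to check numerically. A small self-contained sketch (the parameter values are illustrative, not the modes of the gamma priors):

```python
import math

def matern52(x, x_prime, outputscale_sq=1.0, lengthscale=1.0):
    """Matern-5/2 kernel as written above:
    k(x, x') = theta_0^2 * exp(-sqrt(5) r) * (1 + sqrt(5) r + (5/3) r^2),
    with r = |x - x'| / theta_1, theta_0^2 = outputscale_sq, theta_1 = lengthscale.
    """
    r = abs(x - x_prime) / lengthscale
    s = math.sqrt(5.0) * r
    return outputscale_sq * math.exp(-s) * (1.0 + s + (5.0 / 3.0) * r * r)

# At r = 0 the kernel reduces to the output scale: k(x, x) = theta_0^2,
# which is exactly the prior variance of f(x) described above.
print(matern52(0.7, 0.7))        # 1.0
print(matern52(0.0, 1.0) < 1.0)  # True: correlation decays with distance
```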