marcoct opened this issue 8 years ago
What do you expect to see for a linear kernel with effectively no noise? I'm not sure there's anything we can do about this short of just crashing.
I'm not clear on what's wrong with the SE kernel -- can you elaborate on why that graph looks bad?
These are supposed to be equivalent to drawing joint samples from the GP. This particular method of drawing the joint sample proceeds by successively sampling each new datapoint, conditioned on the others. The points are generated left-to-right sequentially.
The bottom graph is bad because the magnitude of the oscillations changes suddenly (at around zero), and I believe that after around zero the datapoints are no longer being sampled accurately. I believe the remainder of the curve is an unstable oscillation induced by loss of numerical precision.
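For concreteness, here is a minimal numpy-only sketch of that sequential scheme -- not the Venture code, and it assumes a unit-variance SE kernel with zero observation noise on a 0.1-spaced grid like the one in the plots:

```python
import numpy as np

# Minimal sketch (assumed kernel: unit-variance SE, length scale 1, zero
# observation noise): each point is drawn from its conditional distribution
# given the points already sampled, left to right.
def se_cov(xs, ys, length_scale=1.0):
    return np.exp(-0.5 * (np.subtract.outer(xs, ys) / length_scale) ** 2)

rng = np.random.default_rng(0)
xs = np.arange(-100.0, -90.0, 0.1)       # 100 evenly spaced inputs
ys = []
for i, x in enumerate(xs):
    if i == 0:
        mean, var = 0.0, 1.0             # prior marginal at the first point
    else:
        K = se_cov(xs[:i], xs[:i])       # covariance of the points so far
        k_star = se_cov([x], xs[:i])     # cross-covariance with the new point
        w = np.linalg.solve(K, k_star.T) # conditioning weights
        mean = (np.asarray(ys) @ w).item()
        var = 1.0 - (k_star @ w).item()  # roundoff can push this below zero
    ys.append(mean + np.sqrt(max(var, 0.0)) * rng.standard_normal())
```

As more points accumulate, K becomes numerically singular, and the conditional mean and variance come out garbled, which is consistent with the wild excursions in the plots.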
I'm not particularly surprised by either of these. It appears that when sampling successive points, the covariance over the next point approaches zero and eventually reaches a point where the samples suffer from numerical problems. Use of zero noise is a degenerate case, and I'm not surprised that it generates numerical problems for a sufficiently large number of datapoints.
However, once the numerical problem is clarified, it can be documented -- perhaps with a warning on the make_gp method that mentions situations which result in numerical problems.
I think crossing over zero is a red herring -- I see the same effect even with, e.g. linspace(-1.1, -0.9, 100) and a length scale of 0.001.
One obvious place to find the culprit would be mvnormal.conditional, in backend/lite/mvnormal.py. If the condition number of Sigma22 were large, there would be a problem. However, it's not clear to me why the condition number would be large, or why that doesn't cause a problem for sampling from the prior.
I tested with numpy's estimate of the condition number of Sigma22, which shows that it quickly grows to >10^17 -- heuristically, that means all digits of precision are lost.
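For reference, a hypothetical reconstruction of that check (unit-variance SE kernel, length scale 1, no noise, the same 0.1 spacing; the actual parameters behind the plots may differ):

```python
import numpy as np

# Build Sigma22 for increasing prefixes of the 0.1-spaced grid and print
# numpy's condition-number estimate for each.
def se_cov(xs, length_scale=1.0):
    return np.exp(-0.5 * (np.subtract.outer(xs, xs) / length_scale) ** 2)

xs = np.arange(-100.0, -90.0, 0.1)
for n in (5, 10, 20, 50, 100):
    print(n, np.linalg.cond(se_cov(xs[:n])))
```

With these parameters the estimate climbs past ~1e17 well before n = 100, i.e. essentially all double-precision digits are lost in any solve against Sigma22.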
If I disable the LU decomposition method for covariance solutions, the SE instability seems to go away.
That is -- if Cholesky decomposition fails, we fall back to least-squares solutions instead of trying LU decomposition first.
Seems to fix the linear instability too.
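In code, the fallback order described above looks roughly like this (names hypothetical; the real implementation is in backend/lite/mvnormal.py):

```python
import numpy as np

# Rough sketch: attempt a Cholesky factorization, and if the matrix is not
# numerically positive definite, fall back to least squares rather than LU.
def solve_covariance(sigma, b):
    try:
        L = np.linalg.cholesky(sigma)   # raises LinAlgError if sigma is not PD
        y = np.linalg.solve(L, b)       # solve L y = b
        return np.linalg.solve(L.T, y)  # solve L^T x = y
    except np.linalg.LinAlgError:
        return np.linalg.lstsq(sigma, b, rcond=None)[0]
```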
What I'm not clear on is why LU decomposition is so bad here -- maybe a licensed and certified numerical linear algebraist can tell me. Did I misuse it, or is it simply that there are some cases like this where it will do poorly, and other cases where it will do better than least-squares solutions? If the latter, can we detect those cases?
And how do we test for this bug? I guess one easy way would be to test the hypothesis that if we draw 100 points sequentially, the 100th point should be normally distributed with the right mean and variance.
It would be nice to check whether the procedure of drawing points sequentially and drawing points in a batch yield the same distribution. But in a necessarily multivariate space like this maybe that's too hard to contemplate.
The last coordinate's marginal should be pretty good here.
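A sketch of that check, under the same assumed SE kernel as above: under the prior, the marginal of the 100th point is exactly N(0, k(x, x)) = N(0, 1) regardless of how the first 99 were drawn, so we can KS-test the sequentially drawn last point against that.

```python
import numpy as np
from scipy import stats

# Draw 100 points sequentially (unit-variance SE kernel, zero noise), keep the
# last one, repeat, and compare against its prior marginal N(0, 1).
def se_cov(xs, ys, length_scale=1.0):
    return np.exp(-0.5 * (np.subtract.outer(xs, ys) / length_scale) ** 2)

def sequential_last_point(xs, rng):
    ys = []
    for i, x in enumerate(xs):
        if i == 0:
            mean, var = 0.0, 1.0
        else:
            K = se_cov(xs[:i], xs[:i])
            k_star = se_cov([x], xs[:i])
            w = np.linalg.solve(K, k_star.T)
            mean = (np.asarray(ys) @ w).item()
            var = 1.0 - (k_star @ w).item()
        ys.append(mean + np.sqrt(max(var, 0.0)) * rng.standard_normal())
    return ys[-1]

xs = np.arange(-100.0, -90.0, 0.1)
rng = np.random.default_rng(1)
draws = np.array([sequential_last_point(xs, rng) for _ in range(300)])
print(stats.kstest(draws, stats.norm(0.0, 1.0).cdf))   # tiny p-value => bug
```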
Another kind of test would be Geweke-style (a sketch of the comparison step is below). This works for x and y vectors; the multivariate distributions could be compared by comparing individual coordinates, or, say, products of pairs of coordinates.
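A hedged sketch of what the comparison step could look like (helper name hypothetical): given two populations of curves that are supposed to share a joint distribution -- say, batch joint draws versus the sequential draws -- compare individual coordinates, and products of pairs of coordinates, with two-sample tests.

```python
import numpy as np
from scipy import stats

# Hypothetical helper: draws_a and draws_b are (num_samples, num_coordinates)
# arrays produced by two samplers that should target the same joint
# distribution.  Returns a crude Bonferroni-corrected p-value.
def geweke_style_compare(draws_a, draws_b, n_pairs=20, seed=0):
    rng = np.random.default_rng(seed)
    pvals = []
    for j in range(draws_a.shape[1]):          # individual coordinates
        pvals.append(stats.ks_2samp(draws_a[:, j], draws_b[:, j]).pvalue)
    for _ in range(n_pairs):                   # products of pairs of coordinates
        j, k = rng.choice(draws_a.shape[1], size=2, replace=False)
        pvals.append(stats.ks_2samp(draws_a[:, j] * draws_a[:, k],
                                    draws_b[:, j] * draws_b[:, k]).pvalue)
    return min(pvals) * len(pvals)
```

The pair products are there to catch covariance mismatches that the per-coordinate marginals alone would miss.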
Just to clarify the nature of this problem:
gp([-100.0, -99.9, -99.8, -99.7, ...]) -- produced and still produces a much flatter graph than Marco shared.

[gp(-100.0), gp(-99.9), gp(-99.8), gp(-99.7), ...] -- was what produced the wild excursions shown in Marco's graphs, for both SE and linear kernels, until I disabled the LU decomposition path.

However, we don't have an automatic test to distinguish these two graphs, which is why the issue remains open.
Demonstrated for the SE kernel and linear kernel (for sufficiently low noise values).
Self-contained python block to reproduce this:
Result:
Result for SE kernel showing instability: