marcoct opened this issue 8 years ago
What do you expect to see for a linear kernel with effectively no noise? I'm not sure there's anything we can do about this short of just crashing.
I'm not clear on what's wrong with the SE kernel -- can you elaborate on why that graph looks bad?
These are supposed to be equivalent to drawing joint samples from the GP. This particular method of drawing the joint sample proceeds by successively sampling each new datapoint, conditioned on the others. The points are generated left-to-right sequentially.
The bottom graph is bad because the magnitude of the oscillations changes suddenly (at around zero), and I believe that after around zero the datapoints are no longer being sampled accurately. I believe the remainder of the curve is an unstable oscillation induced by loss of numerical precision.
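For concreteness, here is a minimal numpy-only sketch of that sequential scheme -- not the Venture code, and it assumes a unit-variance SE kernel with zero observation noise on a 0.1-spaced grid like the one in the plots:

```python
import numpy as np

# Minimal sketch (assumed kernel: unit-variance SE, length scale 1, zero
# observation noise): each point is drawn from its conditional distribution
# given the points already sampled, left to right.
def se_cov(xs, ys, length_scale=1.0):
    return np.exp(-0.5 * (np.subtract.outer(xs, ys) / length_scale) ** 2)

rng = np.random.default_rng(0)
xs = np.arange(-100.0, -90.0, 0.1)       # 100 evenly spaced inputs
ys = []
for i, x in enumerate(xs):
    if i == 0:
        mean, var = 0.0, 1.0             # prior marginal at the first point
    else:
        K = se_cov(xs[:i], xs[:i])       # covariance of the points so far
        k_star = se_cov([x], xs[:i])     # cross-covariance with the new point
        w = np.linalg.solve(K, k_star.T) # conditioning weights
        mean = (np.asarray(ys) @ w).item()
        var = 1.0 - (k_star @ w).item()  # roundoff can push this below zero
    ys.append(mean + np.sqrt(max(var, 0.0)) * rng.standard_normal())
```

As more points accumulate, K becomes numerically singular, and the conditional mean and variance come out garbled, which is consistent with the wild excursions in the plots.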
I'm not particularly surprised by either of these. It appears that when sampling successive points, the covariance over the next point approaches zero and eventually reaches a point where the samples suffer from numerical problems. Use of zero noise is a degenerate case, and I'm not surprised that it generates numerical problems for a sufficiently large number of datapoints.
However, once the numerical problem is clarified, it can be documented -- perhaps with a warning on the make_gp method that mentions situations which result in numerical problems.
I think crossing over zero is a red herring -- I see the same effect even with, e.g. linspace(-1.1, -0.9, 100) and a length scale of 0.001.
One obvious place to find the culprit would be mvnormal.conditional, in backend/lite/mvnormal.py. If the condition number of Sigma22 were large, there would be a problem. However, it's not clear to me why the condition number would be large, or why that doesn't cause a problem for sampling from the prior.
I tested with numpy's estimate of the condition number of Sigma22, which shows that it quickly grows to >10^17 -- heuristically, that means all digits of precision are lost.
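For reference, a hypothetical reconstruction of that check (unit-variance SE kernel, length scale 1, no noise, the same 0.1 spacing; the actual parameters behind the plots may differ):

```python
import numpy as np

# Build Sigma22 for increasing prefixes of the 0.1-spaced grid and print
# numpy's condition-number estimate for each.
def se_cov(xs, length_scale=1.0):
    return np.exp(-0.5 * (np.subtract.outer(xs, xs) / length_scale) ** 2)

xs = np.arange(-100.0, -90.0, 0.1)
for n in (5, 10, 20, 50, 100):
    print(n, np.linalg.cond(se_cov(xs[:n])))
```

With these parameters the estimate climbs past ~1e17 well before n = 100, i.e. essentially all double-precision digits are lost in any solve against Sigma22.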
If I disable the LU decomposition method for covariance solutions, the SE instability seems to go away.
That is -- if Cholesky decomposition fails, we fall back to least-squares solutions instead of trying LU decomposition first.
Seems to fix the linear instability too.
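In code, the fallback order described above looks roughly like this (names hypothetical; the real implementation is in backend/lite/mvnormal.py):

```python
import numpy as np

# Rough sketch: attempt a Cholesky factorization, and if the matrix is not
# numerically positive definite, fall back to least squares rather than LU.
def solve_covariance(sigma, b):
    try:
        L = np.linalg.cholesky(sigma)   # raises LinAlgError if sigma is not PD
        y = np.linalg.solve(L, b)       # solve L y = b
        return np.linalg.solve(L.T, y)  # solve L^T x = y
    except np.linalg.LinAlgError:
        return np.linalg.lstsq(sigma, b, rcond=None)[0]
```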
What I'm not clear on is why LU decomposition is so bad here -- maybe a licensed and certified numerical linear algebraist can tell me. Did I misuse it, or is it simply that there are some cases like this where it will do poorly, and other cases where it will do better than least-squares solutions? If the latter, can we detect those cases?
And how do we test for this bug? I guess one easy way would be to test the hypothesis that if we draw 100 points sequentially, the 100th point should be normally distributed with the right mean and variance.
It would be nice to check whether the procedure of drawing points sequentially and drawing points in a batch yield the same distribution. But in a necessarily multivariate space like this maybe that's too hard to contemplate.
The last coordinate's marginal should be pretty good here.
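A sketch of that check, under the same assumed SE kernel as above: under the prior, the marginal of the 100th point is exactly N(0, k(x, x)) = N(0, 1) regardless of how the first 99 were drawn, so we can KS-test the sequentially drawn last point against that.

```python
import numpy as np
from scipy import stats

# Draw 100 points sequentially (unit-variance SE kernel, zero noise), keep the
# last one, repeat, and compare against its prior marginal N(0, 1).
def se_cov(xs, ys, length_scale=1.0):
    return np.exp(-0.5 * (np.subtract.outer(xs, ys) / length_scale) ** 2)

def sequential_last_point(xs, rng):
    ys = []
    for i, x in enumerate(xs):
        if i == 0:
            mean, var = 0.0, 1.0
        else:
            K = se_cov(xs[:i], xs[:i])
            k_star = se_cov([x], xs[:i])
            w = np.linalg.solve(K, k_star.T)
            mean = (np.asarray(ys) @ w).item()
            var = 1.0 - (k_star @ w).item()
        ys.append(mean + np.sqrt(max(var, 0.0)) * rng.standard_normal())
    return ys[-1]

xs = np.arange(-100.0, -90.0, 0.1)
rng = np.random.default_rng(1)
draws = np.array([sequential_last_point(xs, rng) for _ in range(300)])
print(stats.kstest(draws, stats.norm(0.0, 1.0).cdf))   # tiny p-value => bug
```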
Another kind of test would be Geweke-style (a sketch of the comparison step is below). This works for x and y vectors; the multivariate distributions could be compared by comparing individual coordinates, or, say, products of pairs of coordinates.
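A hedged sketch of what the comparison step could look like (helper name hypothetical): given two populations of curves that are supposed to share a joint distribution -- say, batch joint draws versus the sequential draws -- compare individual coordinates, and products of pairs of coordinates, with two-sample tests.

```python
import numpy as np
from scipy import stats

# Hypothetical helper: draws_a and draws_b are (num_samples, num_coordinates)
# arrays produced by two samplers that should target the same joint
# distribution.  Returns a crude Bonferroni-corrected p-value.
def geweke_style_compare(draws_a, draws_b, n_pairs=20, seed=0):
    rng = np.random.default_rng(seed)
    pvals = []
    for j in range(draws_a.shape[1]):          # individual coordinates
        pvals.append(stats.ks_2samp(draws_a[:, j], draws_b[:, j]).pvalue)
    for _ in range(n_pairs):                   # products of pairs of coordinates
        j, k = rng.choice(draws_a.shape[1], size=2, replace=False)
        pvals.append(stats.ks_2samp(draws_a[:, j] * draws_a[:, k],
                                    draws_b[:, j] * draws_b[:, k]).pvalue)
    return min(pvals) * len(pvals)
```

The pair products are there to catch covariance mismatches that the per-coordinate marginals alone would miss.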
Just to clarify the nature of this problem:
gp([-100.0, -99.9, -99.8, -99.7, ...]) -- produced and still produces a much flatter graph than Marco shared.

[gp(-100.0), gp(-99.9), gp(-99.8), gp(-99.7), ...] -- was what produced the wild excursions shown in Marco's graphs, for both SE and linear kernels, until I disabled the LU decomposition path.

However, we don't have an automatic test to distinguish these two graphs, which is why the issue remains open.
Demonstrated for the SE kernel and linear kernel (for sufficiently low noise values).
Self-contained python block to reproduce this:
Result:
Result for SE kernel showing instability: