jamesrobertlloyd / gp-structure-search

MIT License

Laplace bug #3

Closed: jamesrobertlloyd closed this issue 11 years ago

jamesrobertlloyd commented 11 years ago

This kernel

ScoredKernel(k_opt=ProductKernel([ MaskKernel(ndim=8, active_dimension=0, base_kernel=CubicKernel(offset=1.757755, output_variance=7.084045)), MaskKernel(ndim=8, active_dimension=7, base_kernel=SqExpPeriodicKernel(lengthscale=-2.701080, period=-0.380918, output_variance=-0.071214)) ]), nll=6348.096611, laplace_nle=-184450132.068237, bic_nle=12720.630212, noise=[-1.77276072])

applied to data set r_concrete_500_fold_10_of_10.mat

has a clearly erroneous laplace_nle

Why?

rgrosse commented 11 years ago

Thanks. That definitely looks wrong. Do you think you could send me the inputs to laplace_approx, though? In particular, I don't have an easy way to get the Hessian.

Roger


jamesrobertlloyd commented 11 years ago

Just ran a script in sandpit.debug_laplace to recreate the error and pickled the output into source/laplace_error.p. Load it with the following (sorry - should have saved it as a tuple rather than a list!):

import pickle
vars = pickle.load(open('laplace_error.p'))
sk = vars[0]        # The suspicious scored kernel
script = vars[1][0] # GPML script to compute the Hessian
output = vars[2]    # The output of the script
result = vars[3]    # The scored kernel resulting from the output - note the Laplace approx differs (but is still crazy) due to the optimiser having started from different places - not important, it's still a crazy output

output.hessian will be a good place to start - it is not positive definite - perhaps it is catching out the proj_psd command?

Good luck and let me know if you need anything else to work on this

I experimented with the step size of the finite difference approximation used to calculate the Hessian - it still results in Hessians that lead to unusual Laplace approximations.
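
(For reference, a minimal sketch of how the pickled Hessian could be inspected, assuming NumPy is available and using the variable layout from the snippet above; this is an editorial illustration, not code from the repository.)

```python
import pickle
import numpy as np

# Load the pickled debug output described above: [sk, [script], output, result]
vars = pickle.load(open('laplace_error.p', 'rb'))
H = np.asarray(vars[2].hessian, dtype=float)   # output.hessian
H = 0.5 * (H + H.T)                            # symmetrise before the eigendecomposition

eigvals = np.linalg.eigvalsh(H)
print('min eigenvalue:', eigvals.min())         # negative => not positive definite
print('max eigenvalue:', eigvals.max())
print('largest |diagonal| entry:', np.abs(np.diag(H)).max())
```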

rgrosse commented 11 years ago

Thanks. I notice that not only is the hessian not PSD, it also has an entry of 3.18e12 on the diagonal. This will almost certainly cause numerical problems and make it give crazy answers. Is there a legitimate reason for it to be of order 10^12, or is that a bug? If it's legitimate, we'll have to think harder about how to compute the Laplace approximation in those cases (and it might affect some of the other algorithms as well). Otherwise, we'll want to track down the bug, and in the meantime place an assert somewhere so that it gives an error rather than returning a super duper likelihood score.


jamesrobertlloyd commented 11 years ago

Let's experiment to see if the Hessians from MATLAB seem legit. Varying the step size of the finite difference calculation, we get the matrices below.

I cannot discern any obvious trends. The fact that signs seem to flip every so often suggests that we are testing the numerical accuracy of the derivative formulae in GPML. Any other thoughts/observations?

In my next comment there will be some exploratory plots...

1e-1

+8.6522e+00 +5.7413e+00 -1.5832e+01 +1.6244e+07 +5.6365e+00
+5.7413e+00 +4.1806e+00 -1.0545e+01 +1.1531e+07 +4.0605e+00
-1.5832e+01 -1.0545e+01 +1.9165e+01 -3.4706e+07 -1.0539e+01
+1.6244e+07 +1.1531e+07 -3.4706e+07 +7.5742e+04 +1.1526e+07
+5.6365e+00 +4.0605e+00 -1.0539e+01 +1.1526e+07 +3.9404e+00

1e-2

+7.3674e+00 +6.0717e+00 -1.4569e+01 +1.8389e+07 +5.6940e+00
+6.0717e+00 +5.8234e+00 -9.6200e+00 +1.2195e+07 +5.2384e+00
-1.4569e+01 -9.6200e+00 +1.2758e+01 -2.8177e+07 -9.6601e+00
+1.8389e+07 +1.2195e+07 -2.8177e+07 +8.9304e+05 +1.2214e+07
+5.6940e+00 +5.2384e+00 -9.6601e+00 +1.2214e+07 +4.6534e+00

1e-3

-1.0704e+01 -3.3397e+00 -1.1056e+01 +1.8474e+07 +2.1915e+01
-3.3397e+00 +1.9268e+01 -1.0500e+01 +1.2258e+07 +1.0551e+01
-1.1056e+01 -1.0500e+01 +1.3073e+01 -2.7650e+07 -1.0820e+01
+1.8474e+07 +1.2258e+07 -2.7650e+07 +1.0071e+07 +1.2332e+07
+2.1915e+01 +1.0551e+01 -1.0820e+01 +1.2332e+07 +1.8336e+00

1e-4

-1.2709e+02 +1.3610e+02 -1.3953e+01 +2.0614e+07 +4.7130e+01
+1.3610e+02 -3.0915e+01 -1.1274e+01 +1.3371e+07 +7.0436e+01
-1.3953e+01 -1.1274e+01 +1.6923e+01 -2.7130e+07 -1.2816e+01
+2.0614e+07 +1.3371e+07 -2.7130e+07 +4.7308e+08 +1.4162e+07
+4.7130e+01 +7.0436e+01 -1.2816e+01 +1.4162e+07 +1.7179e+02

1e-5

-1.2485e+03 -7.2327e+02 -1.0194e+02 +6.5094e+06 -5.3225e+01
-7.2327e+02 +1.3733e+03 -6.2107e+01 +1.9421e+07 +1.4022e+03
-1.0194e+02 -6.2107e+01 +1.7011e+01 -1.9966e+07 -9.0486e+01
+6.5094e+06 +1.9421e+07 -1.9966e+07 +4.0034e+10 +1.8997e+07
-5.3225e+01 +1.4022e+03 -9.0486e+01 +1.8997e+07 +1.4311e+03

1e-6

-9.1229e+03 -1.0733e+04 +1.5740e+03 -1.8915e+08 +5.8898e+02
-1.0733e+04 +2.3216e+04 +5.1945e+02 +1.0537e+08 +1.4845e+04
+1.5740e+03 +5.1945e+02 +5.6887e+02 -1.0466e+07 +4.7553e+02
-1.8915e+08 +1.0537e+08 -1.0466e+07 +3.2147e+12 -4.5512e+07
+5.8898e+02 +1.4845e+04 +4.7553e+02 -4.5512e+07 +6.4742e+03

1e-7

-1.5973e+05 +6.4187e+04 -1.0822e+04 +5.8456e+08 -8.4113e+03
+6.4187e+04 +3.0694e+04 -1.2721e+04 +1.6981e+09 +6.3806e+04
-1.0822e+04 -1.2721e+04 +5.3632e+03 +9.4772e+08 -1.1863e+04
+5.8456e+08 +1.6981e+09 +9.4772e+08 +5.6459e+13 -8.0183e+08
-8.4113e+03 +6.3806e+04 -1.1863e+04 -8.0183e+08 +9.6918e+04

1e-8

-2.3741e+05 +1.5215e+05 +1.0901e+05 +1.3436e+10 -4.3918e+05
+1.5215e+05 +1.2361e+06 +6.4673e+04 -1.4165e+10 +1.2320e+06
+1.0901e+05 +6.4673e+04 -2.2584e+03 +7.1079e+09 +6.1322e+04
+1.3436e+10 -1.4165e+10 +7.1079e+09 +1.2526e+14 -2.4051e+09
-4.3918e+05 +1.2320e+06 +6.1322e+04 -2.4051e+09 +1.2279e+06

1e-9

-3.4606e+07 -3.2188e+06 +1.7158e+06 +8.0593e+09 -6.0905e+06
-3.2188e+06 +6.0398e+06 +9.8569e+05 -1.2262e+11 +2.3448e+06
+1.7158e+06 +9.8569e+05 +5.9892e+05 -4.9899e+10 +1.0781e+06
+8.0593e+09 -1.2262e+11 -4.9899e+10 +1.3932e+14 +7.3460e+10
-6.0905e+06 +2.3448e+06 +1.0781e+06 +7.3460e+10 -1.3503e+06

1e-10

-2.9407e+08 +3.9351e+07 +2.4516e+07 -1.1775e+12 +2.5139e+07
+3.9351e+07 -1.8118e+07 +5.7634e+06 +7.2534e+11 -4.3006e+07
+2.4516e+07 +5.7634e+06 +1.0959e+06 +1.6164e+12 +5.0872e+06
-1.1775e+12 +7.2534e+11 +1.6164e+12 +1.3761e+14 -1.7572e+10
+2.5139e+07 -4.3006e+07 +5.0872e+06 -1.7572e+10 -6.7894e+07

1e-11

-3.7236e+09 -4.3814e+08 +1.6247e+08 -5.1157e+12 -1.2950e+09
-4.3814e+08 +5.2616e+07 +5.0687e+07 -8.2348e+12 +6.0310e+08
+1.6247e+08 +5.0687e+07 -4.4617e+06 +9.4263e+12 +5.4727e+07
-5.1157e+12 -8.2348e+12 +9.4263e+12 +1.3760e+14 +1.4653e+13
-1.2950e+09 +6.0310e+08 +5.4727e+07 +1.4653e+13 +1.1536e+09

1e-12

-1.2043e+10 +9.0341e+09 +8.2261e+08 -1.2115e+14 +1.1460e+10
+9.0341e+09 +6.0535e+09 -1.6399e+08 +1.6743e+13 -9.4721e+08
+8.2261e+08 -1.6399e+08 +2.2586e+08 +5.7523e+13 -1.9715e+08
-1.2115e+14 +1.6743e+13 +5.7523e+13 +4.1769e+14 -9.1880e+12
+1.1460e+10 -9.4721e+08 -1.9715e+08 -9.1880e+12 -7.9479e+09
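
(For reference, a minimal sketch of the central-difference Hessian estimate being varied above. It assumes a hypothetical Python callable nll(theta) wrapping the negative log marginal likelihood; in the repository the actual computation is done by a generated GPML/MATLAB script.)

```python
import numpy as np

def finite_difference_hessian(nll, theta, h=1e-4):
    """Central-difference estimate of the Hessian of nll at theta.

    nll   : callable returning the negative log marginal likelihood
    theta : 1-D array of hyperparameters (the optimiser's solution)
    h     : the finite-difference step size varied above
    """
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (nll(theta + ei + ej) - nll(theta + ei - ej)
                       - nll(theta - ei + ej) + nll(theta - ei - ej)) / (4.0 * h * h)
    return 0.5 * (H + H.T)  # symmetrise to remove residual round-off asymmetry
```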

jamesrobertlloyd commented 11 years ago

Continuing the above analysis - let's plot the nll surface and see what is going on. All of the plots below vary one variable over [-scale, scale].

e.g. the super high curvature parameter, 4, at a scale of 1e-2

[plot: nll slice for parameter 4 at scale 1e-2]

I will end this comment here to test the inclusion of images..

jamesrobertlloyd commented 11 years ago

OK - images working - and we have a clear minimum at this scale. Zooming in to a scale of 1e-6 we get:

[plot: nll slice for parameter 4 at scale 1e-6]

Definitely a minimum, but with negative curvature away from the minimum - which explains why the estimates of curvature were getting larger as we decreased the step size.

However, things are not so pretty for other variables - e.g. parameter 1 at a scale of 1e-6:

[plot: nll slice for parameter 1 at scale 1e-6]

and the corresponding derivative:

[plot: derivative of nll for parameter 1 at scale 1e-6]

i.e. we are not at a minimum - the gradient is roughly constant plus noise. Zooming out, the nll surface looks like:

[plot: nll surface for parameter 1, zoomed out]

Definitely not a minimum!

So the optimiser had failed, and we have no reason to believe that the Laplace approx should give anything other than garbage!
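
(A sketch of the kind of one-dimensional slice plotted above, again assuming a hypothetical Python nll(theta) wrapper and matplotlib; theta_opt stands for the hyperparameter vector returned by the optimiser.)

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_nll_slice(nll, theta_opt, dim, scale, n_points=201):
    """Plot nll along one hyperparameter, offset over [-scale, scale] from theta_opt."""
    offsets = np.linspace(-scale, scale, n_points)
    values = []
    for d in offsets:
        theta = np.array(theta_opt, dtype=float)
        theta[dim] += d
        values.append(nll(theta))
    plt.plot(offsets, values)
    plt.xlabel('offset in parameter %d' % dim)
    plt.ylabel('negative log marginal likelihood')
    plt.show()

# e.g. the high-curvature parameter at a coarse scale, then a finer one:
# plot_nll_slice(nll, theta_opt, dim=4, scale=1e-2)
# plot_nll_slice(nll, theta_opt, dim=4, scale=1e-6)
```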

jamesrobertlloyd commented 11 years ago

In conclusion, my few plots suggest that the very large curvatures are not unreasonable, but the optimiser has failed. When the optimiser fails, the noise in the calculation of the derivatives can become significant (e.g. look at the curvature of dimension 1 as delta is made smaller - it increases in magnitude nearly monotonically).

Try the latest version of the optimiser in the new release of GPML and cross our fingers? Or something more intelligent?

duvenaud commented 11 years ago

These plots are great. I talked with Roger the other day and he seemed to think that the bug was caused by very large values on the diagonals causing numerical problems in his python code.

The example you give seems to make a pretty compelling case that we won't solve this until we end up at a true optimum, although it might be that even there some dimensions are very jagged.

I mentioned that the optimizer seemed to be stopping at saddle points to Carl, and he asked me to send him code that could reproduce the problem, so that he could make his optimizer robust against it. That shouldn't be too hard...

jamesrobertlloyd commented 11 years ago

Reproducing code will be easy enough - just need to extract the script produced by sandpit.debug_laplace and tidy up some paths - I'm going out now but can produce this tomorrow.

jamesrobertlloyd commented 11 years ago

Reproducing code is in laplace_debug - I just changed a parameter of the optimiser to check we were not forcing premature stopping (we are not) - shall we catch Carl this afternoon?

jamesrobertlloyd commented 11 years ago

Conclusion: if the optimiser fails then the Laplace approx is bogus - the safest thing to do is to return a marginal likelihood approximation of NaN. We still need to make sure that the optimiser is as good as possible, but we also need to be robust to failure.

jamesrobertlloyd commented 11 years ago

The laplace function now returns NaNs when the Hessian is not PSD.
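
(A minimal sketch of that kind of guard, for illustration only; the change actually committed to the repository may differ, and rgrosse's reservations about this approach follow below.)

```python
import numpy as np

def laplace_nle_guarded(nll_opt, hessian):
    """Laplace approximation to the negative log evidence, or NaN when the
    Hessian at the optimum is not positive definite."""
    H = np.asarray(hessian, dtype=float)
    H = 0.5 * (H + H.T)
    eigvals = np.linalg.eigvalsh(H)
    if eigvals.min() <= 0.0:
        return float('nan')   # not a well-behaved optimum; the approximation is meaningless
    d = H.shape[0]
    # standard Laplace approximation: -log Z ~= nll(theta*) - (d/2) log(2 pi) + (1/2) log det(H)
    return nll_opt - 0.5 * d * np.log(2.0 * np.pi) + 0.5 * np.sum(np.log(eigvals))
```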

rgrosse commented 11 years ago

As I mentioned before, unless there's a bug I don't know about, the existing code should handle non-PSD matrices just fine. The problem is ill-conditioned matrices, regardless of whether they have negative eigenvalues. We could fix this by writing a more numerically stable version, and the only reason I didn't was that ill-conditioned Hessians seem like a sign of a bug somewhere upstream. But I'll go ahead and write the stable version later today.

jamesrobertlloyd commented 11 years ago

Hmmm - correct me if I'm wrong but I think the conclusion was that the non-PSD matrices we had tracked down were symptoms of a failed optimiser. In this case, there is no meaningful 'best approximation' to the Laplace integral because the likelihood surface cannot be approximated as a locally Gaussian optimum - because it is not an optimum.

I think the safest solution is to indicate failure of the Laplace method in general. Upstream, we ultimately should be using the best optimisation techniques / diagnostics we can and catch an optimisation failure before asking for the meaningless Laplace approx to marginal likelihood.

jamesrobertlloyd commented 11 years ago

But perhaps the Hessian is often not PSD, but very close? Then returning a NaN whenever it is not PSD would be rather severe! Is this what you and David observed when first testing the Laplace approx?

rgrosse commented 11 years ago

I think it's dangerous to have it quietly return nans. We might wind up throwing out good kernels without knowing it, because of the issues you describe. I'm writing the new version of the Laplace approximation so that it has a second return value which says whether there was a problem. That way you can continue the experiment as if nothing happened, but check afterwards to see if these problems affected the results.

In general, it seems possible for the Hessian to have eigenvalues of zero -- this would happen, for instance, if there's a redundant parameterization. This would probably lead to there being slightly negative eigenvalues, which we should simply treat as zero.

Roger
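
(One possible reading of the behaviour described above, as an editorial sketch: return a flag rather than a silent NaN, and treat slightly negative eigenvalues as zero. The real implementation is laplace_approx_stable in psd_matrices.py, mentioned below, and may well handle this differently.)

```python
import numpy as np

def laplace_approx_with_flag(nll_opt, hessian, rel_tol=1e-10):
    """Laplace approximation plus a 'problem' flag instead of a silent NaN.

    Eigenvalues that are only slightly negative (relative to the largest magnitude)
    are treated as zero and dropped from the determinant; anything more negative
    sets the flag so the results can be audited after the experiment."""
    H = np.asarray(hessian, dtype=float)
    H = 0.5 * (H + H.T)
    eigvals = np.linalg.eigvalsh(H)
    cutoff = rel_tol * np.abs(eigvals).max()
    problem = bool(eigvals.min() < -cutoff)     # genuinely negative curvature
    kept = eigvals[eigvals > cutoff]            # near-zero directions treated as flat
    nle = (nll_opt - 0.5 * kept.size * np.log(2.0 * np.pi)
           + 0.5 * np.sum(np.log(kept)))
    return nle, problem
```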

jamesrobertlloyd commented 11 years ago

Good point - course of action seems good!

rgrosse commented 11 years ago

Btw, in the fix you committed, laplace_approx just crashes for me because comparing arrays using != doesn't give you a bool. Also, since floating point computations aren't exact, we should be using np.allclose to compare, rather than exact equality.
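
(For illustration, the NumPy pitfall being described:)

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([1.0, 2.0 + 1e-12])

print(a != b)             # element-wise array [False  True], not a bool; using it in an
                          # `if` raises "The truth value of an array ... is ambiguous"
print(np.allclose(a, b))  # True - tolerance-based comparison, appropriate for floats
```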

I've pushed the stable Laplace approximation (laplace_approx_stable in psd_matrices.py) along with this fix.

I think I might have accidentally overwritten some of the results file when I pushed the last commit, so watch out for that. I'll try to debug what happened later.

jamesrobertlloyd commented 11 years ago

Yep - good spot - your fix is much better than mine!

rgrosse commented 11 years ago

Forget what I said earlier about overwriting results files. I just misread the output of git commit, and thought I was overwriting stuff when I was just merging changes. Based on the log, everything seems to be right.

jamesrobertlloyd commented 11 years ago

Next step for this is to fix the optimiser - will send an example to Carl and see if it is fixed by latest version of GPML

jamesrobertlloyd commented 11 years ago

Closing this issue and merging it into a generic 'fix the optimiser' issue.