Closed jamesrobertlloyd closed 11 years ago
Thanks. That definitely looks wrong. Do you think you could send me the inputs to laplace_approx, though? In particular I don't have an easy way to get the hessian.
Roger
On Mon, Jan 21, 2013 at 10:43 AM, jamesrobertlloyd <notifications@github.com> wrote:
This kernel
```
ScoredKernel(k_opt=ProductKernel([
                 MaskKernel(ndim=8, active_dimension=0,
                            base_kernel=CubicKernel(offset=1.757755,
                                                    output_variance=7.084045)),
                 MaskKernel(ndim=8, active_dimension=7,
                            base_kernel=SqExpPeriodicKernel(lengthscale=-2.701080,
                                                            period=-0.380918,
                                                            output_variance=-0.071214))]),
             nll=6348.096611, laplace_nle=-184450132.068237,
             bic_nle=12720.630212, noise=[-1.77276072])
```
applied to data set r_concrete_500_fold_10_of_10.mat
has a clearly erroneous laplace_nle
Why?
— Reply to this email directly or view it on GitHub: https://github.com/jamesrobertlloyd/gp-structure-search/issues/3.
Just ran a script in sandpit.debug_laplace to recreate the error and pickled the output into source/laplace_error.p. Load it with (sorry, should have saved it as a tuple rather than a list!):

```python
import pickle

vars = pickle.load(open('laplace_error.p'))
sk = vars[0]         # The suspicious ScoredKernel
script = vars[1][0]  # GPML script to compute the Hessian
output = vars[2]     # The output of the script
result = vars[3]     # The ScoredKernel resulting from the output - note a
                     # different (but still crazy) Laplace approx due to the
                     # optimiser having started from different places; not
                     # important, it's still a crazy output
```
output.hessian will be a good place to start - not positive definite - perhaps it is catching out the proj_psd command?
Good luck and let me know if you need anything else to work on this
I experimented with the step size of the finite difference approximation used to calculate the Hessian - it still results in Hessians that lead to unusual Laplace approximations
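To see how a huge diagonal entry could "catch out" a PSD projection: projecting onto the PSD cone clips negative eigenvalues but leaves enormous positive ones untouched, so the result is PSD yet still catastrophically ill-conditioned. A minimal sketch of such a projection (the actual proj_psd may differ in detail):

```python
import numpy as np

def project_psd(H):
    """Project a symmetric matrix onto the PSD cone by clipping
    negative eigenvalues (a sketch; the repo's proj_psd may differ)."""
    H = (H + H.T) / 2.0                     # symmetrise first
    eigvals, eigvecs = np.linalg.eigh(H)
    clipped = np.maximum(eigvals, 0.0)
    return eigvecs.dot(np.diag(clipped)).dot(eigvecs.T)

# A huge positive diagonal entry survives projection, so the projected
# matrix is PSD but still catastrophically ill-conditioned.
H = np.diag([1.0, -2.0, 3.18e12])
P = project_psd(H)
```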
Thanks. I notice that not only is the hessian not PSD, it also has an entry of 3.18e12 on the diagonal. This will almost certainly cause numerical problems and make it give crazy answers. Is there a legitimate reason for it to be of order 10^12, or is that a bug? If it's legitimate, we'll have to think harder about how to compute the Laplace approximation in those cases (and it might affect some of the other algorithms as well). Otherwise, we'll want to track down the bug, and in the meantime place an assert somewhere so that it gives an error rather than returning a super duper likelihood score.
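An assert of the kind suggested above could be as simple as the following (hypothetical guard; both the function name and the 1e8 threshold are made up for illustration):

```python
import numpy as np

def assert_sane_hessian(H, max_diag=1e8):
    """Give an error rather than a bogus likelihood score when the
    Hessian has suspiciously huge entries (hypothetical guard; the
    1e8 threshold is a guess)."""
    H = np.asarray(H)
    assert np.all(np.isfinite(H)), "Hessian contains non-finite entries"
    big = np.max(np.abs(np.diag(H)))
    assert big < max_diag, "Hessian diagonal is suspiciously large: %g" % big
```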
Let's experiment to see if the Hessians from MATLAB seem legit. Varying the step size of the finite difference calculation, we get the matrices below.
I cannot discern any obvious trends. The fact that signs seem to flip every so often suggests that we are testing the numerical accuracy of the derivative formulae in GPML. Any other thoughts/observations?
In my next comment there will be some exploratory plots...
1e-1
+8.6522e+00 +5.7413e+00 -1.5832e+01 +1.6244e+07 +5.6365e+00
+5.7413e+00 +4.1806e+00 -1.0545e+01 +1.1531e+07 +4.0605e+00
-1.5832e+01 -1.0545e+01 +1.9165e+01 -3.4706e+07 -1.0539e+01
+1.6244e+07 +1.1531e+07 -3.4706e+07 +7.5742e+04 +1.1526e+07
+5.6365e+00 +4.0605e+00 -1.0539e+01 +1.1526e+07 +3.9404e+00
1e-2
+7.3674e+00 +6.0717e+00 -1.4569e+01 +1.8389e+07 +5.6940e+00
+6.0717e+00 +5.8234e+00 -9.6200e+00 +1.2195e+07 +5.2384e+00
-1.4569e+01 -9.6200e+00 +1.2758e+01 -2.8177e+07 -9.6601e+00
+1.8389e+07 +1.2195e+07 -2.8177e+07 +8.9304e+05 +1.2214e+07
+5.6940e+00 +5.2384e+00 -9.6601e+00 +1.2214e+07 +4.6534e+00
1e-3
-1.0704e+01 -3.3397e+00 -1.1056e+01 +1.8474e+07 +2.1915e+01
-3.3397e+00 +1.9268e+01 -1.0500e+01 +1.2258e+07 +1.0551e+01
-1.1056e+01 -1.0500e+01 +1.3073e+01 -2.7650e+07 -1.0820e+01
+1.8474e+07 +1.2258e+07 -2.7650e+07 +1.0071e+07 +1.2332e+07
+2.1915e+01 +1.0551e+01 -1.0820e+01 +1.2332e+07 +1.8336e+00
1e-4
-1.2709e+02 +1.3610e+02 -1.3953e+01 +2.0614e+07 +4.7130e+01
+1.3610e+02 -3.0915e+01 -1.1274e+01 +1.3371e+07 +7.0436e+01
-1.3953e+01 -1.1274e+01 +1.6923e+01 -2.7130e+07 -1.2816e+01
+2.0614e+07 +1.3371e+07 -2.7130e+07 +4.7308e+08 +1.4162e+07
+4.7130e+01 +7.0436e+01 -1.2816e+01 +1.4162e+07 +1.7179e+02
1e-5
-1.2485e+03 -7.2327e+02 -1.0194e+02 +6.5094e+06 -5.3225e+01
-7.2327e+02 +1.3733e+03 -6.2107e+01 +1.9421e+07 +1.4022e+03
-1.0194e+02 -6.2107e+01 +1.7011e+01 -1.9966e+07 -9.0486e+01
+6.5094e+06 +1.9421e+07 -1.9966e+07 +4.0034e+10 +1.8997e+07
-5.3225e+01 +1.4022e+03 -9.0486e+01 +1.8997e+07 +1.4311e+03
1e-6
-9.1229e+03 -1.0733e+04 +1.5740e+03 -1.8915e+08 +5.8898e+02
-1.0733e+04 +2.3216e+04 +5.1945e+02 +1.0537e+08 +1.4845e+04
+1.5740e+03 +5.1945e+02 +5.6887e+02 -1.0466e+07 +4.7553e+02
-1.8915e+08 +1.0537e+08 -1.0466e+07 +3.2147e+12 -4.5512e+07
+5.8898e+02 +1.4845e+04 +4.7553e+02 -4.5512e+07 +6.4742e+03
1e-7
-1.5973e+05 +6.4187e+04 -1.0822e+04 +5.8456e+08 -8.4113e+03
+6.4187e+04 +3.0694e+04 -1.2721e+04 +1.6981e+09 +6.3806e+04
-1.0822e+04 -1.2721e+04 +5.3632e+03 +9.4772e+08 -1.1863e+04
+5.8456e+08 +1.6981e+09 +9.4772e+08 +5.6459e+13 -8.0183e+08
-8.4113e+03 +6.3806e+04 -1.1863e+04 -8.0183e+08 +9.6918e+04
1e-8
-2.3741e+05 +1.5215e+05 +1.0901e+05 +1.3436e+10 -4.3918e+05
+1.5215e+05 +1.2361e+06 +6.4673e+04 -1.4165e+10 +1.2320e+06
+1.0901e+05 +6.4673e+04 -2.2584e+03 +7.1079e+09 +6.1322e+04
+1.3436e+10 -1.4165e+10 +7.1079e+09 +1.2526e+14 -2.4051e+09
-4.3918e+05 +1.2320e+06 +6.1322e+04 -2.4051e+09 +1.2279e+06
1e-9
-3.4606e+07 -3.2188e+06 +1.7158e+06 +8.0593e+09 -6.0905e+06
-3.2188e+06 +6.0398e+06 +9.8569e+05 -1.2262e+11 +2.3448e+06
+1.7158e+06 +9.8569e+05 +5.9892e+05 -4.9899e+10 +1.0781e+06
+8.0593e+09 -1.2262e+11 -4.9899e+10 +1.3932e+14 +7.3460e+10
-6.0905e+06 +2.3448e+06 +1.0781e+06 +7.3460e+10 -1.3503e+06
1e-10
-2.9407e+08 +3.9351e+07 +2.4516e+07 -1.1775e+12 +2.5139e+07
+3.9351e+07 -1.8118e+07 +5.7634e+06 +7.2534e+11 -4.3006e+07
+2.4516e+07 +5.7634e+06 +1.0959e+06 +1.6164e+12 +5.0872e+06
-1.1775e+12 +7.2534e+11 +1.6164e+12 +1.3761e+14 -1.7572e+10
+2.5139e+07 -4.3006e+07 +5.0872e+06 -1.7572e+10 -6.7894e+07
1e-11
-3.7236e+09 -4.3814e+08 +1.6247e+08 -5.1157e+12 -1.2950e+09
-4.3814e+08 +5.2616e+07 +5.0687e+07 -8.2348e+12 +6.0310e+08
+1.6247e+08 +5.0687e+07 -4.4617e+06 +9.4263e+12 +5.4727e+07
-5.1157e+12 -8.2348e+12 +9.4263e+12 +1.3760e+14 +1.4653e+13
-1.2950e+09 +6.0310e+08 +5.4727e+07 +1.4653e+13 +1.1536e+09
1e-12
-1.2043e+10 +9.0341e+09 +8.2261e+08 -1.2115e+14 +1.1460e+10
+9.0341e+09 +6.0535e+09 -1.6399e+08 +1.6743e+13 -9.4721e+08
+8.2261e+08 -1.6399e+08 +2.2586e+08 +5.7523e+13 -1.9715e+08
-1.2115e+14 +1.6743e+13 +5.7523e+13 +4.1769e+14 -9.1880e+12
+1.1460e+10 -9.4721e+08 -1.9715e+08 -9.1880e+12 -7.9479e+09
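For intuition about why the entries diverge as the step size shrinks: a central-difference second derivative divides a quantity of roundoff size ~eps by h^2, so the roundoff error grows like eps/h^2 as h decreases, which matches the nearly monotonic blow-up in the matrices above. A standalone illustration on a function with a known second derivative:

```python
import numpy as np

def second_derivative_fd(f, x, h):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

# For f(x) = exp(x), f''(1) = e. Truncation error shrinks with h, but
# roundoff error (~ machine eps / h^2) eventually dominates, so very
# small step sizes give wildly wrong curvature estimates.
truth = np.exp(1.0)
errors = {h: abs(second_derivative_fd(np.exp, 1.0, h) - truth)
          for h in (1e-2, 1e-5, 1e-8)}
```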
Continuing the above analysis - let's plot the nll surface and see what was going on. All of the plots below vary one variable from [-scale, scale]
e.g. the super high curvature parameter, 4, at a scale of 1e-2
I will end this comment here to test the inclusion of images..
OK - images working - and we have a clear minimum at this scale - zooming in to a scale of 1e-6 we get
Definitely a minimum, but with negative curvature away from the minimum, which explains why the estimates of curvature were getting larger as we decreased the step size
However, things are not so pretty for other variables - e.g. parameter 1 at a scale of 1e-6
and the corresponding derivative
i.e. we are not at a minimum - gradient is roughly constant + noise. Zooming out - the nll surface looks like
Definitely not a minimum!
So the optimiser had failed, and we have no reason to believe the Laplace approx should give anything other than garbage!
In conclusion, my few plots suggest that the very large curvatures are not unreasonable, but the optimiser has failed. When the optimiser fails, the noise in the calculations of the derivatives can become significant (e.g. the curvature of dimension 1 increases nearly monotonically in magnitude as delta is made smaller).
Latest version of optimiser in new release of GPML + fingers crossed? Or something more intelligent?
These plots are great. I talked with Roger the other day and he seemed to think that the bug was caused by very large values on the diagonals causing numerical problems in his python code.
The example you give seems to make a pretty compelling case that we won't solve this until we end up at a true optimum, although it might be that even there some dimensions are very jagged.
I mentioned that the optimizer seemed to be stopping at saddle points to Carl, and he asked me to send him code that could reproduce the problem, so that he could make his optimizer robust against it. That shouldn't be too hard...
Reproducing code will be easy enough - I just need to extract the script produced by sandpit.debug_laplace and tidy up some paths. I'm going out now but can produce this tomorrow.
Reproducing code is now in laplace_debug. I just changed a parameter of the optimiser to check that we were not forcing premature stopping (we are not). Shall we catch Carl this afternoon?
Conclusion: if the optimiser fails, then the Laplace approx is bogus, and the safest thing to do is to return a marginal likelihood approximation of NaN. We still need to make the optimiser as good as possible, but we must be robust to failure.
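That conclusion could be sketched as a guarded Laplace computation (illustrative only, not the repo's implementation; grad_tol is a guessed threshold):

```python
import numpy as np

def laplace_nle_guarded(nll_at_mode, grad, hessian, grad_tol=1e-3):
    """Laplace approximation to the negative log marginal likelihood,
    returning NaN when the 'mode' fails basic sanity checks (a sketch
    of the conclusion above, not the repo's actual code)."""
    d = len(grad)
    eigvals = np.linalg.eigvalsh((hessian + hessian.T) / 2.0)
    if np.max(np.abs(grad)) > grad_tol or np.min(eigvals) <= 0:
        return float('nan')   # optimiser failed, or Hessian not PD
    # nle = nll - (d/2) log(2 pi) + (1/2) log det(H), via eigenvalues
    return nll_at_mode + 0.5 * np.sum(np.log(eigvals)) - 0.5 * d * np.log(2 * np.pi)
```

As a sanity check, for a 1-D quadratic nll with unit curvature at its mode this reproduces the exact -log sqrt(2 pi) of a standard Gaussian integral.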
The laplace function now returns NaNs when the Hessian is not PSD
As I mentioned before, unless there's a bug I don't know about, the existing code should handle non-PSD matrices just fine. The problem is ill-conditioned matrices, regardless of whether they have negative eigenvalues. We could fix this by writing a more numerically stable version, and the only reason I didn't was that ill-conditioned Hessians seem like a sign of a bug somewhere upstream. But I'll go ahead and write the stable version later today.
Hmmm - correct me if I'm wrong but I think the conclusion was that the non-PSD matrices we had tracked down were symptoms of a failed optimiser. In this case, there is no meaningful 'best approximation' to the Laplace integral because the likelihood surface cannot be approximated as a locally Gaussian optimum - because it is not an optimum.
I think the safest solution is to indicate failure of the Laplace method in general. Upstream, we ultimately should be using the best optimisation techniques / diagnostics we can and catch an optimisation failure before asking for the meaningless Laplace approx to marginal likelihood.
But perhaps the Hessian is often not PSD, yet very close to it? Then returning a NaN whenever it is not PSD would be rather severe! Is this what you and David observed when first testing the Laplace approx?
I think it's dangerous to have it quietly return nans. We might wind up throwing out good kernels without knowing it, because of the issues you describe. I'm writing the new version of the Laplace approximation so that it has a second return value which says whether there was a problem. That way you can continue the experiment as if nothing happened, but check afterwards to see if these problems affected the results.
In general, it seems possible for the Hessian to have eigenvalues of zero -- this would happen, for instance, if there's a redundant parameterization. This would probably lead to there being slightly negative eigenvalues, which we should simply treat as zero.
Roger
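The clip-slightly-negative-eigenvalues-but-flag-genuine-problems behaviour described above might be sketched as follows (illustrative; the function name and the relative tolerance are guesses, and the committed code may differ):

```python
import numpy as np

def clip_eigenvalues(H, rel_tol=1e-8):
    """Treat slightly negative eigenvalues as zero, and flag matrices
    whose negative eigenvalues are too large to be rounding error
    (a sketch of the behaviour described above)."""
    eigvals, eigvecs = np.linalg.eigh((H + H.T) / 2.0)
    scale = np.max(np.abs(eigvals))
    problem = bool(np.any(eigvals < -rel_tol * scale))  # genuinely indefinite
    eigvals = np.maximum(eigvals, 0.0)
    return eigvecs.dot(np.diag(eigvals)).dot(eigvecs.T), problem
```

The second return value lets an experiment continue as if nothing happened while recording which results were affected.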
Good point - course of action seems good!
Btw, in the fix you committed, laplace_approx just crashes for me because comparing arrays using != doesn't give you a bool. Also, since floating point computations aren't exact, we should be using np.allclose to compare, rather than exact equality.
I've pushed the stable Laplace approximation (laplace_approx_stable in psd_matrices.py) along with this fix.
I think I might have accidentally overwritten some of the results file when I pushed the last commit, so watch out for that. I'll try to debug what happened later.
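The array-comparison pitfall mentioned above can be reproduced in isolation (illustrative snippet, not the repo's code):

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([1.0, 2.0 + 1e-12])

# (a != b) is an array, so using it as a condition raises ValueError
# ("The truth value of an array with more than one element is ambiguous")
try:
    if a != b:
        pass
    ambiguous = False
except ValueError:
    ambiguous = True

# np.allclose gives a single bool and tolerates floating-point noise
close = np.allclose(a, b)
```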
Yep - good spot - your fix is much better than mine!
Forget what I said earlier about overwriting results files. I just misread the output of git commit, and thought I was overwriting stuff when I was just merging changes. Based on the log, everything seems to be right.
Next step for this is to fix the optimiser - will send an example to Carl and see if it is fixed by latest version of GPML
Closing this issue and merging into a generic fix the optimiser issue