tBuLi / symfit

Symbolic Fitting; fitting as it should be.
http://symfit.readthedocs.org
MIT License

Singular Covariance Matrix crash #144

Closed tBuLi closed 6 years ago

tBuLi commented 6 years ago

If the covariance matrix happens to be a singular matrix, numpy throws LinAlgError: Singular matrix, which in turn crashes symfit.

I do not currently have time to write a minimal example of how to reproduce this, but it shouldn't take more than a try/except to fix, since this is something that can simply happen depending on the data.

A more nuanced fix, though, would be to check whether the minimizer used returns a hessian_inv, and to use that instead of computing it anew as we currently do in symfit 0.4.0. The reason for computing it anew is to stay consistent across minimizers, and to avoid dealing with the inconsistent API of the scipy solvers. But maybe it is worth doing for a maximum chance of success.
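Roughly what I have in mind, as a sketch only (compute_covariance, jacobian, sigma and scipy_result are hypothetical stand-ins for the symfit internals, not real API; only np.linalg.LinAlgError and scipy's hess_inv attribute are real):

```python
import numpy as np

try:
    # the current symfit path: invert the approximated Hessian ourselves
    cov_matrix = compute_covariance(jacobian, sigma)
except np.linalg.LinAlgError:
    # fall back to the inverse Hessian some scipy minimizers (e.g. BFGS) return;
    # L-BFGS-B returns a LinearOperator instead, which would still need densifying
    cov_matrix = getattr(scipy_result, 'hess_inv', None)
```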

pckroon commented 6 years ago

Is the hessian_inv exactly identical to the covariance matrix? Or is there still some scaling of sorts involved? If the former, I'll have a look.

tBuLi commented 6 years ago

There is a scaling involved, depending on whether the weights are to be interpreted as measurement errors or not.

If we interpret them as measurement errors, then we need to rescale with the total sum of squared residuals, as also remarked here: https://stats.stackexchange.com/questions/71154/when-an-analytical-jacobian-is-available-is-it-better-to-approximate-the-hessia (sorry, I cannot find a better source right now).

But this can be achieved out of the box by copying this if/else statement.

A test should definitely be added, though, to check that the two are indeed in agreement.
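A minimal sketch of such a rescaling, following the convention scipy.optimize.curve_fit uses for absolute_sigma (the function and argument names here are illustrative, not symfit's actual internals):

```python
import numpy as np

def covariance_from_hess_inv(hess_inv, residuals, n_params, absolute_sigma):
    """Turn an inverse Hessian into a covariance matrix estimate."""
    if absolute_sigma:
        # sigmas are real measurement errors: no rescaling needed
        return hess_inv
    # otherwise rescale by the reduced sum of squared residuals
    s_sq = np.sum(residuals**2) / (len(residuals) - n_params)
    return hess_inv * s_sq
```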

As an additional comment, it might also be worth finding a way to dump all the raw returns from the minimizer into FitResults, not just the ones I cherry-picked now. Since scipy's minimize returns essentially a dict, perhaps we can pop all the values that we want into the FitResults, and store the remainder in FitResults.minimizer_return or something like that.
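As a sketch of what that could look like (the FitResults keyword names here are made up; only the dict-like behaviour of scipy's OptimizeResult is real):

```python
# scipy.optimize.minimize returns an OptimizeResult, which is a dict subclass
ans = dict(scipy_result)
popt = ans.pop('x')               # cherry-pick the values FitResults knows about
objective_value = ans.pop('fun')
fit_results = FitResults(
    params=popt,
    objective_value=objective_value,
    minimizer_output=ans,         # stash everything that was not cherry-picked
)
```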

tBuLi commented 6 years ago

P.S. It is also worth rereading my final answer in #138, since it seems there was an issue with this. Maybe it is not as trivial as it seems, or I forgot precisely this rescaling when testing for #138. I don't have time to go through all the sources right now, though.

pckroon commented 6 years ago

That Stack Exchange post is good :) It might be worthwhile to separate the case where we have an analytical Jacobian from the one where we don't. To summarize:

1) We have the analytical Jacobian. In this case it's best to either derive the Hessian analytically, or calculate it via finite differences (error O(h)).

2) We do not have the analytical Jacobian. ~Here we should use the (inverse) Hessian given by the minimizer (if we have it), and otherwise it's probably best to compute J^T*J and pray the residuals are small enough. If that's not good enough (for you), it should be possible to calculate a Hessian via finite differences on the original function. That might be expensive though.~ It should be rather trivial to compute the second derivative via finite differences. This may be the best option, even if we do have the analytical Jacobian, since the error here should be O(h^2). The bookkeeping is going to be annoying though; see the sketch below.
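To make the two estimates concrete, here is a minimal standalone sketch (chi2 is a hypothetical scalar objective and J a residual Jacobian; neither is symfit API):

```python
import numpy as np

def hessian_gauss_newton(J):
    """Gauss-Newton approximation of the Hessian of a least-squares objective:
    J^T J, which is only accurate when the residuals are small."""
    return J.T @ J

def hessian_central_diff(chi2, p, h=1e-5):
    """Central finite-difference Hessian of a scalar objective chi2 at p,
    with O(h^2) truncation error."""
    n = len(p)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            pp = p.copy(); pp[i] += h; pp[j] += h
            pm = p.copy(); pm[i] += h; pm[j] -= h
            mp = p.copy(); mp[i] -= h; mp[j] += h
            mm = p.copy(); mm[i] -= h; mm[j] -= h
            H[i, j] = (chi2(pp) - chi2(pm) - chi2(mp) + chi2(mm)) / (4 * h**2)
    return H
```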

tBuLi commented 6 years ago

This is already how it is done currently: J = model.numerical_jacobian is used to approximate the Hessian by computing J.dot(J.T), and model.numerical_jacobian is either analytical or a finite-difference approximation, depending on the model.

Apparently this can result in a singular matrix, though, and in those cases we should check whether the minimizer already returned the inverse Hessian, so we can use that instead. I don't know if that is a successful strategy; maybe that is also not present, since there is a reason this error occurs in the first place.

In reference to your #147, I think it might still be a good approach to compute the exact Hessian instead of approximating it with J.T @ J, but I have some very specific technical comments that I will leave there. However, it might be worth testing whether the minimizer still returns the inverse Hessian even if J.T @ J is singular. Unfortunately the problem occurred for me in a very obfuscated, yet-to-be-published piece of research, so that doesn't make for a good minimal example ;). To be continued.

pckroon commented 6 years ago

How about taking pseudo-inverses? https://gking.harvard.edu/files/help.pdf The rest of the article is over my head, though.

tBuLi commented 6 years ago

Looks like an interesting read ;). I haven't heard of this before, I'll check it out!

pckroon commented 6 years ago

So the quick test of changing np.linalg.inv(cov_matrix_inv) to np.linalg.pinv(cov_matrix_inv) does not cause any failing tests...
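For reference, the behavioural difference that swap relies on, in a quick standalone check (not symfit code):

```python
import numpy as np

singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])        # rank 1, ordinary inverse does not exist
regular = np.array([[2.0, 0.0],
                    [0.0, 3.0]])

np.linalg.pinv(singular)                 # Moore-Penrose pseudo-inverse: always works
np.allclose(np.linalg.pinv(regular),
            np.linalg.inv(regular))      # True: pinv equals inv when inv exists
# np.linalg.inv(singular)                # would raise LinAlgError: Singular matrix
```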

tBuLi commented 6 years ago

So far I have found out that the generalised inverse of a matrix always exists, and that for an invertible matrix it is the same as the normal inverse. So in that sense there is no harm in making this substitution.

However, there could be harm if the statistical interpretation of the two is not the same. So we would have to read about that in more detail. Other than that, it is very good to know, particularly because it always exists.

pckroon commented 6 years ago

That question I'll leave to you ;) It would be good to have a test case.

tBuLi commented 6 years ago

In PR #146 you simply catch the singular matrix error and return no covariance matrix in that case. I think that should be the accepted solution, since I don't think the pseudo-inverse has quite the same interpretation and I don't want to be responsible for future retractions ;).
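In other words, something like this minimal sketch (cov_matrix_inv stands for whatever symfit builds internally before inverting):

```python
import numpy as np

try:
    cov_matrix = np.linalg.inv(cov_matrix_inv)
except np.linalg.LinAlgError:
    cov_matrix = None   # the fit succeeded, but no covariance estimate is reported
```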

So closing this issue.