strawlab / strand-braid

Live, low-latency 2D and 3D tracking from single or multiple high-speed cameras
https://strawlab.org/braid/

Error: CovarianceNotPositiveSemiDefinite #3

Closed florisvb closed 3 years ago

florisvb commented 3 years ago

We occasionally get the following error during our experiments (Braid 0.9.0):

Error {kind: CovarianceNotPositiveSemiDefinite }' flydra2/src/tracking_core.rs:785:34

It often seems to follow ERROR: flydra2: mean reprojection 100x distance value (big number like 1453620) out of expected range

It's possible that fine-tuning the parameters could mitigate this, but it would be nice for the tracker to be robust to it. I think these big errors may be due to slowly walking flies that blend into the background for some, but not all, cameras, resulting in some weird tracking.

I would submit a patch, but I don't know Rust and my brain is not currently prepared to learn it. However, I can offer the following Python code I found that returns the nearest positive definite matrix. Perhaps a simple conversion to Rust would do the trick.

import numpy as np
from numpy import linalg as la

def nearestPD(A):
    """Find the nearest positive-definite matrix to input
    A Python/Numpy port of John D'Errico's `nearestSPD` MATLAB code [1], which
    credits [2].
    [1] https://www.mathworks.com/matlabcentral/fileexchange/42885-nearestspd
    [2] N.J. Higham, "Computing a nearest symmetric positive semidefinite
    matrix" (1988): https://doi.org/10.1016/0024-3795(88)90223-6

    nan and inf checks added.
    """
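    # Work on a float copy so the caller's array is not modified in place,
    # then zero out any nan or +/-inf entries.
    A = np.array(A, dtype=float)
    A[~np.isfinite(A)] = 0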

    B = (A + A.T) / 2
    _, s, V = la.svd(B)

    H = np.dot(V.T, np.dot(np.diag(s), V))

    A2 = (B + H) / 2

    A3 = (A2 + A2.T) / 2

    if isPD(A3):
        return A3

    spacing = np.spacing(la.norm(A))
    # The above is different from [1]. It appears that MATLAB's `chol` Cholesky
    # decomposition will accept matrices with exactly 0-eigenvalue, whereas
    # Numpy's will not. So where [1] uses `eps(mineig)` (where `eps` is Matlab
    # for `np.spacing`), we use the above definition. CAVEAT: our `spacing`
    # will be much larger than [1]'s `eps(mineig)`, since `mineig` is usually on
    # the order of 1e-16, and `eps(1e-16)` is on the order of 1e-34, whereas
    # `spacing` will, for Gaussian random matrices of small dimension, be on
    # the order of 1e-16. In practice, both ways converge, as the unit test
    # below suggests.
    I = np.eye(A.shape[0])
    k = 1
    while not isPD(A3):
        mineig = np.min(np.real(la.eigvals(A3)))
        A3 += I * (-mineig * k**2 + spacing)
        k += 1

    return A3

def isPD(B):
    """Returns true when input is positive-definite, via Cholesky"""
    try:
        _ = la.cholesky(B)
        return True
    except la.LinAlgError:
        return False

astraw commented 3 years ago

Thanks for the report. I haven't seen this since I updated the code to force the state covariance matrix to be symmetric a couple years ago.

I agree it would be good for this to be very robust. However, I am afraid that catching this error at this point and rerunning after modifying the relevant matrix would slow things down in certain circumstances and, worse, mask an underlying problem that would be better fixed at the source. The code that returns the error is reasonably well documented: https://github.com/strawlab/adskalman-rs/blob/main/src/lib.rs#L163-L181 . Given the context in which this occurs, I wonder whether the covariance of the state estimate is limited by machine precision. If so, tracking could be made more robust by simply treating the observation as missing in such a case. The practical difference between this and your suggested approach is probably minimal.
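To make the "treat the observation as missing" idea concrete, here is a minimal Python sketch. It is not the adskalman-rs code: the function names (`symmetrize`, `is_positive_definite`, `accept_or_skip`) and the `jitter` tolerance are assumptions chosen for illustration. The candidate covariance is forced to be symmetric, checked with a Cholesky attempt, and discarded in favor of the prior if the check fails.

```python
import math


def symmetrize(P):
    """Return (P + P^T) / 2 for a square matrix stored as a list of rows."""
    n = len(P)
    return [[0.5 * (P[i][j] + P[j][i]) for j in range(n)] for i in range(n)]


def is_positive_definite(P, jitter=0.0):
    """Attempt a Cholesky factorization of P + jitter * I.

    Success means the (symmetric) matrix is positive definite.
    """
    n = len(P)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = P[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                acc += jitter
                if not (acc > 0.0):  # a NaN pivot also fails this comparison
                    return False
                L[i][j] = math.sqrt(acc)
            else:
                L[i][j] = acc / L[j][j]
    return True


def accept_or_skip(P_prior, P_candidate, jitter=1e-12):
    """Return (covariance, used_observation).

    If the updated covariance is not usable, keep the prior covariance,
    which is what happens when an observation is simply missing.
    """
    P_sym = symmetrize(P_candidate)
    if is_positive_definite(P_sym, jitter):
        return P_sym, True
    return P_prior, False


prior = [[1.0, 0.0], [0.0, 1.0]]
good = [[2.0, 0.5], [0.5, 1.0]]
bad = [[-6.0, -1.0], [-1.0, 1.0]]  # negative variance on the diagonal

print(accept_or_skip(prior, good))  # candidate accepted
print(accept_or_skip(prior, bad))   # prior kept, observation skipped
```

The trade-off is the one discussed above: skipping keeps the tracker running but hides whatever produced the bad update, so logging each skip would be advisable.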

Do you have any sample data that can be tracked offline that reliably elicit the problem? It will be hard for me to debug this otherwise (my ideas above could certainly be wrong) and it would also be useful to include as a test case in the repository so this error doesn't creep back in.

astraw commented 3 years ago

Also, it is easy to make a test build that switches to the Joseph form of the covariance update, which I implemented based on https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python/blob/master/07-Kalman-Filter-Math.ipynb but have never tested in real-world use. The relevant code is in the adskalman-rs crate I linked above. This should help with the numerical issues that you seem to be facing. I am pushing b455eb816a702bffd6c3ded4ef761b7e66ab6a21 through our build server now and will have a .deb you can test in a bit. You could also compile this yourself. (I suggest using the scripts in the .gitlab-ci.yml file, for example the braid-pylon-ubuntu2004 section.)
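For readers unfamiliar with the Joseph form, the following Python sketch compares it with the short covariance update for a scalar observation. It is a dependency-free illustration under simplifying assumptions, not the adskalman-rs implementation: the helper names are hypothetical and the numbers are chosen only to expose the rounding behavior. The short form computes P+ = (I - K H) P, while the Joseph form computes P+ = (I - K H) P (I - K H)^T + K R K^T, a sum of two symmetric positive semi-definite terms.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def gain_and_residual(P, H, r):
    """Return K = P H^T / (H P H^T + r) and (I - K H) for a scalar observation."""
    n = len(P)
    PHt = matmul(P, transpose(H))      # n x 1
    s = matmul(H, PHt)[0][0] + r       # scalar innovation variance
    K = [[row[0] / s] for row in PHt]  # n x 1 Kalman gain
    KH = matmul(K, H)                  # n x n
    I_KH = [[(1.0 if i == j else 0.0) - KH[i][j] for j in range(n)]
            for i in range(n)]
    return K, I_KH


def update_short(P, H, r):
    """Short form: P+ = (I - K H) P."""
    _, I_KH = gain_and_residual(P, H, r)
    return matmul(I_KH, P)


def update_joseph(P, H, r):
    """Joseph form: P+ = (I - K H) P (I - K H)^T + K r K^T."""
    n = len(P)
    K, I_KH = gain_and_residual(P, H, r)
    left = matmul(matmul(I_KH, P), transpose(I_KH))
    return [[left[i][j] + K[i][0] * r * K[j][0] for j in range(n)]
            for i in range(n)]


H = [[1.0, 0.0]]  # observe the first state component only

# Well-conditioned case: both forms agree to rounding error.
P_ok = [[4.0, 1.0], [1.0, 2.0]]
short_ok = update_short(P_ok, H, 0.5)
joseph_ok = update_joseph(P_ok, H, 0.5)

# Ill-conditioned case: the prior variance dwarfs the observation noise,
# so 1e16 + 1e-3 rounds to 1e16 and the computed (I - K H) loses its first row.
P_big = [[1e16, 0.0], [0.0, 1.0]]
short_big = update_short(P_big, H, 1e-3)
joseph_big = update_joseph(P_big, H, 1e-3)

print(short_big[0][0], joseph_big[0][0])
```

In the ill-conditioned case the exact posterior variance of the observed component is P r / (P + r), about 1e-3. The short form collapses it to zero, whereas the Joseph form recovers it through the K r K^T term.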

florisvb commented 3 years ago

Unfortunately when we get that error the braidz does not close so there's no data at all. Or at least, so far as we can tell.

If this is indeed caused by the preceding errors that show huge reprojection errors then catching those cases and treating them as missing data might do the trick.
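As a rough illustration of that idea, the sketch below gates an observation on its mean reprojection distance before it reaches the filter. Everything here is an assumption made for illustration: the function names, the pixel threshold, and the point layout are hypothetical and are not taken from Braid or flydra2.

```python
import math


def mean_reprojection_distance(observed, predicted):
    """Mean Euclidean distance (pixels) between observed and reprojected 2D points."""
    dists = [math.hypot(ox - px, oy - py)
             for (ox, oy), (px, py) in zip(observed, predicted)]
    return sum(dists) / len(dists)


def is_usable(observed, predicted, max_mean_px):
    """Return False when the observation should be treated as missing."""
    d = mean_reprojection_distance(observed, predicted)
    return math.isfinite(d) and d <= max_mean_px


# One (x, y) pixel coordinate per camera; values are made up.
observed = [(100.0, 100.0), (200.0, 50.0)]
close = [(103.0, 104.0), (200.0, 50.0)]     # mean distance of 2.5 px
far = [(14636.0, 100.0), (200.0, 50.0)]     # wildly off in one camera

print(is_usable(observed, close, max_mean_px=10.0))
print(is_usable(observed, far, max_mean_px=10.0))
```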

-- Floris van Breugel | http://www.florisvanbreugel.com Assistant Professor of Mechanical Engineering & Graduate Program for Neuroscience University of Nevada, Reno

Wildlife and Landscape Photography Galleries: http://www.ArtInNaturePhotography.com/ Blog: http://www.ArtInNaturePhotography.com/wordpress/

astraw commented 3 years ago

While saving, the data are stored in a .braid folder, which later gets zipped to a .braidz file. Hopefully no data is lost. Another idea is to save data without having loaded a calibration and then re-track it later. Without a calibration, there will be no tracking.

Anyhow, the build is done: https://strawlab.org/tmp/braid-pylon-ubuntu1604-b455eb816a702bffd6c3ded4ef761b7e66ab6a21.zip

florisvb commented 3 years ago

Thanks Andrew.

There is indeed a .braid folder - I can send that to you if it would help with debugging?

I installed this new build, and will keep an eye out for these errors again. They were never consistent, so it'll be hard to know what impact the new build had until we've run experiments for a week or two.

astraw commented 3 years ago

Yes, if you send me the .braid folder I should presumably be able to re-run the tracking to the point of the error and then fix things.

florisvb commented 3 years ago

Hey Andrew,

Here's a link to the zipped .braid folder. Let me know if you can't access it for any reason. It's ~560 MB.

https://nevada.box.com/s/aty75d840c0p4ttsemqsyrxinvrdp0xj

astraw commented 3 years ago

Thanks.

Unfortunately it seems that due to the crash, the tracking parameters you were using were not saved. (I should now have just fixed this in f08aea04209ba50e7ac6c71c6a342c666a19e434).

If I use the default tracking parameters instead, I do not get the error (although I am testing on the main branch, not the 0.9.0 release).

Can you send me the tracking parameters you were using? This would be motion_noise_scale, initial_position_std_meters and so on.

Otherwise, we can just wait for your results when testing using the Joseph form.

florisvb commented 3 years ago

Here are the parameters we were using. If any of them seem way off from what you would expect, we can try others too. These seem to be working, but we've improved the calibration itself since choosing them, so they could potentially be tweaked.

[mainbrain.tracking_params]
accept_observation_min_likelihood = 1e-14
ekf_observation_covariance_pixels = 30
initial_position_std_meters = 0.1
initial_vel_std_meters_per_sec = 1.0
max_position_std_meters = 0.05
motion_noise_scale = 2.0

[mainbrain.tracking_params.hypothesis_test_params]
hypothesis_test_max_acceptable_error = 5.0
minimum_number_of_cameras = 2
minimum_pixel_abs_zscore = 0.0

astraw commented 3 years ago

I can reproduce the error with your data and parameters, and the Joseph form of the covariance update does indeed seem to solve it. I will do some more testing, but will then likely switch Braid to use this in the next release. Have you had a chance to test it yourself?

Do you mind if I take an excerpt of this data and use it in the Braid automated test suite to make sure this error doesn't inadvertently crop up again?

florisvb commented 3 years ago

That's great! I can confirm that we've run 2-3 experiments since switching to the Joseph form package and have not had the error crop up. We even did have one event where we got the high reprojection errors but the covariance error did not crop up, so braid did not crash, which seemed promising to me.

Yes, no problem to use an excerpt of that data in the testing, glad it turned out to be useful!

astraw commented 3 years ago

Sounds good. Have you looked at the data from the recent experiments with the Joseph form and noticed anything worth noting (other than "didn't crash")?

I will inspect a few sample datasets here after retracking with the Joseph form. Labbe's discussion of it is pretty convincing at a theoretical level that it is a better approach than the current one. If this all looks good, I'll switch Braid to use the Joseph form.

astraw commented 3 years ago

After testing a bit with the Joseph form of Kalman updating, I see some minor differences in situations where tracking is difficult, but very similar results when tracking is easy. Given the increased robustness of this approach, I have switched Braid to use it in 4501cfd9b18e4ff7a8046cc282a9b3d5b6cee739. I also incorporated a snippet from your data in a new test in d00c270944eefbf31ff53876e25ac68fbf2380cd. So the next release of Braid will include this, and I added a small blurb to the Changelog in 621a4bfafe8a311e488a5eb54e5c62ab38b6b4d9.

Thanks for reporting this. I hope Braid is uncrashable now! :)

florisvb commented 3 years ago

Thanks Andrew! I can confirm that my student did not find any obvious changes in tracking quality with the Joseph form, consistent with your tests.
