oseiskar / simdkalman

Python Kalman filters vectorized as Single Instruction, Multiple Data
https://simdkalman.readthedocs.io/
MIT License

Update kalmanfilter.py for lower memory usage #25

Closed · winedarksea closed this 8 months ago

winedarksea commented 9 months ago

Hi again. While working with large matrices (2000 x 30000), I ran into memory issues on computers with less RAM. I went looking for some quick fixes and found a few:

Overall, in a test case this brings memory use from 17 GB down to 7 GB (and down to 4 GB with the float32 change), measured using the scalene profiler.

oseiskar commented 9 months ago

Thank you again. This is great! I merged a modified version (https://github.com/oseiskar/simdkalman/commit/9507530d3703ba7311e43f1d9d61294e006d409e) in which I reverted this part:

> I set the new default for `.smooth()` to `covariance = False`. This reduced max memory for a sample from 6.678 GB to 4.325 GB, and I don't see any need for the covariance here (unlike in `predict`, where the covariance is useful). But you might disagree if you have some use for it.

I agree that this would be a more reasonable default, since the smoothed covariance is not often of interest and consumes a lot of memory. However, changing the default breaks the API (and hence the test suite; see the CI runs in https://github.com/oseiskar/simdkalman/pull/26), so I'll keep the current default. I did add a note about the lower memory usage to the docs.
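To put rough numbers on why the smoothed covariances dominate memory, here is a back-of-the-envelope sketch. The 2-dimensional state, float64 dtype, and per-timestep storage layout are illustrative assumptions, not taken from the library internals; only the 2000 x 30000 data shape comes from the discussion above.

```python
import numpy as np

n_series, n_timesteps, state_dim = 2000, 30000, 2
bytes_per_float64 = np.dtype(np.float64).itemsize  # 8 bytes

# Smoothed state means: one length-m vector per series per timestep
mean_bytes = n_series * n_timesteps * state_dim * bytes_per_float64

# Smoothed state covariances: one m x m matrix per series per timestep
cov_bytes = n_series * n_timesteps * state_dim**2 * bytes_per_float64

print("means:       %.2f GB" % (mean_bytes / 1e9))
print("covariances: %.2f GB" % (cov_bytes / 1e9))
```

Since the covariances scale with the square of the state dimension, skipping them saves a factor of `state_dim` relative to the means alone, which grows quickly for larger state spaces.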

> The easiest change here is removing the `1*matrix` operations, which resulted in unnecessary copies. I think the original intention was just clarity, and removing them saves a surprising amount of RAM. This change had no effect on predictions in my test.
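The copy is easy to verify in plain NumPy (an illustration of the general behavior, not the library's actual code): multiplying by 1 allocates a brand-new array, while plain assignment only creates another reference.

```python
import numpy as np

a = np.ones((1000, 1000))

b = 1 * a  # allocates a full-size new array, even though the values are unchanged
c = a      # plain assignment: just another reference to the same buffer

print(np.shares_memory(a, b))  # False: b is a copy
print(np.shares_memory(a, c))  # True: c aliases a
```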

I double-checked this after merging (whoops) and noticed that it was not just for clarity: the copy ensures that `compute` can simultaneously return filtered and smoothed results. The usefulness of this is questionable, but it is also a part of the API that used to work. I added a test for it in https://github.com/oseiskar/simdkalman/commit/12a8926a187970b23293f343ae1353db2f788a6d, which failed, and fixed it by modifying the code in https://github.com/oseiskar/simdkalman/commit/9ae4d3c8d7ce9e71a65617ea0ac7deebc9dc24d5, which should also achieve the lower memory consumption in the default smoothing case where `filtered=False`.

> Added a comment about using float32 instead of float64 for the `np.empty` call. Unsurprisingly, this cuts memory use in half. However, it comes with a small change in the calculated values (by 4.8043524443673274e-08% in my case, so really, really small). As a result, I made this change in my personal version of the code but have not pushed it to master; I just added a comment for anyone else who looks into this.
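The halving itself is straightforward to confirm (a generic NumPy sketch with an arbitrary shape, not the library's actual allocation):

```python
import numpy as np

shape = (2000, 30000)
a64 = np.empty(shape, dtype=np.float64)  # 8 bytes per element
a32 = np.empty(shape, dtype=np.float32)  # 4 bytes per element

print(a64.nbytes / 1e9)  # 0.48 GB
print(a32.nbytes / 1e9)  # 0.24 GB
```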

Supporting float32 as an option would be a great addition, since it works just as well for most Kalman filters. However, there are also applications where it does not. One example is the very complicated EKFs used in visual-inertial odometry, e.g., variants of this, which are normally implemented so that they are just barely numerically stable in double precision.

If a float32 mode were added, it should also systematically change all means, covariances, and intermediate results from double to float, to avoid back-and-forth float-double conversions, which can be slow. Changing just the covariance works OK and reduces memory consumption, but changing the other parts too could make the code run even faster.
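The conversion concern follows from NumPy's type promotion rules: any operation that mixes a float64 operand with a float32 array silently produces a float64 result, so a lone float32 buffer gets converted on every step. A generic sketch (the array names are illustrative, not from the library):

```python
import numpy as np

cov32 = np.ones((4, 4), dtype=np.float32)   # a single buffer switched to float32
gain64 = np.ones((4, 4), dtype=np.float64)  # everything else still double

mixed = gain64 @ cov32  # promoted: an implicit float32 -> float64 conversion per operation
print(mixed.dtype)      # float64

gain32 = gain64.astype(np.float32)  # convert once, up front
pure = gain32 @ cov32               # stays in single precision throughout
print(pure.dtype)                   # float32
```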

oseiskar commented 8 months ago

Released the merged code in v1.0.4. Also opened an enhancement issue about the need for a 32-bit float mode: https://github.com/oseiskar/simdkalman/issues/28