The MI estimation code uses the KD tree roughly as in the sketch below. The `eps - 1e-12` tweak in that query is necessary because `cKDTree` returns points with distance less than or equal to the query radius, while the algorithm expects a strictly smaller distance. The tweak does the job well enough, but I'm concerned that it might be brittle in some extreme cases.
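A minimal sketch of what that query looks like (the variable names and the placeholder `eps` values here are illustrative assumptions, not the actual code):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 1))         # one marginal of the data
eps = rng.uniform(0.05, 0.2, size=len(x))  # placeholder for the joint-space kNN distances

kd = cKDTree(x)
# query_ball_point returns points with distance <= r, but the estimator
# needs distance strictly < eps, hence the eps - 1e-12 tweak.
nx = np.array([
    len(kd.query_ball_point(xi, r=e - 1e-12, p=np.inf))
    for xi, e in zip(x, eps)
])
```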
There are essentially four options:

1. Ignore it. Because we expect unit-variance data, there should be no floating-point precision loss; misbehavior could occur only at some singular points, and we support that case only implicitly.
2. Use a better tweak: query with the largest floating-point value strictly smaller than `eps` (see the sketch after this list). Robust, but perhaps too technical; needs testing for edge cases.
3. Wait for `cKDTree` to allow strict inequality. Not practical, because we would need to wait ~2 years before we could require that SciPy version.
4. Use Algorithm 2 of Kraskov et al., which uses a non-strict inequality. Might affect the results in some cases; needs verification.
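For reference, Option 2 could look like the following sketch, assuming NumPy's `np.nextafter`; `strict_radius` is a hypothetical helper, not existing code:

```python
import numpy as np

def strict_radius(eps):
    """Largest float strictly smaller than eps, emulating a strict '<'
    on top of cKDTree's '<=' radius semantics."""
    # Degenerate inputs (e.g. eps == 0 from duplicate points) would
    # produce a negative radius and still need separate handling.
    return np.nextafter(eps, -np.inf)
```

The query would then pass `r=strict_radius(e)` instead of `r=e - 1e-12`.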
I would prefer Option 1, unless any issues crop up. Filing this issue so that the question is documented; I'll revisit the decision later, and then update the code accordingly.