polsys / ennemi

Easy Nearest Neighbor Estimation of Mutual Information
https://polsys.github.io/ennemi/
MIT License

Reconsider the eps-(1e-12) tweak in algorithms #76

Open polsys opened 4 years ago

polsys commented 4 years ago

The MI estimation code uses the KD tree like this:

```python
x_grid.query_ball_point(x, eps - 1e-12, p=np.inf, return_length=True)
```

Here the `eps - 1e-12` tweak is necessary because `cKDTree` returns points with distance less than or equal to the given radius, while the algorithm expects a strictly smaller distance. The tweak does the job well enough, but I'm concerned that it might be brittle in some (extreme) cases.
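
To illustrate the inclusive behaviour, here is a minimal, self-contained sketch (not taken from the ennemi codebase): `query_ball_point` counts points whose distance is less than or equal to the radius, so a point sitting exactly on the boundary is only excluded once the radius is shrunk.

```python
# Minimal sketch (not from ennemi) showing the inclusive behaviour of
# cKDTree.query_ball_point and the effect of shrinking the radius.
import numpy as np
from scipy.spatial import cKDTree

points = np.array([[0.0], [0.5], [1.0]])
tree = cKDTree(points)
eps = 1.0

# Inclusive radius: the point at Chebyshev distance exactly 1.0 is counted.
print(tree.query_ball_point([0.0], eps, p=np.inf, return_length=True))          # 3
# Shrunk radius: the boundary point is excluded, matching the strict inequality.
print(tree.query_ball_point([0.0], eps - 1e-12, p=np.inf, return_length=True))  # 2
```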

There are essentially four options:

  1. Ignore. Because we expect unit-variance data, there should be no floating-point precision loss. Misbehavior could occur only at some singular points, and we support that case only implicitly.
  2. Use a better tweak: replace the constant with the next representable floating-point value below eps (see the sketch after this list). Robust, but might be too technical; needs testing for edge cases.
  3. Wait for cKDTree to allow strict inequality. Not practical because we would need to wait ~2 years before requiring that SciPy version.
  4. Use Algorithm 2 of Kraskov et al., which uses a non-strict inequality. Might affect results in some cases; needs verification.
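
For reference, here is a rough sketch of what Option 2 could look like, assuming the query shown above; `strict_radius` is a hypothetical helper, not existing ennemi code.

```python
# Hypothetical sketch of Option 2: use the largest representable float
# strictly below eps instead of a fixed 1e-12 offset.
import numpy as np

def strict_radius(eps: float) -> float:
    """Return the next representable float below eps (toward zero)."""
    return np.nextafter(eps, 0.0)

# count = x_grid.query_ball_point(x, strict_radius(eps), p=np.inf,
#                                 return_length=True)
```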

I would prefer Option 1, unless any issues crop up. Filing this issue so that the question is documented; I'll revisit the decision later, and then update the code accordingly.