tommyod / KDEpy

Kernel Density Estimation in Python
https://kdepy.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
584 stars, 90 forks

Failure to evaluate high-dimensional data #167

[Open] Leon-Noirclerc opened this issue 4 months ago

Leon-Noirclerc commented 4 months ago

Hello everyone, I am trying to evaluate a fitted FFTKDE on high-dimensional data (dimension greater than 10), but this fails with an AssertionError. The error comes from the function `autogrid` in KDEpy/utils.py. When no grid is passed, `autogrid` falls back to:

if num_points is None:
    num_points = [int(np.power(1024, 1 / dims))] * dims

Since 1024 = 2^10, this gives int(2^(10/dims)) points per dimension, which is a list of ones for any dims greater than 10 and triggers the `assert points >= 2` on line 126. However, the feature summary in your documentation lists the supported number of dimensions as "Any". Should I evaluate the FFTKDE differently on high-dimensional data? What would you recommend here? Thanks!
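The arithmetic behind the failure can be checked without KDEpy. The sketch below re-states the quoted default with plain Python floats instead of `np.power` (assumed to give the same values for these inputs); `default_num_points` is a hypothetical helper name, not part of KDEpy.

```python
def default_num_points(dims):
    """Hypothetical re-statement of the quoted autogrid default:
    int(1024 ** (1 / dims)) grid points in every dimension."""
    return [int(1024 ** (1 / dims))] * dims


for dims in (1, 2, 3, 9, 11, 12):
    points = default_num_points(dims)[0]
    # A dimension with fewer than 2 points would hit `assert points >= 2`.
    print(dims, points, "ok" if points >= 2 else "fails assert")
```

For 11 or 12 dimensions it reports 1 point per dimension, matching the AssertionError described above.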

tommyod commented 4 months ago

Create a custom grid, or pass a `num_points` that is large enough.
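A rough sketch of the "custom grid" route, in plain Python: an equidistant grid covering the data plus some padding, emitted in Cartesian-product order (last coordinate varying fastest). The helper names, the padding rule, and the assumption that this ordering matches what KDEpy's own `autogrid` produces are all illustrative and should be checked against the KDEpy documentation before converting the result to a NumPy array for `FFTKDE.evaluate`.

```python
import itertools


def linspace(lo, hi, n):
    """n equally spaced values from lo to hi, inclusive (n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def equidistant_grid(data, num_points, pad):
    """Hypothetical helper: a regular grid over `data`, widened by `pad` per side.

    data: list of equal-length tuples (observations x dims).
    num_points: grid points per dimension; must be >= 2.
    pad: extra margin on each side, assumed to cover the kernel's support.
    """
    if num_points < 2:
        raise ValueError("need at least 2 grid points per dimension")
    dims = len(data[0])
    axes = []
    for d in range(dims):
        column = [row[d] for row in data]
        axes.append(linspace(min(column) - pad, max(column) + pad, num_points))
    # Cartesian product: the last coordinate varies fastest.
    return list(itertools.product(*axes))


data = [(0.0,) * 11, (1.0,) * 11]  # toy 11-dimensional data
grid = equidistant_grid(data, num_points=2, pad=0.5)
print(len(grid))  # 2048 (= 2 ** 11)
```

Even the minimum of 2 points per dimension already yields 2^11 grid points here, and the total grows as num_points^dims.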

Beware that FFTKDE discretizes the data onto the grid, and in high dimensions many grid points may be empty (no data point lies close to them), so you need a very fine grid to avoid discretization errors. Since the grid has num_points^dims points, such a grid quickly becomes too large. Using KDEpy.TreeKDE, or even scipy.stats.gaussian_kde, might be more reasonable.
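A rough illustration of both points, assuming uniformly distributed data (real data is rarely uniform). `occupied_fraction` counts how many cells of a regular grid receive any data at all, and `gaussian_kde_at` evaluates an isotropic Gaussian KDE by direct summation at a single query point, the kind of grid-free evaluation that TreeKDE or scipy.stats.gaussian_kde offer. Both helpers are hypothetical and are not how either library works internally (TreeKDE uses a tree to skip distant points; gaussian_kde derives its bandwidth from the data covariance).

```python
import math
import random


def occupied_fraction(points, bins):
    """Fraction of the bins**d cells over [0, 1)^d that contain at least one point."""
    dims = len(points[0])
    cells = {tuple(min(int(x * bins), bins - 1) for x in p) for p in points}
    return len(cells) / bins ** dims


def gaussian_kde_at(query, data, bw):
    """Direct-summation Gaussian KDE at one point: mean of N(query; x_i, bw^2 I)."""
    dims = len(query)
    norm = (2 * math.pi * bw ** 2) ** (-dims / 2)
    total = 0.0
    for x in data:
        sq_dist = sum((q - xi) ** 2 for q, xi in zip(query, x))
        total += math.exp(-sq_dist / (2 * bw ** 2))
    return norm * total / len(data)


rng = random.Random(0)
for dims in (1, 2, 4, 8):
    pts = [tuple(rng.random() for _ in range(dims)) for _ in range(1000)]
    # With 1000 points, at most 1000 of the 4**dims cells can be non-empty.
    print(dims, occupied_fraction(pts, bins=4))

data = [tuple(rng.random() for _ in range(11)) for _ in range(1000)]
print(gaussian_kde_at((0.5,) * 11, data, bw=0.2))
```

In 8 dimensions only about 1.5% of the 4^8 = 65,536 cells can hold any data, even with a grid this coarse. The direct sum costs O(N·M·d) for M query points but never allocates a grid, so its cost grows only linearly with the dimension.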