tommyod / KDEpy

Kernel Density Estimation in Python
https://kdepy.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
584 stars, 90 forks

kde.evaluate for density plot #159

Open chum1ngo opened 1 year ago

chum1ngo commented 1 year ago

Hello, I'm trying to display several distributions in the same plot. For that I need to estimate the KDE of each distribution and then evaluate all of them on the same set of points. The first code section below illustrates what I intend to do for one of those distributions, using scipy.

import numpy as np
import scipy.stats as st
kde = st.gaussian_kde(np.linspace(-10, 10, num=10000))
kde.pdf([1,2,3,4,5,6,7,8,9,10])

Then I'm trying to do the same with KDEpy, but I get an error: "Every data point must be inside of the grid."

import numpy as np
from KDEpy import FFTKDE
kde = FFTKDE(bw='silverman', kernel='gaussian').fit(np.linspace(-10, 10, num=10000))
# The next line raises: Every data point must be inside of the grid.
kde([1,2,3,4,5,6,7,8,9,10])

I'm not sure whether this is a bug, because the error doesn't make much sense to me, or whether I just misunderstood how to use the methods. Is it possible to evaluate points in the KDE like that?

Regards

tommyod commented 1 year ago

Your grid needs to be wider than your data points. If your data is in the range [-10, 10], and you use a kernel with some width (e.g. a standard normal), then you need to define a grid on e.g. [-15, 15]. You can always chop the grid after evaluating the KDE. But think about what you really want - would you use a histogram on [1, 10] to evaluate data on [-10, 10]?

There are two reasons why "Every data point must be inside of the grid."