Closed chucklesoclock closed 2 years ago
Glad you like the code. I hope it is helpful to you. In KDEpy, the bandwidth h is the standard deviation σ of the kernel function.
For instance, a KDE on a single data point at x=0 using a Gaussian kernel with h=1 would give you an N(0, 1) distribution. But I haven't looked at sklearn and scipy in a while, so I've forgotten how they interpret bandwidth. My advice would be to check their implementations and test on some fake data to be sure you get the relationship right.
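To make the "bandwidth is the kernel's standard deviation" convention concrete, here is a minimal standard-library sketch. It does not call KDEpy; `gaussian_kde_pdf` is a hypothetical helper written only for illustration, and the grid values are arbitrary.

```python
from statistics import NormalDist


def gaussian_kde_pdf(x, data, h):
    """Evaluate a Gaussian KDE at x, treating the bandwidth h as the
    standard deviation of each kernel (the convention described above)."""
    return sum(NormalDist(mu=xi, sigma=h).pdf(x) for xi in data) / len(data)


# A single data point at 0 with h = 1 should reproduce the N(0, 1) density.
grid = [-2.0, -0.5, 0.0, 1.0, 3.0]
single_point = [gaussian_kde_pdf(x, [0.0], h=1.0) for x in grid]
standard_normal = [NormalDist(0.0, 1.0).pdf(x) for x in grid]

max_abs_diff = max(abs(a - b) for a, b in zip(single_point, standard_normal))
print(max_abs_diff)
```

Under this convention, changing h simply rescales the kernel: a single point with h=2 would give an N(0, 4) density, that is, standard deviation 2.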
Hello!
What an efficient and useful library you have here! I was looking through the code and must admit I was defeated by this question:
What is the relationship between your calculated `kde.bw` bandwidth value and scikit-learn's and scipy's? For example, scipy and sklearn are related in that their invocations are equivalent (up to minor differences in implementation) when `scipy_bw = sklearn_bw / x.std(ddof=1)`. Do you have the relationship offhand? Otherwise I can do some experiments.
Thanks for all your work on the library! Especially the Improved Sheather-Jones bandwidth selection; I'm not sure that exists elsewhere in Python.
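A rough sketch of how the scipy/sklearn relationship above could be checked on fake data. The sample, the evaluation grid, and the value 0.3 are made up for illustration. It relies on scipy using a scalar `bw_method` as a factor that multiplies the sample standard deviation (with `ddof=1`), while sklearn's Gaussian kernel takes `bandwidth` as the kernel's standard deviation.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
x = rng.normal(size=200)            # fake 1-D sample
grid = np.linspace(-4, 4, 101)      # points at which to compare densities

sklearn_bw = 0.3                    # arbitrary kernel standard deviation
scipy_bw = sklearn_bw / x.std(ddof=1)

# scipy: a scalar bw_method is used directly as the scaling factor
scipy_pdf = gaussian_kde(x, bw_method=scipy_bw)(grid)

# sklearn: expects 2-D input and returns log-densities
skl = KernelDensity(kernel="gaussian", bandwidth=sklearn_bw).fit(x[:, None])
sklearn_pdf = np.exp(skl.score_samples(grid[:, None]))

max_abs_diff = np.max(np.abs(scipy_pdf - sklearn_pdf))
print(max_abs_diff)
```

If KDEpy's bandwidth is the kernel's standard deviation, then a Gaussian-kernel `kde.bw` would be expected to play the same role as `sklearn_bw` here, but that is an inference worth confirming with the same kind of comparison.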