Closed rhettinger closed 9 months ago
This looks like a pretty clean useful addition. Nice draft!
Some things to consider (better explored in draft PR form than inline in an issue draft): kernel_function=
being a str
is perhaps unusual - could that instead be a direct reference to a Callable[float, float]
that also carries a .support: float|None
attribute? Those could be pre-defined by name perhaps within a statistics.kernel
-ish namespace to avoid the match/case and on the fly definitions entirely. This allows people to also bring their own kernel rather than only use our predefined common ones.
Thanks for the compliment on the draft. I've been iterating on it for a few weeks.
FWIW, having a string kernel name is consistent with the API style for quantiles() and correlation() elsewhere in the module. I also like having kde() function fully self-contained, making it easier to use. The listed methods are consistently the only ones mentioned in KDE literature and SciPy only implements the normal kernel since it dominates the others in practice. So I think supporting custom kernels would be an over-generalization beyond what people actually use.
Side note: While the code volume makes it look like the kernel is the most important part, the referenced paper says that changing kernels only makes minor differences in the output. The significant parameter by far is h which has dramatic effects on the final result.
Feature
Proposal:
I propose promoting the KDE statistics recipe to be an actual part of the statistics module.
See: https://docs.python.org/3.13/library/statistics.html#kernel-density-estimation
My thought Is that it is an essential part of statistical reasoning about sample data See: https://www.itm-conferences.org/articles/itmconf/pdf/2018/08/itmconf_sam2018_00037.pdf
Discussion of this feature:
This was discussed and reviewed with the module author who gave it his full support:
Working Draft:
Linked PRs