Open tommyod opened 6 years ago
The kernel stuff is difficult. I have already lost two contributors because I wasn't able to keep up or make decisions on design and backwards compatibility.
Essentially, attempts to refactor the current code need considerable backwards compatibility breaks, and the current design is not very good for providing both slow exact and fast approximate/interpolating results. (I have largely moved away from trying to squeeze everything into a single class.)
If you would like to work on this beyond "cosmetic" changes, then I would suggest starting with #2318. It can be merged for 0.10 (after 0.9 branching), following a rebase and a brief check. It would get an experimental label for maybe two years, and that time can be used to incorporate changes for things that don't work well enough. We wouldn't have to look out for backwards compatibility with existing code during this time.
It would be great if you could pick this up, because it has been a lingering sore point for quite some time.
The new code does not cover kernel regression. I started some separate work in #3492 on using binning, similar to the fft version for kde, in kernel regression as well. (I ended up at that PR because of some partially related work on variance and variance function estimation.)
I would like to help clean up the code related to kernel density estimation, and if possible also improve it. Is this of any interest, @josef-pkt? If so, some help getting started would be great!
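To make the "slow exact" versus "fast approximate" distinction concrete, here is a minimal sketch in plain Python. It is not statsmodels code and not the code in #2318 or #3492. The function names, the fixed bandwidth, and the grid size are all assumptions chosen for illustration. The approximate version uses linear binning followed by a direct sum over grid points; an fft-based version would replace that sum with a convolution.

```python
import math
import random


def gaussian_kde_exact(data, grid, bw):
    """Exact KDE: one kernel evaluation per (observation, grid point) pair."""
    norm = 1.0 / (len(data) * bw * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - x) / bw) ** 2) for x in data)
        for g in grid
    ]


def linear_binning(data, grid):
    """Split each observation's unit weight between its two neighbouring grid points."""
    lo, step = grid[0], grid[1] - grid[0]
    counts = [0.0] * len(grid)
    for x in data:
        pos = (x - lo) / step
        i = int(math.floor(pos))
        frac = pos - i
        if 0 <= i < len(grid) - 1:
            counts[i] += 1.0 - frac
            counts[i + 1] += frac
        elif i == len(grid) - 1 and frac == 0.0:
            counts[i] += 1.0  # observation sits exactly on the last grid point
    return counts


def gaussian_kde_binned(data, grid, bw):
    """Approximate KDE: bin once, then work only with the grid counts."""
    counts = linear_binning(data, grid)
    step = grid[1] - grid[0]
    norm = 1.0 / (len(data) * bw * math.sqrt(2.0 * math.pi))
    # On an equally spaced grid the kernel depends only on the index offset.
    weights = [math.exp(-0.5 * (k * step / bw) ** 2) for k in range(len(grid))]
    return [
        norm * sum(c * weights[abs(i - j)] for j, c in enumerate(counts))
        for i in range(len(grid))
    ]


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(2000)]
lo, hi = min(data) - 1.0, max(data) + 1.0  # pad so no observation falls off the grid
grid = [lo + (hi - lo) * i / 255 for i in range(256)]
bw = 0.3  # arbitrary fixed bandwidth, not a recommended choice

exact = gaussian_kde_exact(data, grid, bw)
approx = gaussian_kde_binned(data, grid, bw)
max_abs_error = max(abs(a - b) for a, b in zip(exact, approx))
```

After binning, the cost of the approximate version depends on the grid size only, not on the number of observations, which is why the two kinds of result are awkward to serve from a single class with one code path.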
General observations
Some general observations.
- The statsmodels/nonparametric package has not gotten much attention recently.
- There is a statsmodels/sandbox/nonparametric package too; it has also received little attention.
- The statsmodels KDEs are slower than scipy/sklearn.
- The code quality in statsmodels/nonparametric is mixed. Some modules appear OK, but some appear unused. There are many PEP8 violations. There are 22 TODOs in the code which might indicate starting points for improvements (11 in statsmodels/sandbox/nonparametric too).
Suggested improvements
I would like to help improve the quality of code associated with KDE. My ideas are, in order of least-to-most ambitious:
Questions and advice
I feel like I should add a few sentences about myself. I've been programming in Python for 3-4 years, have a degree in math, and I am interested in learning more about statistics/numerics/programming through a project of suitable difficulty. Contributing to statsmodels seems like a great place to start. I hope you will be receptive to my ideas and suggestions, and I would appreciate any feedback on the above.
-Tommy