sktime / skpro

A unified framework for tabular probabilistic regression and probability distributions in python
https://skpro.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

[ENH] Implement Histogram Conditional Density Estimation #322

Open ShreeshaM07 opened 4 months ago

ShreeshaM07 commented 4 months ago

Describe the solution you'd like

Histogram-based density estimation is not present in skpro. Implement it from scratch as a conditional density estimator: find the optimal bin width (h), then fit the function that best matches the histogram without oversmoothing or undersmoothing.
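To make the bin-width question concrete, here is a minimal sketch of a plain one-dimensional histogram density estimate. It is unconditional, so it does not address conditioning on features, and it is not skpro API. The Freedman-Diaconis rule is only one common way to pick h and is an assumption here, as are the function names `freedman_diaconis_width` and `histogram_density`.

```python
import math
import random
import statistics


def freedman_diaconis_width(sample):
    """Bin width h = 2 * IQR * n^(-1/3) (Freedman-Diaconis rule)."""
    q1, _, q3 = statistics.quantiles(sample, n=4)
    return 2.0 * (q3 - q1) * len(sample) ** (-1.0 / 3.0)


def histogram_density(sample, h):
    """Return (edges, heights) of a histogram density with target bin width h."""
    lo, hi = min(sample), max(sample)
    if h <= 0 or hi == lo:
        raise ValueError("need a positive bin width and a non-constant sample")
    n_bins = max(1, math.ceil((hi - lo) / h))
    width = (hi - lo) / n_bins  # equal-width bins that exactly cover [lo, hi]
    counts = [0] * n_bins
    for x in sample:
        # Clamp so that x == hi falls into the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    # Dividing by n * width makes the bar areas sum to 1.
    heights = [c / (len(sample) * width) for c in counts]
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, heights


# Illustrative data only: 500 draws from a standard normal.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(500)]
h = freedman_diaconis_width(sample)
edges, heights = histogram_density(sample, h)
```

A smaller h gives a spikier (undersmoothed) estimate and a larger h a flatter (oversmoothed) one, which is the trade-off the bin-width selection has to balance.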

Additional context

Useful resources

ShreeshaM07 commented 4 months ago

@fkiraly Do you recommend any other resources to refer to for implementing this?

Also, do I have to implement kernel density estimation (Gaussian and tophat kernels) for this histogram?

fkiraly commented 4 months ago

Sure!

Some classical ones:

This is mostly kernel based.

Also, what is tophat?

ShreeshaM07 commented 4 months ago

> Also, what is tophat?

It's a type of kernel in sklearn's KDE, where K(x; h) is proportional to 1 for x < h (and 0 otherwise).

It mostly resembles a bin itself.
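As a rough illustration of why the tophat kernel resembles a bin, here is a small pure-Python sketch of a one-dimensional tophat kernel density estimate. It does not use sklearn, and the name `tophat_kde` is hypothetical. Each sample point contributes a flat block of half-width h, so the estimate at x is just a normalised count of points within distance h.

```python
def tophat_kde(sample, h):
    """Return a function x -> density estimate using a tophat (box) kernel.

    Each sample point contributes a flat density of height 1 / (2h) on the
    interval (x_i - h, x_i + h); the contributions are averaged over n points.
    The 1 / (2h) normalisation is specific to the one-dimensional case.
    """
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(sample)

    def density(x):
        # Count the sample points strictly within distance h of x.
        inside = sum(1 for xi in sample if abs(x - xi) < h)
        return inside / (2.0 * h * n)

    return density


# Illustrative data only.
density = tophat_kde([0.0, 1.0, 1.5], h=1.0)
near = density(1.2)  # points 1.0 and 1.5 are within distance 1.0 of 1.2
far = density(5.0)   # no points within distance 1.0 of 5.0
```

The difference from a histogram is that the "bin" is centred on the query point x rather than fixed on a grid.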

fkiraly commented 4 months ago

Oh, I see, the "top-hat kernel", which is the same as a box kernel. It corresponds to a uniform distribution.

Here are a few things I noticed: