tommyod / KDEpy

Kernel Density Estimation in Python
https://kdepy.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Small probabilities beyond 16 decimal places? #106

Closed nicolasdarmanthe closed 2 years ago

nicolasdarmanthe commented 2 years ago

I've noticed that the lower bound of the probabilities plateaus around 10^-16 (in both FFTKDE and NaiveKDE). I think this has something to do with https://en.wikipedia.org/wiki/Decimal64_floating-point_format but I'm not sure how to fix it.

tommyod commented 2 years ago

I'm not sure what a fix would mean. There's a limit to floating-point precision. This is true for all numerical algorithms. Since a kernel density is an estimate of an unknown pdf in the first place, I don't see how more precision would help in practice. Could you tell me more about why such high resolution is useful?

nicolasdarmanthe commented 2 years ago

10^-16 is a small number, but not that small by computer standards? In my use case I am very interested in the tails of the pdf. I compute the log of the inverse pdf. In the tails, I'm getting a ceiling around 37 (≈ ln(1/10^-16)).
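For reference, the numbers in this comment can be checked with a few lines of NumPy. This is only a sketch of the float64 limits involved, not a diagnosis of KDEpy's internals: the smallest representable double is far below 10^-16, but the relative rounding error (machine epsilon) sits right at that scale, which is consistent with a plateau there.

```python
import math
import numpy as np

# Ceiling reported in the comment: ln(1 / 1e-16)
ceiling = math.log(1 / 1e-16)

# float64 limits: relative precision vs. smallest positive normal number
eps = np.finfo(np.float64).eps    # gap between 1.0 and the next float, about 2.2e-16
tiny = np.finfo(np.float64).tiny  # smallest positive normal float, about 2.2e-308

# A contribution far below eps (relative to 1.0) is lost when added to 1.0
lost = (1.0 + 1e-17) == 1.0

print(ceiling, eps, tiny, lost)
```

So values much smaller than 10^-16 can be stored, but they cannot survive being added to or subtracted from values of order 1 without rounding error of that relative size.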


tommyod commented 2 years ago

I see. I think you should roll your own implementation then. For two reasons:

Both of these things (and probably other numerical issues) mean that KDEpy is probably not well-suited for your use case. If I were you, I would probably roll my own naive implementation, placing a Gaussian kernel (or whatever) on each data point and going from there.
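A minimal sketch of what such a naive implementation could look like, assuming a one-dimensional Gaussian kernel with a fixed bandwidth. The function name `log_gaussian_kde` and its parameters are hypothetical and not part of KDEpy. It works in log-space with a log-sum-exp step, which is an addition to the suggestion above, so that the log-density in the tails stays finite instead of hitting a floor.

```python
import numpy as np

def log_gaussian_kde(x_eval, data, bw):
    """Log of a naive Gaussian KDE (hypothetical helper, not KDEpy's API).

    Places one Gaussian kernel with standard deviation `bw` on every data
    point and returns log(pdf) at each point in `x_eval`.
    """
    x_eval = np.asarray(x_eval, dtype=float)
    data = np.asarray(data, dtype=float)
    # Standardized distances, shape (n_eval, n_data)
    z = (x_eval[:, None] - data[None, :]) / bw
    # Log of each Gaussian kernel value
    log_k = -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi) - np.log(bw)
    # Log-sum-exp over data points: subtract the row maximum before exp
    m = log_k.max(axis=1, keepdims=True)
    log_sum = m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1))
    # Average over data points
    return log_sum - np.log(data.size)

# Usage: a single data point at 0, evaluated far out in the tail
log_pdf = log_gaussian_kde([0.0, 50.0], data=[0.0], bw=1.0)
log_inverse_pdf = -log_pdf
```

At x = 50 the pdf itself underflows to zero in float64, but the log of the inverse pdf is still finite (roughly 1251), so there is no ceiling near 37. The cost is O(n_eval * n_data) time and memory, so this only suits moderately sized data sets.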

I'll close this issue, but feel free to ask if you have more questions.