h3ik0th / DistFitter

SciPy distribution fitting

low KS p-values for small domains #1

Open · annehulsey opened this issue 2 years ago (Thu, Oct 13, 2022)

annehulsey commented 2 years ago

Hello. First, this tool is amazing; thank you so much for it.

I am trying to fit a distribution with a really small domain, which means the pdf values run to six digits. I think this is causing an issue with the KS test, because some of the fits look great visually but have quite small KS p-values. Do you have any suggestions for this? [image: https://user-images.githubusercontent.com/28743653/195500552-3bc58b94-8f5e-4ca6-a556-08f1331ed83f.png]
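A minimal sketch of the check being described, assuming a normal model and hypothetical synthetic data with a spread of about 1e-6 (not the data in the screenshot). It shows that the pdf peak on such a narrow domain is around 4e5, while `scipy.stats.kstest` compares cumulative distribution functions, which stay within [0, 1] however narrow the domain is.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical data on a very narrow domain (standard deviation 1e-6)
x = rng.normal(loc=0.5, scale=1e-6, size=500)

loc, scale = stats.norm.fit(x)
# The density is huge on a narrow domain: peak = 1 / (scale * sqrt(2 * pi))
peak_pdf = stats.norm.pdf(loc, loc, scale)

# The KS test compares the empirical and fitted CDFs, both bounded by [0, 1]
result = stats.kstest(x, "norm", args=(loc, scale))
print(f"peak pdf ~ {peak_pdf:.3g}")
print(f"KS statistic = {result.statistic:.4f}, p-value = {result.pvalue:.4f}")
```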

h3ik0th commented 2 years ago

Hello Anne, do the x values cover only a very small subset of the domain, so that the actual data represent just one section of a tail of the distribution curve? If so, you could try the Anderson-Darling test, which puts more weight on the tails of the distribution, whereas KS pays more attention to the center.
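A sketch comparing the two tests on the same hypothetical sample with `scipy.stats.kstest` and `scipy.stats.anderson`. Note that `anderson` estimates the parameters itself, supports only a handful of families (for example `'norm'`, `'expon'`, `'logistic'`, `'gumbel'`), and reports its statistic together with critical values at fixed significance levels.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0.5, scale=1e-6, size=500)  # hypothetical sample

# KS: largest vertical gap between the empirical and the fitted CDF
ks = stats.kstest(x, "norm", args=stats.norm.fit(x))

# Anderson-Darling: squared CDF gaps weighted by 1 / (F * (1 - F)),
# which gives more weight to the tails than KS does
ad = stats.anderson(x, dist="norm")

print(f"KS p-value: {ks.pvalue:.3f}")
print(f"AD statistic: {ad.statistic:.3f}")
for crit, sig in zip(ad.critical_values, ad.significance_level):
    verdict = "reject" if ad.statistic > crit else "do not reject"
    print(f"  {sig:.1f}% level: critical value {crit:.3f} -> {verdict}")
```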

You could also consider fitting truncated distributions, in which the x values only cover the x-axis between the observed (or estimated) minimum and maximum. That way the test does not look for a long tail that may not be present in the actual data. See Truncated distribution - Wikipedia: https://en.wikipedia.org/wiki/Truncated_distribution
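A minimal sketch of fitting a truncated distribution by maximum likelihood, assuming the truncation bounds `lo` and `hi` are known (for example the observed minimum and maximum). The helpers `fit_truncated` and `truncated_cdf` are hypothetical, not part of DistFitter or SciPy: the truncated density is the parent pdf divided by the parent's probability mass inside [lo, hi], and the matching truncated CDF is passed to `scipy.stats.kstest` as a callable.

```python
import numpy as np
from scipy import optimize, stats


def fit_truncated(dist, x, lo, hi, x0):
    """Hypothetical helper: ML fit of `dist` truncated to [lo, hi].

    Parameters are ordered as SciPy expects: (*shapes, loc, scale).
    """
    def neg_log_lik(params):
        *shapes, loc, scale = params
        if scale <= 0:
            return np.inf
        # Probability mass of the parent distribution inside [lo, hi]
        mass = dist.cdf(hi, *shapes, loc, scale) - dist.cdf(lo, *shapes, loc, scale)
        if not mass > 0:
            return np.inf
        value = -(np.sum(dist.logpdf(x, *shapes, loc, scale)) - len(x) * np.log(mass))
        return value if np.isfinite(value) else np.inf

    return optimize.minimize(neg_log_lik, x0, method="Nelder-Mead").x


def truncated_cdf(dist, params, lo, hi):
    """Hypothetical helper: CDF of the fitted distribution truncated to [lo, hi]."""
    *shapes, loc, scale = params
    f_lo = dist.cdf(lo, *shapes, loc, scale)
    f_hi = dist.cdf(hi, *shapes, loc, scale)
    return lambda t: (dist.cdf(t, *shapes, loc, scale) - f_lo) / (f_hi - f_lo)


# Hypothetical data: a normal sample that is only observed between lo and hi
rng = np.random.default_rng(2)
lo, hi = -0.2, 1.2
full = rng.normal(loc=0.3, scale=0.5, size=20000)
x = full[(full >= lo) & (full <= hi)]

params = fit_truncated(stats.norm, x, lo, hi, x0=[x.mean(), x.std()])
ks = stats.kstest(x, truncated_cdf(stats.norm, params, lo, hi))
print("fitted loc, scale:", params, "KS p-value:", ks.pvalue)
```

The same helpers should work for other SciPy families by putting their shape parameters first in `x0`, although families with shape parameters may need a better starting point.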

In the distribution function, I'd include location and scale parameters: the test may find a better alignment with a shifted or scaled distribution.
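A sketch of why freeing the location parameter can matter, using hypothetical data: an exponential sample shifted to start at 10. Fixing `floc=0` in `scipy.stats.expon.fit` forces the curve to start at zero, so the KS statistic is large; letting `loc` float recovers the shift. The exponential family is only an illustrative assumption here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical data: exponential with a small scale, shifted away from zero
x = 10.0 + rng.exponential(scale=0.001, size=1000)

# Location pinned at 0: only the scale is fitted
loc0, scale0 = stats.expon.fit(x, floc=0)
# Location and scale both fitted, so the shift is estimated from the data
loc1, scale1 = stats.expon.fit(x)

ks_fixed = stats.kstest(x, "expon", args=(loc0, scale0))
ks_free = stats.kstest(x, "expon", args=(loc1, scale1))
print(f"loc fixed at 0: D = {ks_fixed.statistic:.3f}")
print(f"loc free ({loc1:.4f}): D = {ks_free.statistic:.3f}")
```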

If your x values do cover much of the domain and not only a tail section, then I'd try multiplying the x values by a scale factor, e.g. one million, as if you measured them in millimeters instead of kilometers, to see whether the test still struggles with numerical precision issues.
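A quick way to run this check, assuming a normal model and hypothetical data: fit and test the same sample in its original units and after multiplying it by 1e6. The KS statistic depends only on the fitted CDF evaluated at the data points, and a location-scale fit rescales along with the data, so the two statistics should agree up to floating-point rounding. If they do, numerical precision is unlikely to be the cause of the low p-values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(loc=0.5, scale=1e-6, size=500)  # hypothetical data, e.g. in km
x_scaled = x * 1e6                              # the same data in mm

ks_raw = stats.kstest(x, "norm", args=stats.norm.fit(x))
ks_scaled = stats.kstest(x_scaled, "norm", args=stats.norm.fit(x_scaled))
print(f"original units: D = {ks_raw.statistic:.6f}, p = {ks_raw.pvalue:.4f}")
print(f"rescaled x1e6:  D = {ks_scaled.statistic:.6f}, p = {ks_scaled.pvalue:.4f}")
```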


annehulsey commented 2 years ago

Thank you, this gave great insights into a variety of issues. (Btw, before seeing your response, I found the Bayesian Information Criterion, which I used and which seems to fit my purpose a bit better.)
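A minimal sketch of ranking candidate SciPy distributions by the Bayesian Information Criterion, BIC = k ln(n) - 2 ln(L), where k is the number of fitted parameters, n the sample size and L the maximized likelihood; lower is better. The helper `bic`, the sample and the candidate list are illustrative assumptions, not DistFitter's API.

```python
import numpy as np
from scipy import stats


def bic(dist, x):
    """Hypothetical helper: BIC of `dist` fitted to `x` by maximum likelihood."""
    params = dist.fit(x)                       # (*shapes, loc, scale)
    log_lik = np.sum(dist.logpdf(x, *params))
    k = len(params)                            # every parameter is free here
    return k * np.log(len(x)) - 2.0 * log_lik


rng = np.random.default_rng(5)
x = rng.normal(loc=0.5, scale=1e-3, size=1000)  # hypothetical sample

candidates = {"norm": stats.norm, "laplace": stats.laplace, "expon": stats.expon}
scores = {name: bic(dist, x) for name, dist in candidates.items()}
best = min(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:10s} BIC = {score:.1f}")
```

Unlike a KS p-value, BIC does not test whether a single fit is acceptable; it only compares candidates, penalizing extra parameters.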

Your comments also helped me think about why I am looking for a fitted distribution in the first place. In my use case, it may actually be better to use a smoothed version of the empirical pdf, rather than finding a parametric fit.
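If a smoothed empirical pdf is what's needed, `scipy.stats.gaussian_kde` is one readily available option. A minimal sketch with a hypothetical narrow-domain sample; the default bandwidth follows Scott's rule and can be changed via `bw_method`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.normal(loc=0.5, scale=1e-6, size=500)  # hypothetical narrow-domain sample

kde = stats.gaussian_kde(x)                  # Scott's rule bandwidth by default
grid = np.linspace(x.min(), x.max(), 200)
density = kde(grid)                          # smoothed pdf evaluated on the grid

# Probability mass and random draws come straight from the KDE
mass = kde.integrate_box_1d(x.min(), x.max())
samples = kde.resample(1000, seed=rng)       # array of shape (1, 1000)
print(f"mass between observed min and max: {mass:.3f}")
```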