josemiotto / pylevy

Levy distributions for Python
GNU General Public License v3.0
34 stars 18 forks source link

Confidence intervals for fit parameters? #9

Closed zbpvarun closed 4 years ago

zbpvarun commented 5 years ago

Hi, I have been using your fit_levy function to fit stable distributions to some of my data. I was wondering if you knew of any way to also obtain confidence intervals for the parameter fits. I have been browsing other scipy optimize documentation and have not so far found anything concrete. Thanks.

josemiotto commented 5 years ago

Hi. I recommend to do it "empirically", meaning, to bootstrap. So, resample your data with replacement N times, and estimate the parameters, then take the confidence interval from those N sets of parameters.

Example:

# x is the data
parameters = fit_levy(x)
N=200
bootstrap_parameters = []
for i in range(N):
    xr = np.random.choice(x, size=len(x), replace=True)
    pr = fit_levy(xr)
    bootstrap_parameters.append(pr)
lower = [sorted([x[i] for x in bootstrap_parameters])[int(N*0.05)] for i in range(4)]
upper = [sorted([x[i] for x in bootstrap_parameters])[int(N*0.95)] for i in range(4)]

This is a general way to get CIs for any parameters' set. Now, it depends on the size of the data, but this think may take time, like, a lot. An alternative to bootstrap may be to approximate the CI by using the Cramer-Rao bound (check Wikipedia), which can also be tricky.

Then you have a general problem of multivariate estimation, which is that the estimators of each parameter may be correlated, i.e. the covariance matrix of the estimators is not diagonal. You may think it just by considering two parameters: you want to represent the area inside the CI, but it is typically NOT a rectangle, but an ellipse, which may be not laying necessarily on the axis.

The docs for optimize are useless, because it's only a method to fit the data, but other functions could in principle be used.