jeffalstott / powerlaw


how to limit range of xmin to increase the fitting speed #72

Closed boxcwang closed 5 years ago

boxcwang commented 5 years ago

Dear developer, I have been using the package to fit my data, and it works great! However, there is one small issue: as the number of data points gets larger, the time needed to fit xmin increases dramatically. I am currently fitting a distribution of about 5x10^6 data points; the fitting process has been running for more than 27 hours on our 36-core server and is still not finished.

I was wondering whether I can give a smaller range to limit where xmin is searched for, so that the fitting would be much faster.

Not sure how to do it. Thank you for your help in advance. If this issue is mentioned somewhere and I missed it, sorry for that.

Cheers

Yuan

jeffalstott commented 5 years ago

From the paper:

The search for the optimal xmin can also be restricted to a range, given as a tuple or list:

fit = powerlaw.Fit(data, xmin=(250.0, 300.0))

This is actually a failing in the code documentation, which for Fit says:

xmin : int or float, optional

Oops!

This will likely speed up your code. By default, powerlaw tries a fit for every possible value of xmin, which is every unique value in the dataset. That could be as many as 5x10^6 values for you. If you make the xmin range small, you should see speedups.
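A rough sketch of why narrowing the range helps, under some assumptions: `candidate_xmins` and `fit_restricted` are hypothetical helper names, the data is synthetic, and the counting only approximates the search described above (it is not powerlaw's internal code). The range (250.0, 300.0) and the `powerlaw.Fit(data, xmin=...)` call are taken from the paper quote.

```python
import random


def candidate_xmins(data, xmin_range=None):
    """Hypothetical helper: unique values a scan over xmin would consider.

    This only approximates the search described above; it is not
    powerlaw's internal code.
    """
    unique = sorted(set(data))
    if xmin_range is None:
        return unique
    lo, hi = min(xmin_range), max(xmin_range)
    return [x for x in unique if lo <= x <= hi]


def fit_restricted(data, xmin_range):
    """Fit with the xmin search limited to xmin_range (needs powerlaw installed)."""
    import powerlaw  # third-party package; imported lazily so the rest runs without it
    return powerlaw.Fit(data, xmin=xmin_range)


random.seed(0)
# Synthetic heavy-tailed sample; it stands in for real data.
data = [100.0 * random.paretovariate(1.5) for _ in range(20_000)]

all_candidates = candidate_xmins(data)
restricted = candidate_xmins(data, (250.0, 300.0))
print(len(all_candidates), "candidate xmin values without a range")
print(len(restricted), "candidate xmin values within (250.0, 300.0)")
```

With powerlaw installed, `fit = fit_restricted(data, (250.0, 300.0))` runs the restricted search, after which `fit.xmin` and `fit.power_law.alpha` hold the selected cutoff and the fitted exponent. The range only helps if it actually brackets a sensible xmin for your data, so it is worth choosing it from a plot or a fit on a subsample.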