ntamas / plfit

Fitting power-law distributions to empirical data, according to the method of Clauset, Shalizi and Newman
GNU General Public License v2.0

Set p_value_method from python #9

Closed enavarro222 closed 11 years ago

enavarro222 commented 11 years ago

Here I am again, with something strange in the Python wrapper :-)

In [1]: import plfit
In [2]: opts = plfit.plfit_discrete_options_t()
In [3]: opts.p_value_method = plfit.PLFIT_P_VALUE_EXACT
In [4]: res = plfit.plfit_discrete([45,4,4,4,4,4,1,1,1,1], opts)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-97f19fff7330> in <module>()
----> 1 res = plfit.plfit_discrete([45,4,4,4,4,4,1,1,1,1], opts)

RuntimeError: Invalid value

However, it works with plfit.PLFIT_P_VALUE_APPROXIMATE and plfit.PLFIT_P_VALUE_SKIP (I get a p-value in the first case and "nan" in the second).

ntamas commented 11 years ago

You need to specify a desired p-value precision for the EXACT method:

opts.p_value_precision = 1e-2

I agree that something like this should be the default so I'll fix it soon.

enavarro222 commented 11 years ago

Great, it works! So the precision doesn't matter for the approximate method.

Thanks again!

ntamas commented 11 years ago

Actually, the approximate method isn't too reliable, so I'd either use the exact method or skip the p-value calculation entirely if I were you. The precision matters for the exact method because it determines how many synthetic instances the method generates from the input data. For a precision of "eps", the method generates eps^-2 / 4 synthetic instances and runs the fitting on each of them.
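
To get a feel for what a given precision costs, the eps^-2 / 4 rule can be evaluated directly. The snippet below is only a rough sketch in plain Python: the helper name `synthetic_instance_count` is made up for illustration and is not part of plfit, and rounding to the nearest integer is an assumption (the C code may truncate or round differently).

```python
def synthetic_instance_count(eps: float) -> int:
    """Estimate how many synthetic instances the exact method needs.

    Applies the eps^-2 / 4 rule quoted above. Hypothetical helper, not
    a plfit function; rounding to the nearest integer is an assumption.
    """
    if not eps > 0:
        raise ValueError("precision must be a positive number")
    return round(0.25 / (eps * eps))


if __name__ == "__main__":
    # Each tenfold tightening of the precision costs about 100x more fits.
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, synthetic_instance_count(eps))
```

Since the fitting is rerun on every synthetic instance, the run time of the exact method grows roughly with the square of 1/eps.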

enavarro222 commented 11 years ago

OK, thanks for the advice! Indeed, when I used your package for the first time this summer, I think I remember that the p-values were not reliable. So I wrote a small Python function that computes p (using the multiprocessing package). But your C implementation (even sequential) may be faster. I'll test!