ntamas / plfit

Fitting power-law distributions to empirical data, according to the method of Clauset, Shalizi and Newman

GNU General Public License v2.0

p-values: precision #21

Open jgmbenoit opened 6 years ago

jgmbenoit commented 6 years ago

I fitted some (discrete) data with both the plfit provided here and the MATLAB code provided by the authors of [1], and I obtain p-values that differ significantly: roughly speaking, the p-values obtained with plfit are 10 times smaller. For the attached data file sample_deglist.txt,

$ plfit -b -p exact sample_deglist.txt

gives

sample_deglist.txt: D 2.32465 3 -6150.54 0.0155189 0.028

so p is 0.028. With the MATLAB code, I get

sample_deglist.txt: D 2.32000 3 -6150.56 0.126800

[I ran the MATLAB code with Octave 4.2.1.] Any idea? Otherwise, have you implemented formula (3.11) in [1], or something else?

ntamas commented 6 years ago

I haven't implemented the reweighting in (3.11) so that's one possible source of the discrepancy. The calculation of the D value of the KS test is here -- feel free to poke around and let me know if you find something suspicious. The p-value is then simply calculated by generating artificial samples from the fitted power-law distribution, and comparing the D values obtained from the artificial samples with the D value of the real sample.
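For readers following along, the resampling procedure described above can be sketched roughly as follows. This is a minimal illustration in Python, not plfit's actual C code. The names `bootstrap_p_value`, `draw_power_law` and `ks_continuous` are hypothetical, the toy helpers use a continuous power law for brevity, and the sketch assumes ties count as "at least as extreme" and that the synthetic samples are not refitted.

```python
import random


def bootstrap_p_value(d_observed, draw_sample, ks_statistic, trials=1000, rng=None):
    """Fraction of synthetic samples whose D is at least as large as d_observed.

    draw_sample(rng) returns one synthetic sample; ks_statistic(sample)
    returns its D value. Both are supplied by the caller (hypothetical API).
    """
    if rng is None:
        rng = random.Random()
    exceed = 0
    for _ in range(trials):
        synthetic = draw_sample(rng)
        # Assumption: ">=" is used here; a strict ">" would be equally plausible.
        if ks_statistic(synthetic) >= d_observed:
            exceed += 1
    return exceed / trials


def draw_power_law(rng, size, alpha, xmin):
    """Toy sampler: continuous power law via inverse-transform sampling."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(size)]


def ks_continuous(sample, alpha, xmin):
    """Toy one-sample KS statistic against a continuous power-law CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # Compare the model CDF with the empirical CDF just before and at x.
        d = max(d, abs(cdf - i / n), abs(cdf - (i + 1) / n))
    return d
```

With this estimator the resolution of the p-value is 1 / trials, so a reported value such as 0.028 is only meaningful to about that precision. Refitting each synthetic sample, if desired, can be done inside the `ks_statistic` callable.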

jgmbenoit commented 6 years ago

Okay, where does the implemented formula come from: fabs( 1 - hzeta(alpha, x) / hzeta(alpha, xmin) - m / n) ?

ntamas commented 6 years ago

Sorry for the late reply - lots of things to be done at work. Anyway, the test statistic of the one-sample KS test is simply the maximum of the absolute value of the difference between the "theoretical" CDF and the observed CDF. In the formula above, m / n is the observed CDF (n is the number of samples, m counts the number of samples less than x, while x iterates over the sorted list of samples). The remaining part (i.e. 1 - hzeta(alpha, x) / hzeta(alpha, xmin)) should then be the value of the CDF of the power-law function at x if the power-law behaviour starts at xmin and has an exponent alpha.
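To make the explanation concrete, here is a small Python sketch of the same computation. It is an illustration under stated assumptions rather than a transcription of plfit's C source: `hurwitz_zeta` is a hypothetical pure-Python stand-in for `hzeta` (a direct sum plus an Euler-Maclaurin tail correction), and `ks_statistic_discrete` is a hypothetical name. The loop follows the description above: m is the number of samples strictly less than x, and x runs over the distinct sorted values at or above xmin.

```python
def hurwitz_zeta(s, q, terms=1000):
    """Approximate sum_{k>=0} (q + k)^(-s) for s > 1 and q > 0."""
    head = sum((q + k) ** -s for k in range(terms))
    a = q + terms
    # Euler-Maclaurin correction for the part of the sum starting at k = terms.
    tail = a ** (1.0 - s) / (s - 1.0) + 0.5 * a ** -s + s * a ** (-s - 1.0) / 12.0
    return head + tail


def ks_statistic_discrete(samples, alpha, xmin):
    """max |1 - hzeta(alpha, x) / hzeta(alpha, xmin) - m / n| over the sample."""
    tail = sorted(x for x in samples if x >= xmin)
    n = len(tail)
    norm = hurwitz_zeta(alpha, xmin)
    d_max = 0.0
    m = 0
    while m < n:
        x = tail[m]
        # Model probability P(X < x) for a discrete power law starting at xmin.
        model_cdf = 1.0 - hurwitz_zeta(alpha, x) / norm
        d_max = max(d_max, abs(model_cdf - m / n))
        # Skip over repeated values so that m counts the samples less than the next x.
        while m < n and tail[m] == x:
            m += 1
    return d_max
```

Both terms are the strict "less than x" versions of their CDFs, which is why they can be subtracted directly: hzeta(alpha, x) / hzeta(alpha, xmin) is the model probability of drawing a value of at least x.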