Open jgmbenoit opened 6 years ago
I haven't implemented the reweighting in (3.11) so that's one possible source of the discrepancy. The calculation of the D value of the KS test is here -- feel free to poke around and let me know if you find something suspicious. The p-value is then simply calculated by generating artificial samples from the fitted power-law distribution, and comparing the D values obtained from the artificial samples with the D value of the real sample.
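The resampling step described above can be sketched as follows. This is a minimal illustration, not plfit's actual code: `bootstrap_p_value`, `draw_sample`, and `statistic` are hypothetical names, and two details are assumptions the description above does not settle: the comparison is `>=` rather than `>`, and whether each artificial sample is refitted before its D value is computed is left entirely to the `statistic` callable.

```python
import random


def bootstrap_p_value(d_observed, draw_sample, statistic, trials=1000, rng=None):
    """Fraction of artificial samples whose D value is at least d_observed.

    draw_sample(rng) -> one artificial sample from the fitted distribution
    statistic(sample) -> the D value of that sample
    Both callables are hypothetical placeholders supplied by the caller.
    """
    rng = rng or random.Random()
    exceed = 0
    for _ in range(trials):
        synthetic = draw_sample(rng)
        # Assumption: ties count as "at least as extreme" (>=).
        if statistic(synthetic) >= d_observed:
            exceed += 1
    return exceed / trials
```

With this definition the p-value has a resolution of `1 / trials`, so a small number of trials limits how finely two implementations can be compared.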
Okay, where does the implemented formula come from: fabs( 1 - hzeta(alpha, x) / hzeta(alpha, xmin) - m / n) ?
Sorry for the late reply - lots of things to be done at work. Anyway, the test statistic of the one-sample KS test is simply the maximum of the absolute value of the difference between the "theoretical" CDF and the observed CDF. In the formula above, m / n is the observed CDF (n is the number of samples, m counts the number of samples less than x, while x iterates over the sorted list of samples). The remaining part (i.e. 1 - hzeta(alpha, x) / hzeta(alpha, xmin)) should then be the value of the CDF of the power-law function at x if the power-law behaviour starts at xmin and has an exponent alpha.
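For readers who want to check the formula numerically, here is a small Python sketch of the statistic as described above. It is an illustration only, not a port of the plfit source. `hurwitz_zeta` is a simple stand-in for `hzeta` (a truncated sum plus an Euler-Maclaurin tail correction). Two things are assumptions on my part: samples below `xmin` are dropped, and repeated values are handled by counting the samples strictly less than `x`.

```python
from bisect import bisect_left


def hurwitz_zeta(s, q, terms=1000):
    """Approximate sum_{k>=0} (q + k)**(-s) for s > 1 and q > 0."""
    head = sum((q + k) ** -s for k in range(terms))
    a = q + terms
    # Euler-Maclaurin estimate of the remaining tail of the series.
    tail = a ** (1 - s) / (s - 1) + 0.5 * a ** -s + s * a ** (-s - 1) / 12
    return head + tail


def ks_statistic(samples, alpha, xmin):
    """max over x of |1 - hzeta(alpha, x) / hzeta(alpha, xmin) - m / n|."""
    # Assumption: only the tail x >= xmin takes part in the comparison.
    xs = sorted(x for x in samples if x >= xmin)
    n = len(xs)
    norm = hurwitz_zeta(alpha, xmin)
    d = 0.0
    for x in sorted(set(xs)):
        m = bisect_left(xs, x)  # number of samples strictly less than x
        theoretical = 1.0 - hurwitz_zeta(alpha, x) / norm  # P(X < x)
        d = max(d, abs(theoretical - m / n))
    return d
```

Note that `1 - hzeta(alpha, x) / hzeta(alpha, xmin)` is P(X < x) for the discrete power law, which is why it is paired with the count of samples strictly less than `x` rather than less than or equal to it.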
I fitted some (discrete) data with the plfit provided here and with the MATLAB code provided by the authors of [1]. The p-values I obtain differ significantly: roughly, the p-values obtained with plfit are 10 times smaller. For the attached data file sample_deglist.txt,

$ plfit -b -p exact sample_deglist.txt

gives

$ sample_deglist.txt: D 2.32465 3 -6150.54 0.0155189 0.028

So p is 0.028. With the MATLAB code, I get

$ sample_deglist.txt: D 2.32000 3 -6150.56 0.126800

[I ran the MATLAB code with Octave 4.2.1.]

Any idea? Otherwise, have you implemented formula (3.11) in [1], or something else?