ntamas / plfit

Fitting power-law distributions to empirical data, according to the method of Clauset, Shalizi and Newman
GNU General Public License v2.0

Regarding power law exponent #22

Closed by Alexander-Shukaev 4 years ago

Alexander-Shukaev commented 4 years ago

Hi, nice library! This is a question rather than an issue, if I may. Did you figure out in the end which power-law exponent formula (3.2) estimates? I mean, is it the exponent of the PDF or of the CDF of the supplied data samples? Of course, the two differ by 1. The background is that I want to estimate the exponent and then plug other random samples from the same distribution into (2.6) to find out their actual probabilities. So far, as I read the paper, the estimator (3.2) seems to correspond to the PDF rather than the CDF. Am I right? Thanks and take care.
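A minimal Python sketch of the workflow described above (fit the exponent, then evaluate tail probabilities), assuming the continuous-case formulas of the published Clauset, Shalizi and Newman paper: Eq. (2.2) p(x) = ((α-1)/x_min)(x/x_min)^(-α), Eq. (2.6) P(x) = (x/x_min)^(-α+1), and the maximum-likelihood estimator the thread later identifies as Eq. (3.1). The function names `estimate_alpha`, `pdf` and `ccdf` are hypothetical and are not part of plfit's API; x_min is assumed to be known rather than fitted.

```python
import math

def estimate_alpha(samples, xmin):
    """Continuous MLE of the PDF exponent (assumed Eq. 3.1):
    alpha_hat = 1 + n / sum(ln(x_i / xmin)), using only samples x_i >= xmin."""
    tail = [x for x in samples if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one sample strictly above xmin")
    return 1.0 + len(tail) / log_sum

def pdf(x, alpha, xmin):
    """Assumed Eq. 2.2: p(x) = (alpha - 1) / xmin * (x / xmin) ** (-alpha), x >= xmin."""
    return (alpha - 1.0) / xmin * (x / xmin) ** (-alpha)

def ccdf(x, alpha, xmin):
    """Assumed Eq. 2.6: P(X >= x) = (x / xmin) ** (-alpha + 1), x >= xmin.
    The tail exponent is alpha - 1, one less than the PDF exponent."""
    return (x / xmin) ** (-alpha + 1.0)

if __name__ == "__main__":
    data = [1.0, 1.5, 2.0, 3.0, 8.0]          # toy data, not from the thread
    a = estimate_alpha(data, xmin=1.0)
    print("alpha_hat:", a, "P(X >= 4):", ccdf(4.0, a, 1.0))
```

The point relevant to the question: the estimate is passed to `ccdf` unchanged, and the shift to α - 1 happens inside the tail formula, so the caller never adjusts the exponent by hand.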

@ntamas @jgmbenoit

ntamas commented 4 years ago

I only have access to the preprint at the moment and formula (3.2) in the preprint does not seem to estimate a power law exponent at all. Can you check the preprint as well and let me know which formula you mean?

Alexander-Shukaev commented 4 years ago

Right, sorry, I meant formula (3.1). AFAIU, it is supposed to estimate the \alpha from (2.2), even though in (3.1) we plug in our data samples directly. I then need to use this \alpha in (2.6) to estimate the probabilities of other data samples from the same distribution. Since we plug our data samples into (3.1) directly (without preprocessing), I intuitively suspected that it gives the exponent of the CDF (2.6) instead, i.e. that the authors meant \hat{\alpha} \approx \alpha - 1, where \hat{\alpha} is the result of (3.1). But I might be wrong, and actually \hat{\alpha} \approx \alpha, which would mean that it estimates the exponent of the PDF (2.2). That was the question.
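One way to settle the question empirically is a quick simulation: draw samples from a continuous power law with a known PDF exponent, apply (3.1), and see whether the result lands near \alpha or near \alpha - 1. This is a sketch, not plfit code; it assumes the continuous form of (2.2) and (3.1), uses inverse-transform sampling (a standard technique, not taken from the thread), and the helper names are hypothetical.

```python
import math
import random

def sample_power_law(n, alpha, xmin, rng):
    """Draw n samples from the continuous power law p(x) ~ x**(-alpha), x >= xmin,
    by inverting the tail P(X >= x) = (x / xmin) ** (1 - alpha):
    x = xmin * v ** (-1 / (alpha - 1)) with v = 1 - u in (0, 1]."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def estimate_alpha(samples, xmin):
    """Assumed Eq. 3.1: alpha_hat = 1 + n / sum(ln(x_i / xmin))."""
    return 1.0 + len(samples) / sum(math.log(x / xmin) for x in samples)

rng = random.Random(42)                 # fixed seed for reproducibility
true_alpha = 2.5                        # exponent of the PDF used to generate data
xs = sample_power_law(100_000, true_alpha, xmin=1.0, rng=rng)
alpha_hat = estimate_alpha(xs, xmin=1.0)
print(f"PDF exponent: {true_alpha}, CDF exponent: {true_alpha - 1}, estimate: {alpha_hat:.3f}")
```

With this many samples the estimate should sit within a few thousandths of the PDF exponent 2.5 and nowhere near the tail exponent 1.5.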

ntamas commented 4 years ago

I think it estimates the exponent of the PDF. One reason I believe this is that the theoretical minimum of the formula in (3.1) is 1: the estimate approaches 1 in an extreme case where one of your samples is 1 and all the remaining samples tend to infinity, but it never drops below 1. If it estimated the exponent of the CDF, which is \alpha - 1, its natural lower bound would be zero instead, because the PDF exponent \alpha only has to exceed 1; for \alpha \le 1 the PDF is not normalizable. (See also the parenthesized comment at the bottom of page 5 of the preprint.)
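The limiting argument above can be checked numerically. The sketch below assumes (3.1) has the continuous form 1 + n / \sum \ln(x_i / x_min) with x_min = 1 (the helper name is hypothetical): keeping one sample at 1 and pushing the others outward makes the log-sum grow without bound, so the estimate creeps down towards 1 but stays above it. This matches the requirement \alpha > 1, since \int_{x_min}^{\infty} x^{-\alpha} dx diverges for \alpha \le 1.

```python
import math

def estimate_alpha(samples, xmin):
    """Assumed Eq. 3.1: alpha_hat = 1 + n / sum(ln(x_i / xmin))."""
    return 1.0 + len(samples) / sum(math.log(x / xmin) for x in samples)

# One sample sits at xmin = 1; the other three are pushed further and further out.
vals = []
for big in (1e2, 1e10, 1e100, 1e300):
    a = estimate_alpha([1.0, big, big, big], xmin=1.0)
    vals.append(a)
    print(f"{big:.0e}: alpha_hat = {a:.6f}")
```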