jeffalstott / powerlaw

New user: Why the curvature in power_law.plot_ccdf fit? #85

Open amybug opened 3 years ago

amybug commented 3 years ago

Hi Dr. Alstott (Jeff): Thank you for this great suite of codes! We have a cCDF from simulation and want to fit to a power law. I am attaching what we see. We used a few lines from your great 2014 tutorial with Bullmore and Plenz. I suppose the fit package wants to normalize and so it matches our data at x_min ... which happens to be around 0.1. So be it. Can you explain why the method fit.power_law.plot_ccdf() draws the dashed line with curvature at large x values? Shouldn't it be a straight line?

Very grateful for your time and expertise! -Amy Graves, Prof. of Physics, Swarthmore College

    fit = powerlaw.Fit(all_lags_cyan, xmax=1.0)
    fig1 = fit.plot_pdf(color='k', linewidth=2)
    fit1 = fit.plot_ccdf(color='cyan', linewidth=2, ax=fig1)
    x, y = results_cyan.ccdf()
    plt.plot(x, y, 'o', color='cyan')
    fit.power_law.plot_ccdf(color='r', linestyle='--', ax=fig1)

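[image: Screen Shot 2021-07-29 at 2 51 25 PM] https://user-images.githubusercontent.com/4631628/127549493-8489afd6-c288-4cc0-82cf-17343776bf9a.png

jeffalstott commented 3 years ago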

It's a function of having an xmax. Look to Figure 3 in the paper. Thanks for using powerlaw!
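
For readers without the paper at hand, the effect can be sketched numerically. The snippet below only illustrates the mathematics of a continuous power law; it is not code taken from 'powerlaw', and the function names and the values alpha = 3.98, xmin = 0.1, xmax = 1.0 are assumptions chosen to resemble the numbers in this thread. Without an upper bound the CCDF is (x / xmin)^(1 - alpha), a straight line on log-log axes. Once the distribution is renormalized to [xmin, xmax], the CCDF has to reach zero at xmax, so it bends downward as x approaches xmax.

```python
import math

def ccdf_unbounded(x, alpha, xmin):
    """CCDF of a continuous power law on [xmin, infinity)."""
    return (x / xmin) ** (1.0 - alpha)

def ccdf_bounded(x, alpha, xmin, xmax):
    """CCDF of the same power law renormalized to [xmin, xmax]."""
    top = x ** (1.0 - alpha) - xmax ** (1.0 - alpha)
    bottom = xmin ** (1.0 - alpha) - xmax ** (1.0 - alpha)
    return top / bottom

def local_slope(ccdf, x, step=1.01):
    """Log-log slope of ccdf between x and step * x."""
    return (math.log(ccdf(x * step)) - math.log(ccdf(x))) / math.log(step)

# Illustrative values only (assumed, not fitted here).
alpha, xmin, xmax = 3.98, 0.1, 1.0

free = lambda x: ccdf_unbounded(x, alpha, xmin)
capped = lambda x: ccdf_bounded(x, alpha, xmin, xmax)

for x in (0.1, 0.3, 0.6, 0.9):
    print(f"x={x:4.1f}  slope without xmax={local_slope(free, x):7.2f}  "
          f"slope with xmax={local_slope(capped, x):7.2f}")
```

The unbounded slope stays at 1 - alpha for every x, while the bounded slope becomes steeper the closer x gets to xmax. That steepening is the curvature seen in the dashed line.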

amybug commented 3 years ago

Dear Jeff, Thank you so much for the quickness of your reply. You must be super busy responding to users. Hopefully this is a "good" busy - indicative of authoring such a useful body of work!

Duh ... ofc Figure 3! I deleted all data with x > xmax and refit, not specifying an xmax. The fitted xmin and alpha did not change, but fit.power_law.plot_ccdf() produced a straight line. Cool!

One more question, please? I attach the new image. The line (red dashed) is just a bit troubling. Is this really a best fit line given the constraint that cCDF(xmin) = 1.0? I "freehanded" in a yellow dashed line. If I were a fitting routine, I would probably come up with the yellow dashed line instead. I asked Matlab for a good old linear least squares fit and it agreed (attached).

Can you tell us why the red dashed line, which has a slightly steeper slope (alpha = 3.98 from powerlaw package vs. alpha = 3.66 from Matlab) is preferred by your "goodness of fit" criteria?

Thank you so very, very much!

Take care, Amy

[image: Screen Shot 2021-07-30 at 4.37.20 PM.png] [image: Screen Shot 2021-07-30 at 5.01.54 PM.png]

Amy Graves (formerly Amy Bug) Fellow of the American Physical Society Walter Kemp Professor in the Natural Sciences Dept. of Physics and Astronomy, Swarthmore College

jeffalstott commented 3 years ago

Hi Amy,

I'm afraid the figures aren't rendering.

amybug commented 3 years ago

Hi Jeff, Aargh I was afraid of that. I attach the images.

First image:
  Cyan: cCDF from file of experimental data
  Red dashed line: fit from powerlaw package
  Yellow dashed line: Amy's freehand of what she thinks the line should be, given that the distribution is normalized so y(xmin) = 1

Second image: what Matlab thinks the line should be, given that the distribution is normalized so y(xmin) = 1

I'll also attach the experimental data in case it helps. Not that long a data file ... and not very well "resolved". We can take more data with finer resolution if necessary. Just exploring for now.

Many, many thanks for your input!

 Take care,
 Amy
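[image: Screen Shot 2021-07-31 at 8 30 51 AM] https://user-images.githubusercontent.com/4631628/127739941-67972464-b21a-4f85-9544-a40a5e26fa83.png
[image: Screen Shot 2021-07-31 at 8 31 01 AM] https://user-images.githubusercontent.com/4631628/127739945-3917dc20-42c8-41ce-afee-5a2521e05d38.png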

July28LagTimes.txt

jeffalstott commented 3 years ago

Thanks, Amy. It looks like your underlying data is actually discrete, not continuous. That is, while the reported lag times look like real numbers, they can only exist in discrete multiples of some small number. This means there is a lot of data between those points that the continuous power law fit expects to see and does not see. I would divide the data by its smallest value, so that the values become integer multiples, then refit with 'discrete=True'.
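
The rescaling step might look like the following. This is a sketch under the assumption that every lag time is an integer multiple of the smallest observed value; the sample list and the helper name to_integer_multiples are made up for illustration. Rounding guards against floating-point noise in the division, and the check fails loudly if the data are not close to integer multiples of the assumed step.

```python
def to_integer_multiples(values, tolerance=1e-6):
    """Divide by the smallest value and round to the nearest integer.

    Assumes the smallest value equals the discretization step.
    """
    step = min(values)
    scaled = [v / step for v in values]
    rounded = [round(s) for s in scaled]
    # Largest distance between a scaled value and its nearest integer.
    worst = max(abs(s - r) for s, r in zip(scaled, rounded))
    if worst > tolerance:
        raise ValueError(f"data are not integer multiples of {step}")
    return rounded, step

# Hypothetical lag times on a 0.025 grid (not the data from this thread).
lags = [0.025, 0.05, 0.1, 0.1, 0.125, 0.3, 0.9]
counts, step = to_integer_multiples(lags)
print(counts, step)
```

The integer list is what would then be passed to powerlaw.Fit with 'discrete=True'. Any xmin or xmax given to the fit has to be expressed in the same rescaled units.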

amybug commented 3 years ago

Hi Jeff, Once again, we are grateful to you. That was the magic!

The dashed line now fits the first few data points. It does not fit the last few data points which curve downward. This is very typical of these types of experiments. These are lag times for passage through a narrow doorway (animals, active matter, hopper flow).

FYI I did this in python:

    divisor = min(all_lags_cyan_truncated)
    all_lags_cyan_truncated[:] = [x / divisor for x in all_lags_cyan_truncated]  # divide by the min
    results_cyan_truncated = powerlaw.Fit(all_lags_cyan_truncated, discrete=True)

and also this:

    fit_truncated = powerlaw.Fit(all_lags_cyan_truncated, discrete=True)

We then got a slightly different alpha of 3.13 (old value was 3.98). We got an xmin equal to the old xmin/divisor. We got the attached fit. Hooray!
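
One way to see why alpha can drop when the data are treated as discrete is to compare two closed-form estimators on the same integer data. The sketch below assumes the standard continuous maximum-likelihood estimate, alpha = 1 + n / sum(ln(x / xmin)), and the commonly used discrete approximation that replaces xmin with xmin - 0.5. Whether 'powerlaw' uses exactly these expressions internally is not confirmed anywhere in this thread, and the sample values are invented.

```python
import math

def alpha_continuous(data, xmin):
    """Continuous MLE: 1 + n / sum(ln(x / xmin)) over the tail x >= xmin."""
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def alpha_discrete_approx(data, xmin):
    """Discrete approximation: same form with xmin replaced by xmin - 0.5."""
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / (xmin - 0.5)) for x in tail)

# Invented integer lag counts, for illustration only.
data = [4, 4, 4, 4, 5, 5, 5, 6, 6, 7, 8, 9, 12, 20]
xmin = 4

print("continuous estimate:     ", round(alpha_continuous(data, xmin), 3))
print("discrete approx estimate:", round(alpha_discrete_approx(data, xmin), 3))
```

The discrete form always gives the smaller alpha, because every logarithm in the sum grows when xmin is replaced by xmin - 0.5. That is the same direction as the change from 3.98 to 3.13 reported here, although the size of the shift depends on the data, and the approximation is usually considered reliable only for larger xmin.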

Take care ... have a great Saturday,
  Amy
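[image: Screen Shot 2021-07-31 at 12 54 41 PM] https://user-images.githubusercontent.com/4631628/127747166-a6987a1a-4808-41f1-a47a-288253feb878.png

jeffalstott commented 3 years ago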

Congrats!

"It does not fit the last few data points which curve downward. This is very typical of these types of experiments." FYI, since you have a true xmax in the empirical data, that may be forcing the CCDF to curve downward. Without putting the known xmax into the fit, though, 'powerlaw' won't know about it. I'd suggest fitting with the known xmax; you'd likely get a better fit, as you saw with the curve in the fit you originally asked about.

Note: your empirical data has a "true" xmax because you forced it by truncating the raw data. The physics of the real system you're studying (the lag times) may or may not have a true xmax, but you know that best. Whether xmaxes exist and how to treat them is a general issue in describing power laws, since that tail is also what we would look at to distinguish a power law from an exponentially truncated power law. This is extra relevant in a situation like this, where the observed scaling is barely more than 1 order of magnitude, with a curving tail nearly equally long. This is where understanding the mechanisms behind the data is critical, which unfortunately 'powerlaw' can't help with!

amybug commented 3 years ago

Hi Jeff, I love the new fit, whose alpha is not significantly different and which curves with the data.

Taking the advice in your paper around having an xmax, it is better to show folks the line that fits the pdf.
Your excellent advice produced one (hopefully, last?) question: Why does my logic yield 1/4 the number of points on a pdf as on a cCDF?

I'm feeling there is some numerology here:  Our xmax = 0.9 and discretization is 0.025 and this ratio is 4.  36 points are shown on the cCDF but only 9 points on the pdf. 

Again, hope I do not have to bother you much further.  If I do, we need to bring you on to our NSF grant as a consultant :-) 
   Take care,
   Amy

Python code:

    fit_truncated = powerlaw.Fit(all_lags_cyan_truncated, discrete=True, xmax=36)
    fig2 = fit_truncated.plot_pdf(color='k', linestyle='None', marker='x')
    fit_truncated.power_law.plot_pdf(color='b', linestyle='--', ax=fig2)
    fit2 = fit_truncated.plot_ccdf(color='cyan', linestyle='None', marker='o', ax=fig2)
    x, y = results_cyan_truncated.ccdf()
    plt.plot(x, y, 'o', color='cyan')
    fit_truncated.power_law.plot_ccdf(color='r', linestyle='--', ax=fig2)

Figure: attached

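[image: Screen Shot 2021-07-31 at 3 35 02 PM] https://user-images.githubusercontent.com/4631628/127750643-782e1f2b-b5a3-473a-a325-d9baa1a14943.png

jeffalstott commented 3 years ago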

The cdf/ccdf plots a point for every unique data value. The pdf plot is a histogram of n bins. The defaults for how those bins are calculated are probably the most defensible for the case where you don't know anything about the physics of the underlying data, but you could crack it open and go down a rabbit hole. This is part of why cdf/ccdfs are so much nicer for making visual statements about what is going on; there is no possibility of using bins to obscure what is going on.
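
The difference in point counts can be illustrated with a small sketch. It assumes logarithmically spaced bin edges at roughly ten edges per decade; that density is my assumption about a plausible default and is not confirmed in this thread. The helper log_spaced_edges and the integer data spanning 4 to 36 are hypothetical.

```python
import math

def log_spaced_edges(xmin, xmax, edges_per_decade=10):
    """Logarithmically spaced bin edges from xmin to xmax (both included)."""
    decades = math.log10(xmax) - math.log10(xmin)
    n_edges = max(2, math.ceil(decades * edges_per_decade))
    lo = math.log10(xmin)
    return [10 ** (lo + i * decades / (n_edges - 1)) for i in range(n_edges)]

# Hypothetical integer data in which every value from 4 to 36 occurs once.
data = list(range(4, 37))

ccdf_points = len(set(data))                    # one CCDF point per unique value
edges = log_spaced_edges(min(data), max(data))
pdf_points = len(edges) - 1                     # one PDF point per bin

print("CCDF points:", ccdf_points)
print("PDF points: ", pdf_points)
```

The CCDF point count grows with the number of distinct values, while the PDF point count depends only on how many decades the data span. A dataset covering about one decade therefore yields roughly ten PDF points, however finely the values are resolved.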

amybug commented 3 years ago

Aha! Thank you, Jeff!

No wish to go down a rabbit hole, but may I play with the bin widths for the pdf histogram? If yes, keyword is ... ?

We can perhaps be systematic (Freedman-Diaconis?) when we start analyzing data from production runs. Or maybe you will tell me that F-D or a similar criterion was already used to choose the default number of bins :-)

Take care,
 Amy

jeffalstott commented 3 years ago

If you're just looking to plot the PDF, which is a histogram, that can be done with matplotlib/etc. directly.

'plot_pdf' in 'powerlaw' will take an explicit 'bins' argument just like matplotlib's histogram functions. By default 'powerlaw' calculates logarithmically-spaced bins, but you can use 'linear_bins=True' as well. The relevant portion of the code is here: https://github.com/jeffalstott/powerlaw/blob/master/powerlaw.py#L1971

amybug commented 3 years ago

Thank you so much, Jeff!

I need to nerd on this a little more, b/c right now python is throwing an AttributeError when I do this:

    fit_truncated = powerlaw.Fit(all_lags_cyan_truncated, discrete=True, xmax=36)
    fig2 = fit_truncated.plot_pdf(color='k', linestyle='None', marker='o', bins=20)

telling me that 'Line2D' object has no property 'bins'

But I can see the option: if 'bins' in kwargs.keys(): right there in the definition of the pdf object in the source code you sent me. Hmmm!

I can certainly plot the pdf and the fitted line myself on a subplot using matplotlib. It is just so very convenient to do it all with powerlaw.Fit objects!

  Take care,
  Amy

jeffalstott commented 3 years ago

'bins' is a list of bin edges, just like in the matplotlib functions (which is what 'powerlaw' is calling)
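
To make the "list of bin edges" convention concrete, here is a minimal standard-library sketch of a normalized histogram computed from explicit edges. It is not the 'powerlaw' or matplotlib implementation; the helper name density_from_edges and the sample numbers are hypothetical. A list of k edges defines k - 1 bins, which is why a single integer such as 20 is not the same kind of argument.

```python
import bisect

def density_from_edges(data, edges):
    """Normalized histogram: returns (bin_centers, densities).

    len(edges) == len(densities) + 1. Values outside the edges are
    ignored, and the last bin includes its right edge.
    """
    counts = [0] * (len(edges) - 1)
    for x in data:
        if x < edges[0] or x > edges[-1]:
            continue
        # Index of the bin whose left edge is the last edge <= x.
        i = min(bisect.bisect_right(edges, x) - 1, len(counts) - 1)
        counts[i] += 1
    total = sum(counts)
    widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
    densities = [c / (total * w) for c, w in zip(counts, widths)]
    centers = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
    return centers, densities

edges = [1, 2, 4, 8, 16]              # 5 edges define 4 bins
data = [1, 1, 2, 3, 3, 5, 9, 16]
centers, densities = density_from_edges(data, edges)
print(centers)
print(densities)
```

Dividing each count by the bin width is what keeps unequal bins comparable, which matters when the edges are logarithmically spaced.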

amybug commented 3 years ago

Hey Jeff,  I want to thank you for all the time you've spent answering my emails! I am really just faking it  at being python-knowledgeable. (It's a professor thing ... we can only cover so much ground.)

i) Sadly, I couldn't implement your latest suggestion.  I tried placing a data structure like this bins=[1, 1.5, 2, 2.5, 3, 3.5, 6, 7, 8, 9, 10, 20, 30, 40] in the arguments in line 1. or line 2. below.  I either got AttributeErrors or got no effect. 

  1. fit = powerlaw.Fit(all_lags_cyan, xmax=100)
  2. fig2 = fit.plot_pdf(color= 'k', linestyle = 'None', marker = 'o' )

ii) Happily, with more abundant data, the binning choice that powerlaw makes is just fine.  I tested with a synthetic dataset of 20,000 points (uniform on [0,1] and then power law for x > 1). It worked very well in all ways, including plots of PDF and cCDF

iii) Happily too, I was able to specify bin positions with good old matplotlib, and have the data appear on the same figure as other data.   The black X's (attached) are what I was shooting for. (I totally hacked the number of values to bin and plot b/c there is one more bin edge than bin value ... of course ;-) 

    fig2 = fit.plot_pdf(color='k', linestyle='None', marker='o', markersize=3)
    fit.power_law.plot_pdf(color='b', linestyle='--', ax=fig2)
    x, y = fit.ccdf()
    plt.plot(x, y, 'o', color='cyan')
    fit.power_law.plot_ccdf(color='r', linestyle='--', ax=fig2)
    x1, y1 = fit.pdf(bins=[1, 1.5, 2, 2.5, 3, 3.5, 6, 7, 8, 9, 10, 20, 30, 40])
    x1 = x1[0:13]  # trim: there is one more bin edge than bin value
    y1 = y1[0:13]
    plt.plot(x1, y1, 'x', color='k', markersize=10)
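
A synthetic test set like the one described in item ii) can be generated with inverse-transform sampling. The sketch below is an assumption about how such a dataset could be built, not the code actually used here. The 10,000 / 10,000 split, alpha = 3, the seed and the helper name synthetic_lags are all illustrative choices. The last lines compare the observed fraction of tail samples at or above x = 2 with the analytic CCDF value.

```python
import random

def synthetic_lags(n_uniform, n_tail, alpha, xmin=1.0, seed=0):
    """Uniform samples on [0, xmin) plus power-law samples on [xmin, inf)."""
    rng = random.Random(seed)
    body = [rng.uniform(0.0, xmin) for _ in range(n_uniform)]
    # Inverse-transform sampling from CCDF(x) = (x / xmin) ** (1 - alpha).
    tail = [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n_tail)]
    return body + tail

alpha = 3.0
data = synthetic_lags(n_uniform=10_000, n_tail=10_000, alpha=alpha)

tail = [x for x in data if x >= 1.0]
observed = sum(x >= 2.0 for x in tail) / len(tail)
expected = 2.0 ** (1.0 - alpha)       # analytic CCDF at x = 2

print("observed fraction at or above 2:", observed)
print("analytic CCDF at 2:             ", expected)
```

With a known alpha and xmin, a list like this gives a ground truth against which the fitted values and the default binning can be checked.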

Again, we are extremely grateful; your quick and comprehensive responses have been amazing! In whatever we write up, we will warmly acknowledge your expert help.

Take care,
Amy

Screen Shot 2021-08-01 at 11 37 58 AM
