jyscao closed this issue 4 years ago
Thanks for the comment; this is interesting and I don't quite know how to explain it yet. I suspected that there is some overhead associated with converting a Python list to an array of floats in the C layer, but that cannot account for such a large difference.
I added some printf() statements in plfit_continuous() in the C layer to see what the "real" difference is in terms of execution time. I get 0.21s on my system from the CLI and 2.08s from Python, so there is still a ~10x slowdown.
Okay, it turns out you also need this line:
opts.xmin_method = plfit.PLFIT_STRATIFIED_SAMPLING
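Putting it together, the Python-side call would look something like this. This is only a minimal sketch: the plfit_continuous_options_t() constructor and the option constant come from this thread, but the exact plfit.plfit_continuous(data, opts) call and the result attributes are assumptions about the SWIG wrapper, so check them against the generated module.

import plfit

# Load the test dataset as a plain list of floats
with open("data/continuous_data.txt") as fp:
    data = [float(line) for line in fp]

# Default-constructed options, plus the sampling method the CLI uses
opts = plfit.plfit_continuous_options_t()
opts.xmin_method = plfit.PLFIT_STRATIFIED_SAMPLING

# NOTE: the call signature and result attributes below are assumptions
# about the SWIG wrapper, not confirmed API
result = plfit.plfit_continuous(data, opts)
print(result.alpha, result.xmin)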
The CLI uses stratified sampling to "guesstimate" xmin before switching to a linear scan on a narrowed interval, while the default options that you create in Python use an exhaustive linear scan.
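As a rough illustration of why this matters, here is a sketch of the two strategies in Python. This is not plfit's actual code: fit_quality is a dummy stand-in for the expensive per-candidate fit that the real algorithm scores with the Kolmogorov-Smirnov distance.

def fit_quality(xs, xmin):
    # Dummy placeholder for the expensive fit + KS distance evaluation
    tail = [x for x in xs if x >= xmin]
    return abs(len(tail) - len(xs) // 2)

def exhaustive_scan(xs):
    # Default in the Python wrapper: score every distinct value as xmin
    candidates = sorted(set(xs))
    return min(candidates, key=lambda c: fit_quality(xs, c))

def stratified_scan(xs, step=10):
    # What the CLI does: score a strided subsample first, then do a
    # full scan only on the narrowed interval around the best candidate
    candidates = sorted(set(xs))
    best = min(candidates[::step], key=lambda c: fit_quality(xs, c))
    i = candidates.index(best)
    lo = max(0, i - step)
    return min(candidates[lo:i + step + 1], key=lambda c: fit_quality(xs, c))

The exhaustive scan calls the expensive fit once per distinct sample, while the stratified variant calls it only on a coarse subsample plus a short narrowed interval, which is roughly where the ~10x gap comes from.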
This is still a bug, though, because plfit_continuous_options_t() is supposed to set up the defaults properly.
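A quick sanity check from Python (assuming the option fields are readable the same way they are writable above):

import plfit

# On the buggy version this prints False, because the default xmin
# method differs from the one the CLI uses; once the defaults are
# fixed it should print True
opts = plfit.plfit_continuous_options_t()
print(opts.xmin_method == plfit.PLFIT_STRATIFIED_SAMPLING)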
Fixed in 654ab50; the defaults are now set up correctly. I'll publish a patch release soon. Thanks a lot!
Hi there, firstly thanks for creating this super performant implementation of the Clauset power-law fitting method.
So, following the README instructions, I built the executables as well as the Python module. Using the test dataset data/continuous_data.txt, the compiled executable produced the fit in about 0.5 sec on my system (timed using the time command), while using the SWIG-generated Python wrapper on the same input through an IPython REPL gave the following:

The fitted parameters are exactly the same (alpha=2.53282, xmin=1.43628), so they can't be the source of the performance difference, which at least to me is unexpectedly large.
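Roughly, the Python side of the measurement looked like this (sketched here; the plfit.plfit_continuous(data) call and the result attributes are assumptions about the wrapper API):

import time
import plfit

with open("data/continuous_data.txt") as fp:
    data = [float(line) for line in fp]

t0 = time.perf_counter()
result = plfit.plfit_continuous(data)  # call signature is an assumption
print("alpha=%g xmin=%g" % (result.alpha, result.xmin))
print("elapsed: %.2fs" % (time.perf_counter() - t0))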
Not sure if I did something wrong during the build or in how I'm using it. Any ideas?