nferraz / st

simple statistics from the command line
MIT License

Slow calculations #12

Closed austvik closed 11 years ago

austvik commented 11 years ago

Hi,

thanks for a great tool - st is exactly what I need, except for the speed.

Up to 1000 lines of numbers in a file, it works OK:

$ time head -n 10 jmeter_saso.log | cut -d, -f2 | st
N     min      max       sum        mean      stddev
9.00  7578.00  19843.00  132073.00  14674.78  3632.56

real    0m0.207s
user    0m0.170s
sys     0m0.010s

$ time head -n 100 jmeter_saso.log | cut -d, -f2 | st
N      min      max       sum         mean      stddev
99.00  7578.00  35999.00  2372769.00  23967.36  5713.40

real    0m0.339s
user    0m0.300s
sys     0m0.020s

$ time head -n 1000 jmeter_saso.log | cut -d, -f2 | st
N       min    max       sum         mean     stddev
999.00  80.00  38075.00  7644960.00  7652.61  10007.16

real    0m2.375s
user    0m2.280s
sys     0m0.030s

But at 10,000 lines it starts getting really slow:

$ time head -n 10000 jmeter_saso.log | cut -d, -f2 | st
N        min    max       sum          mean     stddev
9999.00  40.00  38075.00  11624304.00  1162.55  3934.22

real    0m26.478s
user    0m24.600s
sys     0m0.070s

I don't know why it takes so long; Perl can do it pretty quickly:

$ time head -n 10000 jmeter_saso.log | cut -d, -f2 | perl -lne '$x += $_; END { print $x; }'
11624304

real    0m0.022s
user    0m0.010s
sys     0m0.000s
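For reference, all six statistics can be kept in a single streaming pass in plain Perl as well. This is only a rough sketch (it assumes purely numeric input, skips anything else, and uses the sample standard deviation, which may not match st's definition exactly):

$ head -n 10000 jmeter_saso.log | cut -d, -f2 | perl -lne '
    next unless /^-?\d+(\.\d+)?$/;           # skip non-numeric lines such as a CSV header
    $n++; $sum += $_; $sq += $_ * $_;
    $min = $_ if !defined($min) || $_ < $min;
    $max = $_ if !defined($max) || $_ > $max;
    END {
        my $mean = $sum / $n;
        my $sd   = sqrt(($sq - $sum * $sum / $n) / ($n - 1));
        printf "%d %.2f %.2f %.2f %.2f %.2f\n", $n, $min, $max, $sum, $mean, $sd;
    }'

The arithmetic itself is trivial, so the slowdown has to be somewhere inside st.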

Even just --sum takes 1000 times as long as the Perl one-liner:

$ time head -n 10000 jmeter_saso.log | cut -d, -f2 | st --sum
Invalid value 'elapsed' on input line 1
11624304.00

real    0m22.732s
user    0m22.520s
sys     0m0.020s
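(The "Invalid value 'elapsed'" warning presumably comes from a CSV header on the first line; if so, it can be dropped before the numbers reach st, e.g.:

$ head -n 10000 jmeter_saso.log | tail -n +2 | cut -d, -f2 | st --sum

That only cleans up the warning, of course, not the speed.)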

My files are 1,000,000 lines long, and st has now used 22 CPU minutes on one of them without finishing.

nferraz commented 11 years ago

Hi,

Thank you for the report.

I profiled the code and found that the bignum module was making the script too slow.
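The bignum pragma routes every arithmetic operation through Math::BigInt/Math::BigFloat objects, which is far slower than Perl's native numbers. You can see the effect with a quick comparison (just an illustration, not st's actual code):

$ time perl -e 'use bignum; my $s = 0; $s += $_ for 1..100_000; print "$s\n"'
$ time perl -e 'my $s = 0; $s += $_ for 1..100_000; print "$s\n"'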

Please download the new version; it should fix the problem.

Nelson


austvik commented 11 years ago

Wow! That was fixed quickly!

10,000 lines went from 9.7 seconds to 0.2 seconds for me. Perfect!

Thank you very much!