jlongster / transducers.js

A small library for generalized transformation of data (inspired by Clojure's transducers)
BSD 2-Clause "Simplified" License

slightly more comprehensive benchmarks (for take and bench) #11

Closed stefanpenner closed 9 years ago

stefanpenner commented 9 years ago

Someone should review this, as benchmarks are always mega trolling, but hand-rolled isn't that much faster than transducers (in this case).
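For context, a rough sketch of the four variants being compared (the real benchmark code is in this PR; the transducers.js call shape here is my assumption from the 0.2.0 README):

```js
var _ = require('lodash'); // lodash master, for the unreleased lazy chains
var t = require('transducers.js');

function inc(x) { return x + 1; }
function isEven(x) { return x % 2 === 0; }

// hand-rolled baseline: one fused loop, no intermediate arrays
function handRolled(arr) {
  var out = [];
  for (var i = 0; i < arr.length; i++) {
    var v = inc(arr[i]);
    if (isEven(v)) out.push(v);
  }
  return out;
}

// native: map allocates a full intermediate array before filter runs
function native(arr) {
  return arr.map(inc).filter(isEven);
}

// _.map/filter: lodash chain, fused lazily on master
function lodashChain(arr) {
  return _(arr).map(inc).filter(isEven).value();
}

// t.map/filter+transduce: transformation composed once, applied in one pass
var xform = t.compose(t.map(inc), t.filter(isEven));
function transducers(arr) {
  return t.toArray(arr, xform); // toArray(coll, xform): assumed signature
}
```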

output

npm version
{ http_parser: '2.3',
  node: '0.11.14',
  v8: '3.26.33',
  uv: '1.0.0',
  zlib: '1.2.3',
  modules: '14',
  openssl: '1.0.1i',
  npm: '2.0.0',
  'transducers.js': '0.2.0' }

lodash master (as it has lazy support that isn't released yet)

 (n=1) hand-rolled baseline x 3,104,227 ops/sec ±5.34% (79 runs sampled)
 (n=1) native x 504,005 ops/sec ±7.77% (70 runs sampled)
 (n=1) _.map/filter x 362,008 ops/sec ±10.59% (70 runs sampled)
 (n=1) t.map/filter+transduce x 1,022,069 ops/sec ±7.60% (74 runs sampled)

 (n=2) hand-rolled baseline x 3,202,279 ops/sec ±4.14% (85 runs sampled)
 (n=2) native x 505,865 ops/sec ±4.54% (76 runs sampled)
 (n=2) _.map/filter x 415,780 ops/sec ±4.01% (80 runs sampled)
 (n=2) t.map/filter+transduce x 1,114,734 ops/sec ±3.74% (82 runs sampled)

 (n=10) hand-rolled baseline x 2,642,734 ops/sec ±5.40% (79 runs sampled)
 (n=10) native x 296,594 ops/sec ±6.06% (71 runs sampled)
 (n=10) _.map/filter x 332,943 ops/sec ±6.15% (76 runs sampled)
 (n=10) t.map/filter+transduce x 969,149 ops/sec ±7.39% (76 runs sampled)

 (n=50) hand-rolled baseline x 816,477 ops/sec ±4.43% (82 runs sampled)
 (n=50) native x 99,406 ops/sec ±4.96% (79 runs sampled)
 (n=50) _.map/filter x 218,778 ops/sec ±3.68% (85 runs sampled)
 (n=50) t.map/filter+transduce x 671,534 ops/sec ±3.53% (84 runs sampled)

 (n=100) hand-rolled baseline x 572,488 ops/sec ±4.45% (81 runs sampled)
 (n=100) native x 56,239 ops/sec ±3.62% (79 runs sampled)
 (n=100) _.map/filter x 139,856 ops/sec ±5.43% (78 runs sampled)
 (n=100) t.map/filter+transduce x 426,086 ops/sec ±4.14% (81 runs sampled)

 (n=1000) hand-rolled baseline x 564,258 ops/sec ±3.82% (83 runs sampled)
 (n=1000) native x 6,173 ops/sec ±4.23% (80 runs sampled)
 (n=1000) _.map/filter x 143,803 ops/sec ±3.56% (82 runs sampled)
 (n=1000) t.map/filter+transduce x 408,646 ops/sec ±3.56% (79 runs sampled)

 (n=10000) hand-rolled baseline x 569,765 ops/sec ±3.23% (80 runs sampled)
 (n=10000) native x 641 ops/sec ±3.85% (80 runs sampled)
 (n=10000) _.map/filter x 141,814 ops/sec ±3.71% (84 runs sampled)
 (n=10000) t.map/filter+transduce x 423,888 ops/sec ±3.85% (82 runs sampled)

 (n=100000) hand-rolled baseline x 2,196,801 ops/sec ±3.47% (79 runs sampled)
 (n=100000) native x 47.06 ops/sec ±4.48% (51 runs sampled)
 (n=100000) _.map/filter x 136,975 ops/sec ±4.23% (80 runs sampled)
 (n=100000) t.map/filter+transduce x 410,112 ops/sec ±4.12% (81 runs sampled)
stefanpenner commented 9 years ago

 (n=100000) hand-rolled baseline x 2,196,801 ops/sec ±3.47% (79 runs sampled)
 (n=100000) native x 47.06 ops/sec ±4.48% (51 runs sampled)
 (n=100000) _.map/filter x 136,975 ops/sec ±4.23% (80 runs sampled)
 (n=100000) t.map/filter+transduce x 410,112 ops/sec ±4.12% (81 runs sampled)

confuses me...

jlongster commented 9 years ago

I'm not sure what you mean by "hand-rolled isn't that much faster"? Looking at the largest array, it's quite a bit faster. I also think these benchmarks are only meaningful when the array size is at least 1000 or so; otherwise you're probably in the territory of testing tiny specific details of how the JIT works.

stefanpenner commented 9 years ago

(n=1000) hand-rolled baseline x 564,258 ops/sec ±3.82% (83 runs sampled)
(n=1000) native x 6,173 ops/sec ±4.23% (80 runs sampled)
(n=1000) _.map/filter x 143,803 ops/sec ±3.56% (82 runs sampled)
(n=1000) t.map/filter+transduce x 408,646 ops/sec ±3.56% (79 runs sampled)

(n=10000) hand-rolled baseline x 569,765 ops/sec ±3.23% (80 runs sampled)
(n=10000) native x 641 ops/sec ±3.85% (80 runs sampled)
(n=10000) _.map/filter x 141,814 ops/sec ±3.71% (84 runs sampled)
(n=10000) t.map/filter+transduce x 423,888 ops/sec ±3.85% (82 runs sampled)

Some of the intermediate tests don't show the ratio I expected; this likely warrants further investigation.

jlongster commented 9 years ago

Oh, now I see what you mean in the other comment above. Transducers stay pretty close all the way up until the largest array. That's good: the way transducers are structured should make it easy for the engine to inline almost all of the work (it's just a few tiny function calls per item). At least, that's why I assume they perform so well.
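To make the inlining point concrete, here's a minimal sketch (my own illustration, not the library's internals) of how a composed map/filter collapses into nested step functions, so each item costs just a couple of tiny calls and no intermediate array:

```js
function mapStep(f, next) {
  return function(acc, input) { return next(acc, f(input)); };
}
function filterStep(pred, next) {
  return function(acc, input) { return pred(input) ? next(acc, input) : acc; };
}
function append(acc, input) { acc.push(input); return acc; }

// roughly what compose(map(inc), filter(isEven)) boils down to:
var step = mapStep(function(x) { return x + 1; },
                   filterStep(function(x) { return x % 2 === 0; }, append));

[1, 2, 3, 4].reduce(step, []); // => [2, 4], built in a single pass
```

Contrast that with `arr.map(inc).filter(isEven)`, which allocates a full intermediate array between the two steps.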

Not sure about that largest array case though, and why hand-rolled jumps up so much.

Agreed that benchmarks invite mega-trolling, but they're good to have if done correctly. Any insights gained should be interpreted loosely at best.

jlongster commented 9 years ago

Some of the intermediate tests don't show the ratio I expected; this likely warrants further investigation.

I would guess those are the correct ones, and that we need to look into why hand-rolled jumps up so much with the largest array. These should be highly optimized by the engine, so I'm not surprised they perform so well... but yeah, I'll look into all of this more.

stefanpenner commented 9 years ago

I also think these benchmarks are only meaningful when the array size is at least 1000 or so; otherwise you're probably in the territory of testing tiny specific details of how the JIT works.

Maybe, but many app scenarios compose many different array transformations over small N. So although I wouldn't want to sacrifice large-N throughput, I would still want to minimize overhead for small N.
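One way to keep that small-N overhead down (a sketch assuming the 0.2.0 `toArray(coll, xform)` shape): compose the transformation once and reuse it across the many small arrays, so each call pays only for the traversal itself:

```js
var t = require('transducers.js');

// composed once, up front; the hot path just reuses it
var xform = t.compose(
  t.map(function(x) { return x * 2; }),
  t.filter(function(x) { return x > 2; })
);

// many small arrays, as in typical app code
var smallArrays = [[1, 2], [3], [4, 5, 6]];
var results = smallArrays.map(function(arr) {
  return t.toArray(arr, xform); // assumed 0.2.0 call shape
});
```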

stefanpenner commented 9 years ago

@jlongster thanks :)

jlongster commented 9 years ago

@stefanpenner np! Here's the rendered graph of that benchmark now: http://jlongster.com/s/trans-bench2.png. I'm working on running it in engines other than node (transducers win the most in v8).