llogiq / newlinebench

hyperscreaming #3

Closed Veedrac closed 8 years ago

Veedrac commented 8 years ago

This code is sufficiently evil that chances are there's something horribly wrong with it, but it is fast.

With target-cpu=native (last column is time relative to hyperscreaming):

slow            somenewlines         29 ns/iter (+/- 1)         322%
slow            nonewlines        4,042 ns/iter (+/- 285)      1050%
slow            newlines          5,738 ns/iter (+/- 362)      1490%
slow            random           63,424 ns/iter (+/- 17,382)   1093%

fast            somenewlines         14 ns/iter (+/- 0)         156%
fast            nonewlines        1,848 ns/iter (+/- 114)       480%
fast            newlines          1,886 ns/iter (+/- 106)       490%
fast            random           27,614 ns/iter (+/- 1,642)     476%

screaming       somenewlines          8 ns/iter (+/- 0)          89%
screaming       nonewlines          946 ns/iter (+/- 70)        246%
screaming       newlines            950 ns/iter (+/- 34)        247%
screaming       random           14,076 ns/iter (+/- 880)       243%

faster          somenewlines         11 ns/iter (+/- 0)         122%
faster          nonewlines          882 ns/iter (+/- 347)       229%
faster          newlines            837 ns/iter (+/- 54)        217%
faster          random           12,481 ns/iter (+/- 701)       215%

fastest         somenewlines          6 ns/iter (+/- 0)          67%
fastest         nonewlines          786 ns/iter (+/- 46)        204%
fastest         newlines            786 ns/iter (+/- 48)        204%
fastest         random           12,130 ns/iter (+/- 754)       209%

hyperscreaming  somenewlines          9 ns/iter (+/- 1)         100%
hyperscreaming  nonewlines          385 ns/iter (+/- 32)        100%
hyperscreaming  newlines            385 ns/iter (+/- 40)        100%
hyperscreaming  random            5,802 ns/iter (+/- 326)       100%

Without target-cpu=native (last column is time relative to hyperscreaming):

slow            somenewlines         27 ns/iter (+/- 1)         300%
slow            nonewlines        3,137 ns/iter (+/- 81)        500%
slow            newlines          5,738 ns/iter (+/- 595)       939%
slow            random           63,574 ns/iter (+/- 3,765)     662%

faster          somenewlines         17 ns/iter (+/- 1)         189%
faster          nonewlines        2,462 ns/iter (+/- 153)       392%
faster          newlines          2,463 ns/iter (+/- 176)       403%
faster          random           36,772 ns/iter (+/- 1,864)     383%

fast            somenewlines         14 ns/iter (+/- 1)         156%
fast            nonewlines        1,827 ns/iter (+/- 375)       291%
fast            newlines          1,883 ns/iter (+/- 492)       308%
fast            random           26,978 ns/iter (+/- 1,747)     281%

fastest         somenewlines         11 ns/iter (+/- 0)         122%
fastest         nonewlines        1,093 ns/iter (+/- 61)        174%
fastest         newlines          1,063 ns/iter (+/- 65)        174%
fastest         random           16,089 ns/iter (+/- 855)       168%

screaming       somenewlines          8 ns/iter (+/- 0)          89%
screaming       nonewlines          946 ns/iter (+/- 57)        151%
screaming       newlines            946 ns/iter (+/- 64)        155%
screaming       random           14,104 ns/iter (+/- 793)       147%

hyperscreaming  somenewlines          9 ns/iter (+/- 0)         100%
hyperscreaming  nonewlines          628 ns/iter (+/- 37)        100%
hyperscreaming  newlines            611 ns/iter (+/- 50)        100%
hyperscreaming  random            9,599 ns/iter (+/- 764)       100%

LLVM never vectorized this well, but given that the code already handles a usize worth of bytes at a time, which is an implicit form of vectorization, I don't suppose it matters too much. I could probably make it a fair bit faster if I wrote vector code directly, but that'd be even more fragile.
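
For anyone reading along, here is a rough sketch of what "a usize worth of bytes at a time" can look like. It is not the code from this pull request, which may well work differently. The function name `count_newlines_swar` is made up for illustration, and the sketch assumes a fixed 64-bit word rather than usize to keep the constants readable.

```rust
/// Hypothetical helper (not from this PR): counts `\n` bytes by examining
/// eight bytes per step inside a single u64 word.
fn count_newlines_swar(haystack: &[u8]) -> usize {
    const LO: u64 = 0x0101_0101_0101_0101; // 0x01 in every byte
    const HI: u64 = 0x8080_8080_8080_8080; // 0x80 in every byte
    const LOW7: u64 = !HI; // 0x7f in every byte

    // b'\n' copied into every byte of the word.
    let needle = LO * u64::from(b'\n');

    let mut count = 0usize;
    let mut chunks = haystack.chunks_exact(8);
    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        let word = u64::from_le_bytes(buf);

        // Bytes equal to '\n' become 0x00 after the XOR.
        let x = word ^ needle;
        // Adding 0x7f to the low seven bits sets a byte's high bit iff those
        // bits are nonzero; the sum is at most 0xfe, so nothing carries
        // into the neighbouring byte.
        let t = (x & LOW7) + LOW7;
        // A byte's high bit survives here iff the byte of `x` was zero.
        let zero_bytes = !(t | x) & HI;
        count += zero_bytes.count_ones() as usize;
    }

    // Fewer than eight bytes are left over; count them one by one.
    count + chunks.remainder().iter().filter(|&&b| b == b'\n').count()
}
```

Calling `count_newlines_swar(b"a\nb\n")` should agree with the plain `iter().filter(..).count()` version. A tuned implementation would likely keep per-byte counters in the word and sum them only occasionally, which this sketch does not attempt.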

llogiq commented 8 years ago

Thanks! You never disappoint :smile:

I think I've understood your implementation; evil aside, it should work exceedingly well.

Just for the record, can you tell me your CPU model?

Veedrac commented 8 years ago

I'm running an i7-6700HQ.