Closed Veedrac closed 8 years ago
This code is sufficiently evil that chances are there's something horribly wrong with it, but it is fast.
With `target-cpu=native`:
| Variant | Benchmark | ns/iter | +/- | % of hyperscreaming |
|---|---|---:|---:|---:|
| slow | somenewlines | 29 | 1 | 322% |
| slow | nonewlines | 4,042 | 285 | 1050% |
| slow | newlines | 5,738 | 362 | 1490% |
| slow | random | 63,424 | 17,382 | 1093% |
| fast | somenewlines | 14 | 0 | 156% |
| fast | nonewlines | 1,848 | 114 | 480% |
| fast | newlines | 1,886 | 106 | 490% |
| fast | random | 27,614 | 1,642 | 476% |
| screaming | somenewlines | 8 | 0 | 89% |
| screaming | nonewlines | 946 | 70 | 246% |
| screaming | newlines | 950 | 34 | 247% |
| screaming | random | 14,076 | 880 | 243% |
| faster | somenewlines | 11 | 0 | 122% |
| faster | nonewlines | 882 | 347 | 229% |
| faster | newlines | 837 | 54 | 217% |
| faster | random | 12,481 | 701 | 215% |
| fastest | somenewlines | 6 | 0 | 67% |
| fastest | nonewlines | 786 | 46 | 204% |
| fastest | newlines | 786 | 48 | 204% |
| fastest | random | 12,130 | 754 | 209% |
| hyperscreaming | somenewlines | 9 | 1 | 100% |
| hyperscreaming | nonewlines | 385 | 32 | 100% |
| hyperscreaming | newlines | 385 | 40 | 100% |
| hyperscreaming | random | 5,802 | 326 | 100% |
Without `target-cpu=native`:
| Variant | Benchmark | ns/iter | +/- | % of hyperscreaming |
|---|---|---:|---:|---:|
| slow | somenewlines | 27 | 1 | 300% |
| slow | nonewlines | 3,137 | 81 | 500% |
| slow | newlines | 5,738 | 595 | 939% |
| slow | random | 63,574 | 3,765 | 662% |
| faster | somenewlines | 17 | 1 | 189% |
| faster | nonewlines | 2,462 | 153 | 392% |
| faster | newlines | 2,463 | 176 | 403% |
| faster | random | 36,772 | 1,864 | 383% |
| fast | somenewlines | 14 | 1 | 156% |
| fast | nonewlines | 1,827 | 375 | 291% |
| fast | newlines | 1,883 | 492 | 308% |
| fast | random | 26,978 | 1,747 | 281% |
| fastest | somenewlines | 11 | 0 | 122% |
| fastest | nonewlines | 1,093 | 61 | 174% |
| fastest | newlines | 1,063 | 65 | 174% |
| fastest | random | 16,089 | 855 | 168% |
| screaming | somenewlines | 8 | 0 | 89% |
| screaming | nonewlines | 946 | 57 | 151% |
| screaming | newlines | 946 | 64 | 155% |
| screaming | random | 14,104 | 793 | 147% |
| hyperscreaming | somenewlines | 9 | 0 | 100% |
| hyperscreaming | nonewlines | 628 | 37 | 100% |
| hyperscreaming | newlines | 611 | 50 | 100% |
| hyperscreaming | random | 9,599 | 764 | 100% |
LLVM never vectorized this well, but given there's already an implicit `usize`'s worth of vectorization I don't suppose it matters too much. I could probably make it a fair bit faster if I wrote vector code directly, but that'd be even more fragile.
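For readers unfamiliar with the "implicit `usize` of vectorization" idea, here is a minimal sketch of word-at-a-time byte counting. It is not the code from this PR: the names `count_naive` and `count_swar` are hypothetical, and it uses the well-known exact zero-byte test on a `usize` rather than whatever tricks the real implementation relies on.

```rust
// Illustrative only: hypothetical helpers, not the implementation discussed above.
const WORD: usize = std::mem::size_of::<usize>();
const LO: usize = usize::MAX / 0xFF; // 0x0101...01: one low bit per byte
const HI: usize = LO << 7; // 0x8080...80: one high bit per byte

/// Byte-at-a-time reference implementation.
fn count_naive(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

/// Counts `needle` by comparing one `usize` worth of bytes per step.
fn count_swar(haystack: &[u8], needle: u8) -> usize {
    // The needle repeated in every byte of a word.
    let pattern = LO * needle as usize;
    let mut chunks = haystack.chunks_exact(WORD);
    let mut total = 0usize;

    for chunk in &mut chunks {
        let mut buf = [0u8; WORD];
        buf.copy_from_slice(chunk);
        // Matching bytes become 0x00 after the XOR.
        let x = usize::from_ne_bytes(buf) ^ pattern;
        // Adding 0x7F to the low 7 bits of each byte sets that byte's high bit
        // iff any low bit was set; the per-byte sum is at most 0xFE, so no
        // carry crosses into the neighbouring byte.
        let t = (x & !HI) + !HI;
        // After OR-ing in x, the high bit is clear only for zero bytes;
        // inverting leaves exactly one set bit per matching byte.
        let matches = !(t | x | !HI);
        total += matches.count_ones() as usize;
    }

    // Fewer than WORD bytes remain; handle them one at a time.
    total + count_naive(chunks.remainder(), needle)
}
```

Calling `count_swar(b"a\nb\n\nc", b'\n')` should agree with `count_naive` on the same input. A tuned version would likely accumulate per-byte counts across several words before summing them, which is where most of the remaining speed (and fragility) tends to come from.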
Thanks! You never disappoint :smile:
I think I've understood your implementation; evil aside, it should work exceedingly well.
Just for the record, can you tell me your CPU model?
I'm running an i7-6700HQ.