llogiq / bytecount

Counting occurrences of a given byte or UTF-8 characters in a slice of memory – fast
Apache License 2.0

num_chars #29

Closed Veedrac closed 6 years ago

Veedrac commented 6 years ago

No tuning has been done whatsoever. I also have only the hastiest, barest of tests.

Without SIMD

test bench_num_chars_00000_hyper       ... bench:           3 ns/iter (+/- 0)
test bench_num_chars_00000_naive       ... bench:           1 ns/iter (+/- 0)
test bench_num_chars_00010_hyper       ... bench:           8 ns/iter (+/- 1)
test bench_num_chars_00010_naive       ... bench:           6 ns/iter (+/- 0)
test bench_num_chars_00020_hyper       ... bench:          12 ns/iter (+/- 1)
test bench_num_chars_00020_naive       ... bench:          10 ns/iter (+/- 1)
test bench_num_chars_00030_hyper       ... bench:          18 ns/iter (+/- 2)
test bench_num_chars_00030_naive       ... bench:          16 ns/iter (+/- 2)
test bench_num_chars_00040_hyper       ... bench:          14 ns/iter (+/- 2)
test bench_num_chars_00040_naive       ... bench:          20 ns/iter (+/- 4)
test bench_num_chars_00050_hyper       ... bench:          16 ns/iter (+/- 1)
test bench_num_chars_00050_naive       ... bench:          25 ns/iter (+/- 3)
test bench_num_chars_00060_hyper       ... bench:          18 ns/iter (+/- 3)
test bench_num_chars_00060_naive       ... bench:          29 ns/iter (+/- 4)
test bench_num_chars_00070_hyper       ... bench:          18 ns/iter (+/- 2)
test bench_num_chars_00070_naive       ... bench:          35 ns/iter (+/- 6)
test bench_num_chars_00080_hyper       ... bench:          16 ns/iter (+/- 1)
test bench_num_chars_00080_naive       ... bench:          39 ns/iter (+/- 5)
test bench_num_chars_00090_hyper       ... bench:          19 ns/iter (+/- 2)
test bench_num_chars_00090_naive       ... bench:          45 ns/iter (+/- 7)
test bench_num_chars_00100_hyper       ... bench:          18 ns/iter (+/- 3)
test bench_num_chars_00100_naive       ... bench:          49 ns/iter (+/- 12)
test bench_num_chars_00120_hyper       ... bench:          18 ns/iter (+/- 2)
test bench_num_chars_00120_naive       ... bench:          58 ns/iter (+/- 11)
test bench_num_chars_00140_hyper       ... bench:          21 ns/iter (+/- 2)
test bench_num_chars_00140_naive       ... bench:          67 ns/iter (+/- 14)
test bench_num_chars_00170_hyper       ... bench:          22 ns/iter (+/- 2)
test bench_num_chars_00170_naive       ... bench:          81 ns/iter (+/- 19)
test bench_num_chars_00210_hyper       ... bench:          24 ns/iter (+/- 4)
test bench_num_chars_00210_naive       ... bench:         101 ns/iter (+/- 13)
test bench_num_chars_00250_hyper       ... bench:          26 ns/iter (+/- 5)
test bench_num_chars_00250_naive       ... bench:         123 ns/iter (+/- 46)
test bench_num_chars_00300_hyper       ... bench:          28 ns/iter (+/- 5)
test bench_num_chars_00300_naive       ... bench:         153 ns/iter (+/- 9)
test bench_num_chars_00400_hyper       ... bench:          31 ns/iter (+/- 6)
test bench_num_chars_00400_naive       ... bench:         199 ns/iter (+/- 30)
test bench_num_chars_00500_hyper       ... bench:          38 ns/iter (+/- 5)
test bench_num_chars_00500_naive       ... bench:         251 ns/iter (+/- 37)
test bench_num_chars_00600_hyper       ... bench:          42 ns/iter (+/- 8)
test bench_num_chars_00600_naive       ... bench:         299 ns/iter (+/- 24)
test bench_num_chars_00700_hyper       ... bench:          47 ns/iter (+/- 7)
test bench_num_chars_00700_naive       ... bench:         349 ns/iter (+/- 48)
test bench_num_chars_00800_hyper       ... bench:          48 ns/iter (+/- 4)
test bench_num_chars_00800_naive       ... bench:         387 ns/iter (+/- 43)
test bench_num_chars_00900_hyper       ... bench:          53 ns/iter (+/- 7)
test bench_num_chars_00900_naive       ... bench:         434 ns/iter (+/- 44)
test bench_num_chars_01000_hyper       ... bench:          58 ns/iter (+/- 7)
test bench_num_chars_01000_naive       ... bench:         487 ns/iter (+/- 63)
test bench_num_chars_01200_hyper       ... bench:          67 ns/iter (+/- 9)
test bench_num_chars_01200_naive       ... bench:         576 ns/iter (+/- 51)
test bench_num_chars_01400_hyper       ... bench:          77 ns/iter (+/- 17)
test bench_num_chars_01400_naive       ... bench:         670 ns/iter (+/- 61)
test bench_num_chars_01700_hyper       ... bench:          91 ns/iter (+/- 11)
test bench_num_chars_01700_naive       ... bench:         818 ns/iter (+/- 137)
test bench_num_chars_02100_hyper       ... bench:         118 ns/iter (+/- 5)
test bench_num_chars_02100_naive       ... bench:       1,001 ns/iter (+/- 183)
test bench_num_chars_02500_hyper       ... bench:         134 ns/iter (+/- 23)
test bench_num_chars_02500_naive       ... bench:       1,174 ns/iter (+/- 141)
test bench_num_chars_03000_hyper       ... bench:         152 ns/iter (+/- 21)
test bench_num_chars_03000_naive       ... bench:       1,427 ns/iter (+/- 209)
test bench_num_chars_04000_hyper       ... bench:         202 ns/iter (+/- 29)
test bench_num_chars_04000_naive       ... bench:       1,904 ns/iter (+/- 219)
test bench_num_chars_05000_hyper       ... bench:         253 ns/iter (+/- 41)
test bench_num_chars_05000_naive       ... bench:       2,383 ns/iter (+/- 343)
test bench_num_chars_06000_hyper       ... bench:         302 ns/iter (+/- 31)
test bench_num_chars_06000_naive       ... bench:       2,853 ns/iter (+/- 267)
test bench_num_chars_07000_hyper       ... bench:         352 ns/iter (+/- 33)
test bench_num_chars_07000_naive       ... bench:       3,316 ns/iter (+/- 280)
test bench_num_chars_08000_hyper       ... bench:         400 ns/iter (+/- 58)
test bench_num_chars_08000_naive       ... bench:       3,802 ns/iter (+/- 548)
test bench_num_chars_09000_hyper       ... bench:         448 ns/iter (+/- 94)
test bench_num_chars_09000_naive       ... bench:       4,281 ns/iter (+/- 870)
test bench_num_chars_10000_hyper       ... bench:         502 ns/iter (+/- 73)
test bench_num_chars_10000_naive       ... bench:       4,757 ns/iter (+/- 833)
test bench_num_chars_12000_hyper       ... bench:         617 ns/iter (+/- 134)
test bench_num_chars_12000_naive       ... bench:       5,895 ns/iter (+/- 1,022)
test bench_num_chars_14000_hyper       ... bench:         692 ns/iter (+/- 82)
test bench_num_chars_14000_naive       ... bench:       6,716 ns/iter (+/- 488)
test bench_num_chars_17000_hyper       ... bench:         838 ns/iter (+/- 69)
test bench_num_chars_17000_naive       ... bench:       8,032 ns/iter (+/- 684)
test bench_num_chars_21000_hyper       ... bench:       1,046 ns/iter (+/- 108)
test bench_num_chars_21000_naive       ... bench:      10,111 ns/iter (+/- 1,036)
test bench_num_chars_25000_hyper       ... bench:       1,225 ns/iter (+/- 105)
test bench_num_chars_25000_naive       ... bench:      11,849 ns/iter (+/- 1,386)
test bench_num_chars_30000_hyper       ... bench:       1,458 ns/iter (+/- 246)
test bench_num_chars_30000_naive       ... bench:      14,432 ns/iter (+/- 1,435)
test bench_num_chars_big_0100000_hyper ... bench:       5,101 ns/iter (+/- 334)
test bench_num_chars_big_0100000_naive ... bench:      47,379 ns/iter (+/- 3,196)
test bench_num_chars_big_1000000_hyper ... bench:      51,260 ns/iter (+/- 6,057)
test bench_num_chars_big_1000000_naive ... bench:     471,713 ns/iter (+/- 110,607)

SIMD

test bench_num_chars_00000_hyper       ... bench:           5 ns/iter (+/- 1)
test bench_num_chars_00000_naive       ... bench:           2 ns/iter (+/- 0)
test bench_num_chars_00010_hyper       ... bench:          11 ns/iter (+/- 3)
test bench_num_chars_00010_naive       ... bench:           8 ns/iter (+/- 2)
test bench_num_chars_00020_hyper       ... bench:          11 ns/iter (+/- 1)
test bench_num_chars_00020_naive       ... bench:           9 ns/iter (+/- 3)
test bench_num_chars_00030_hyper       ... bench:          17 ns/iter (+/- 5)
test bench_num_chars_00030_naive       ... bench:          14 ns/iter (+/- 2)
test bench_num_chars_00040_hyper       ... bench:          19 ns/iter (+/- 2)
test bench_num_chars_00040_naive       ... bench:          14 ns/iter (+/- 3)
test bench_num_chars_00050_hyper       ... bench:          16 ns/iter (+/- 6)
test bench_num_chars_00050_naive       ... bench:          15 ns/iter (+/- 8)
test bench_num_chars_00060_hyper       ... bench:          22 ns/iter (+/- 4)
test bench_num_chars_00060_naive       ... bench:          19 ns/iter (+/- 2)
test bench_num_chars_00070_hyper       ... bench:          19 ns/iter (+/- 2)
test bench_num_chars_00070_naive       ... bench:          18 ns/iter (+/- 1)
test bench_num_chars_00080_hyper       ... bench:          16 ns/iter (+/- 1)
test bench_num_chars_00080_naive       ... bench:          19 ns/iter (+/- 2)
test bench_num_chars_00090_hyper       ... bench:          22 ns/iter (+/- 18)
test bench_num_chars_00090_naive       ... bench:          24 ns/iter (+/- 1)
test bench_num_chars_00100_hyper       ... bench:          19 ns/iter (+/- 3)
test bench_num_chars_00100_naive       ... bench:          23 ns/iter (+/- 3)
test bench_num_chars_00120_hyper       ... bench:          22 ns/iter (+/- 3)
test bench_num_chars_00120_naive       ... bench:          29 ns/iter (+/- 5)
test bench_num_chars_00140_hyper       ... bench:          23 ns/iter (+/- 1)
test bench_num_chars_00140_naive       ... bench:          34 ns/iter (+/- 6)
test bench_num_chars_00170_hyper       ... bench:          23 ns/iter (+/- 4)
test bench_num_chars_00170_naive       ... bench:          39 ns/iter (+/- 4)
test bench_num_chars_00210_hyper       ... bench:          20 ns/iter (+/- 3)
test bench_num_chars_00210_naive       ... bench:          43 ns/iter (+/- 5)
test bench_num_chars_00250_hyper       ... bench:          26 ns/iter (+/- 1)
test bench_num_chars_00250_naive       ... bench:          54 ns/iter (+/- 6)
test bench_num_chars_00300_hyper       ... bench:          28 ns/iter (+/- 3)
test bench_num_chars_00300_naive       ... bench:          64 ns/iter (+/- 12)
test bench_num_chars_00400_hyper       ... bench:          23 ns/iter (+/- 4)
test bench_num_chars_00400_naive       ... bench:          78 ns/iter (+/- 7)
test bench_num_chars_00500_hyper       ... bench:          29 ns/iter (+/- 6)
test bench_num_chars_00500_naive       ... bench:          99 ns/iter (+/- 13)
test bench_num_chars_00600_hyper       ... bench:          33 ns/iter (+/- 5)
test bench_num_chars_00600_naive       ... bench:         120 ns/iter (+/- 14)
test bench_num_chars_00700_hyper       ... bench:          39 ns/iter (+/- 8)
test bench_num_chars_00700_naive       ... bench:         139 ns/iter (+/- 23)
test bench_num_chars_00800_hyper       ... bench:          32 ns/iter (+/- 5)
test bench_num_chars_00800_naive       ... bench:         155 ns/iter (+/- 30)
test bench_num_chars_00900_hyper       ... bench:          37 ns/iter (+/- 4)
test bench_num_chars_00900_naive       ... bench:         173 ns/iter (+/- 28)
test bench_num_chars_01000_hyper       ... bench:          43 ns/iter (+/- 6)
test bench_num_chars_01000_naive       ... bench:         191 ns/iter (+/- 16)
test bench_num_chars_01200_hyper       ... bench:          42 ns/iter (+/- 5)
test bench_num_chars_01200_naive       ... bench:         232 ns/iter (+/- 26)
test bench_num_chars_01400_hyper       ... bench:          53 ns/iter (+/- 9)
test bench_num_chars_01400_naive       ... bench:         268 ns/iter (+/- 31)
test bench_num_chars_01700_hyper       ... bench:          59 ns/iter (+/- 12)
test bench_num_chars_01700_naive       ... bench:         330 ns/iter (+/- 53)
test bench_num_chars_02100_hyper       ... bench:          68 ns/iter (+/- 8)
test bench_num_chars_02100_naive       ... bench:         438 ns/iter (+/- 188)
test bench_num_chars_02500_hyper       ... bench:          78 ns/iter (+/- 3)
test bench_num_chars_02500_naive       ... bench:         479 ns/iter (+/- 52)
test bench_num_chars_03000_hyper       ... bench:          91 ns/iter (+/- 8)
test bench_num_chars_03000_naive       ... bench:         572 ns/iter (+/- 76)
test bench_num_chars_04000_hyper       ... bench:         111 ns/iter (+/- 12)
test bench_num_chars_04000_naive       ... bench:         759 ns/iter (+/- 102)
test bench_num_chars_05000_hyper       ... bench:         158 ns/iter (+/- 35)
test bench_num_chars_05000_naive       ... bench:         964 ns/iter (+/- 140)
test bench_num_chars_06000_hyper       ... bench:         198 ns/iter (+/- 74)
test bench_num_chars_06000_naive       ... bench:       1,337 ns/iter (+/- 346)
test bench_num_chars_07000_hyper       ... bench:         234 ns/iter (+/- 177)
test bench_num_chars_07000_naive       ... bench:       1,590 ns/iter (+/- 421)
test bench_num_chars_08000_hyper       ... bench:         257 ns/iter (+/- 80)
test bench_num_chars_08000_naive       ... bench:       1,661 ns/iter (+/- 415)
test bench_num_chars_09000_hyper       ... bench:         267 ns/iter (+/- 23)
test bench_num_chars_09000_naive       ... bench:       1,767 ns/iter (+/- 314)
test bench_num_chars_10000_hyper       ... bench:         285 ns/iter (+/- 25)
test bench_num_chars_10000_naive       ... bench:       2,037 ns/iter (+/- 855)
test bench_num_chars_12000_hyper       ... bench:         367 ns/iter (+/- 66)
test bench_num_chars_12000_naive       ... bench:       2,434 ns/iter (+/- 341)
test bench_num_chars_14000_hyper       ... bench:         397 ns/iter (+/- 99)
test bench_num_chars_14000_naive       ... bench:       2,690 ns/iter (+/- 204)
test bench_num_chars_17000_hyper       ... bench:         482 ns/iter (+/- 64)
test bench_num_chars_17000_naive       ... bench:       3,212 ns/iter (+/- 423)
test bench_num_chars_21000_hyper       ... bench:         571 ns/iter (+/- 77)
test bench_num_chars_21000_naive       ... bench:       3,962 ns/iter (+/- 398)
test bench_num_chars_25000_hyper       ... bench:         681 ns/iter (+/- 93)
test bench_num_chars_25000_naive       ... bench:       4,708 ns/iter (+/- 434)
test bench_num_chars_30000_hyper       ... bench:         814 ns/iter (+/- 207)
test bench_num_chars_30000_naive       ... bench:       5,696 ns/iter (+/- 833)
test bench_num_chars_big_0100000_hyper ... bench:       3,092 ns/iter (+/- 308)
test bench_num_chars_big_0100000_naive ... bench:      19,066 ns/iter (+/- 1,959)
test bench_num_chars_big_1000000_hyper ... bench:      33,375 ns/iter (+/- 14,003)
test bench_num_chars_big_1000000_naive ... bench:     189,227 ns/iter (+/- 22,577)

AVX

test bench_num_chars_00000_hyper       ... bench:           4 ns/iter (+/- 1)
test bench_num_chars_00000_naive       ... bench:           2 ns/iter (+/- 0)
test bench_num_chars_00010_hyper       ... bench:          11 ns/iter (+/- 2)
test bench_num_chars_00010_naive       ... bench:           8 ns/iter (+/- 5)
test bench_num_chars_00020_hyper       ... bench:          12 ns/iter (+/- 3)
test bench_num_chars_00020_naive       ... bench:           8 ns/iter (+/- 1)
test bench_num_chars_00030_hyper       ... bench:          16 ns/iter (+/- 1)
test bench_num_chars_00030_naive       ... bench:          14 ns/iter (+/- 3)
test bench_num_chars_00040_hyper       ... bench:          16 ns/iter (+/- 2)
test bench_num_chars_00040_naive       ... bench:          14 ns/iter (+/- 2)
test bench_num_chars_00050_hyper       ... bench:          16 ns/iter (+/- 2)
test bench_num_chars_00050_naive       ... bench:          14 ns/iter (+/- 1)
test bench_num_chars_00060_hyper       ... bench:          21 ns/iter (+/- 3)
test bench_num_chars_00060_naive       ... bench:          21 ns/iter (+/- 5)
test bench_num_chars_00070_hyper       ... bench:          17 ns/iter (+/- 3)
test bench_num_chars_00070_naive       ... bench:          22 ns/iter (+/- 11)
test bench_num_chars_00080_hyper       ... bench:          15 ns/iter (+/- 8)
test bench_num_chars_00080_naive       ... bench:          21 ns/iter (+/- 5)
test bench_num_chars_00090_hyper       ... bench:          21 ns/iter (+/- 2)
test bench_num_chars_00090_naive       ... bench:          25 ns/iter (+/- 4)
test bench_num_chars_00100_hyper       ... bench:          14 ns/iter (+/- 0)
test bench_num_chars_00100_naive       ... bench:          22 ns/iter (+/- 3)
test bench_num_chars_00120_hyper       ... bench:          20 ns/iter (+/- 2)
test bench_num_chars_00120_naive       ... bench:          29 ns/iter (+/- 6)
test bench_num_chars_00140_hyper       ... bench:          20 ns/iter (+/- 4)
test bench_num_chars_00140_naive       ... bench:          34 ns/iter (+/- 5)
test bench_num_chars_00170_hyper       ... bench:          18 ns/iter (+/- 1)
test bench_num_chars_00170_naive       ... bench:          39 ns/iter (+/- 6)
test bench_num_chars_00210_hyper       ... bench:          18 ns/iter (+/- 2)
test bench_num_chars_00210_naive       ... bench:          43 ns/iter (+/- 9)
test bench_num_chars_00250_hyper       ... bench:          23 ns/iter (+/- 3)
test bench_num_chars_00250_naive       ... bench:          57 ns/iter (+/- 9)
test bench_num_chars_00300_hyper       ... bench:          21 ns/iter (+/- 3)
test bench_num_chars_00300_naive       ... bench:          64 ns/iter (+/- 11)
test bench_num_chars_00400_hyper       ... bench:          18 ns/iter (+/- 3)
test bench_num_chars_00400_naive       ... bench:          79 ns/iter (+/- 8)
test bench_num_chars_00500_hyper       ... bench:          23 ns/iter (+/- 4)
test bench_num_chars_00500_naive       ... bench:          99 ns/iter (+/- 17)
test bench_num_chars_00600_hyper       ... bench:          26 ns/iter (+/- 5)
test bench_num_chars_00600_naive       ... bench:         122 ns/iter (+/- 4)
test bench_num_chars_00700_hyper       ... bench:          29 ns/iter (+/- 3)
test bench_num_chars_00700_naive       ... bench:         140 ns/iter (+/- 22)
test bench_num_chars_00800_hyper       ... bench:          20 ns/iter (+/- 5)
test bench_num_chars_00800_naive       ... bench:         151 ns/iter (+/- 27)
test bench_num_chars_00900_hyper       ... bench:          23 ns/iter (+/- 1)
test bench_num_chars_00900_naive       ... bench:         174 ns/iter (+/- 19)
test bench_num_chars_01000_hyper       ... bench:          27 ns/iter (+/- 5)
test bench_num_chars_01000_naive       ... bench:         192 ns/iter (+/- 20)
test bench_num_chars_01200_hyper       ... bench:          28 ns/iter (+/- 5)
test bench_num_chars_01200_naive       ... bench:         230 ns/iter (+/- 24)
test bench_num_chars_01400_hyper       ... bench:          35 ns/iter (+/- 5)
test bench_num_chars_01400_naive       ... bench:         269 ns/iter (+/- 26)
test bench_num_chars_01700_hyper       ... bench:          32 ns/iter (+/- 3)
test bench_num_chars_01700_naive       ... bench:         328 ns/iter (+/- 77)
test bench_num_chars_02100_hyper       ... bench:          41 ns/iter (+/- 8)
test bench_num_chars_02100_naive       ... bench:         399 ns/iter (+/- 44)
test bench_num_chars_02500_hyper       ... bench:          43 ns/iter (+/- 5)
test bench_num_chars_02500_naive       ... bench:         479 ns/iter (+/- 50)
test bench_num_chars_03000_hyper       ... bench:          56 ns/iter (+/- 2)
test bench_num_chars_03000_naive       ... bench:         571 ns/iter (+/- 75)
test bench_num_chars_04000_hyper       ... bench:          58 ns/iter (+/- 7)
test bench_num_chars_04000_naive       ... bench:         760 ns/iter (+/- 86)
test bench_num_chars_05000_hyper       ... bench:          79 ns/iter (+/- 15)
test bench_num_chars_05000_naive       ... bench:         957 ns/iter (+/- 109)
test bench_num_chars_06000_hyper       ... bench:         110 ns/iter (+/- 17)
test bench_num_chars_06000_naive       ... bench:       1,133 ns/iter (+/- 112)
test bench_num_chars_07000_hyper       ... bench:         122 ns/iter (+/- 14)
test bench_num_chars_07000_naive       ... bench:       1,330 ns/iter (+/- 160)
test bench_num_chars_08000_hyper       ... bench:         127 ns/iter (+/- 18)
test bench_num_chars_08000_naive       ... bench:       1,510 ns/iter (+/- 201)
test bench_num_chars_09000_hyper       ... bench:         136 ns/iter (+/- 14)
test bench_num_chars_09000_naive       ... bench:       1,701 ns/iter (+/- 285)
test bench_num_chars_10000_hyper       ... bench:         141 ns/iter (+/- 16)
test bench_num_chars_10000_naive       ... bench:       1,908 ns/iter (+/- 279)
test bench_num_chars_12000_hyper       ... bench:         180 ns/iter (+/- 30)
test bench_num_chars_12000_naive       ... bench:       2,241 ns/iter (+/- 335)
test bench_num_chars_14000_hyper       ... bench:         197 ns/iter (+/- 29)
test bench_num_chars_14000_naive       ... bench:       2,788 ns/iter (+/- 500)
test bench_num_chars_17000_hyper       ... bench:         255 ns/iter (+/- 85)
test bench_num_chars_17000_naive       ... bench:       3,277 ns/iter (+/- 482)
test bench_num_chars_21000_hyper       ... bench:         322 ns/iter (+/- 140)
test bench_num_chars_21000_naive       ... bench:       3,995 ns/iter (+/- 663)
test bench_num_chars_25000_hyper       ... bench:         355 ns/iter (+/- 60)
test bench_num_chars_25000_naive       ... bench:       4,800 ns/iter (+/- 795)
test bench_num_chars_30000_hyper       ... bench:         453 ns/iter (+/- 124)
test bench_num_chars_30000_naive       ... bench:       5,901 ns/iter (+/- 1,495)
test bench_num_chars_big_0100000_hyper ... bench:       1,589 ns/iter (+/- 167)
test bench_num_chars_big_0100000_naive ... bench:      19,029 ns/iter (+/- 2,418)
test bench_num_chars_big_1000000_hyper ... bench:      20,987 ns/iter (+/- 1,211)
test bench_num_chars_big_1000000_naive ... bench:     189,939 ns/iter (+/- 16,115)

llogiq commented 6 years ago

Great work as always! Even in this implementation, we win against naive count from ~80 chars upwards. Not too shabby for a proof of concept. :+1:
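
For readers unfamiliar with the baseline: the "naive" rows above can be thought of as a plain byte-by-byte loop. The following is only a minimal sketch, assuming that the character count equals the number of bytes that are not UTF-8 continuation bytes (`0b10xxxxxx`) and that the input is valid UTF-8. The function name `naive_num_chars_sketch` is hypothetical and is not claimed to match the crate's code.

```rust
/// Hypothetical baseline: counts UTF-8 characters by counting every byte
/// that is NOT a continuation byte (continuation bytes look like 0b10xxxxxx).
/// Assumes `bytes` holds valid UTF-8; on invalid input the result is just
/// the number of non-continuation bytes.
fn naive_num_chars_sketch(bytes: &[u8]) -> usize {
    bytes
        .iter()
        // Keep the byte if its top two bits are anything other than `10`.
        .filter(|&&b| (b & 0xC0) != 0x80)
        .count()
}
```

Calling it on `"héllo".as_bytes()` visits six bytes, one of which is a continuation byte, so it reports five characters. The work per byte is constant, which is consistent with the roughly linear growth of the naive timings in the tables.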

Veedrac commented 6 years ago

I made a bunch of graphs (the tabs are near the bottom of the page).

https://docs.google.com/spreadsheets/d/1MMPiAPwFZvW8_jfiz5qqM8dpDNGppRrGNxtNUNgUa3E/edit?usp=sharing

Given these results, I'm not sure much can be done to improve the timings; there are a few things I could try, but none of them looks like a major win timing-wise.

So when/if you're happy with the code correctness-wise, might as well submit it as-is.

llogiq commented 6 years ago

OK then, I can still benchmark things if I find the time. No need to delay the merge. I'll make a new release soon.

llogiq commented 6 years ago

I have a non-SIMD version that shaves a few cycles. Need to clean it up and bring the other versions in line.
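
The non-SIMD version mentioned here is not shown in the thread. As an illustration of the general idea only, one common way to count characters without SIMD instructions is to process eight bytes at a time in a `u64` and use bit tricks to flag the non-continuation bytes. The sketch below assumes valid UTF-8 input; the names `swar_num_chars_sketch` and `tail_num_chars_sketch` are hypothetical, and it is not claimed to be the code llogiq refers to.

```rust
/// Hypothetical byte-wise helper for the tail that does not fill a whole word.
fn tail_num_chars_sketch(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| (b & 0xC0) != 0x80).count()
}

/// Hypothetical word-at-a-time (SWAR) character count, eight bytes per step.
/// A byte starts a character unless its top two bits are `10`, that is,
/// when bit 7 is clear OR bit 6 is set.
fn swar_num_chars_sketch(bytes: &[u8]) -> usize {
    // Mask that keeps only the lowest bit of each of the eight bytes.
    const LO: u64 = 0x0101_0101_0101_0101;

    let mut chunks = bytes.chunks_exact(8);
    let mut total = 0usize;

    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        let w = u64::from_le_bytes(buf);

        // (!w >> 7) moves each byte's inverted bit 7 to that byte's bit 0;
        // (w >> 6) moves each byte's bit 6 to that byte's bit 0.
        // After masking, bit 0 of a byte is set iff it is not a continuation byte.
        let leading = ((!w >> 7) | (w >> 6)) & LO;
        total += leading.count_ones() as usize;
    }

    // Handle the final 0..=7 bytes one at a time.
    total + tail_num_chars_sketch(chunks.remainder())
}
```

For valid UTF-8 the result should agree with `str::chars().count()`. Whether a variant like this actually beats the byte-wise loop depends on the compiler and target, so it would need benchmarking in the same harness as the tables above.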