sharksforarms / deku

Declarative binary reading and writing: bit-level, symmetric, serialization/deserialization
Apache License 2.0
1.05k stars 54 forks source link

reader: Improve performance of read_all #441

Open wcampbell0x2a opened 1 month ago

wcampbell0x2a commented 1 month ago

Closes #439

github-actions[bot] commented 1 month ago

Benchmark for b49a65c

Click to view benchmark | Test | Base | PR | % | |------|--------------|------------------|---| | count | 3.8±0.02µs | N/A | N/A | | deku_read_bits | 1230.7±22.08ns | 1250.3±20.97ns | +1.59% | | deku_read_byte | 3.5±0.05ns | **3.4±0.13ns** | **-2.86%** | | deku_read_enum | 3.0±0.05ns | **2.6±0.04ns** | **-13.33%** | | deku_read_vec | **34.1±0.44ns** | 37.6±0.89ns | **+10.26%** | | deku_write_bits | 161.9±3.24ns | 162.7±17.05ns | +0.49% | | deku_write_byte | **21.8±0.20ns** | 22.7±0.34ns | **+4.13%** | | deku_write_enum | **21.1±0.41ns** | 21.7±0.38ns | **+2.84%** | | deku_write_vec | 422.7±5.49ns | **362.6±7.05ns** | **-14.22%** | | read_all | 4.1±0.06µs | N/A | N/A | | read_all_bytes | 3.6±0.03µs | N/A | N/A |
github-actions[bot] commented 1 month ago

Benchmark for 7f81f7b

Click to view benchmark | Test | Base | PR | % | |------|--------------|------------------|---| | count | 3.6±0.05µs | N/A | N/A | | deku_read_bits | **1187.3±15.54ns** | 1238.2±13.65ns | **+4.29%** | | deku_read_byte | **3.3±0.03ns** | 3.4±0.03ns | **+3.03%** | | deku_read_enum | 2.6±0.05ns | 2.6±0.05ns | 0.00% | | deku_read_vec | **35.8±0.55ns** | 37.5±0.68ns | **+4.75%** | | deku_write_bits | **150.3±1.46ns** | 153.7±3.39ns | **+2.26%** | | deku_write_byte | 22.7±0.20ns | 22.7±0.69ns | 0.00% | | deku_write_enum | 21.5±0.31ns | 21.7±0.45ns | +0.93% | | deku_write_vec | **351.2±5.78ns** | 406.7±4.96ns | **+15.80%** | | read_all | 5.2±0.04µs | N/A | N/A | | read_all_bytes | 4.8±0.04µs | N/A | N/A |
github-actions[bot] commented 1 month ago

Benchmark for 42291cc

Click to view benchmark | Test | Base | PR | % | |------|--------------|------------------|---| | count | 3.8±0.03µs | N/A | N/A | | deku_read_bits | **1232.7±21.71ns** | 1283.2±13.27ns | **+4.10%** | | deku_read_byte | **3.3±0.02ns** | 3.4±0.25ns | **+3.03%** | | deku_read_enum | 2.6±0.04ns | 2.6±0.16ns | 0.00% | | deku_read_vec | **34.0±0.50ns** | 38.0±0.73ns | **+11.76%** | | deku_write_bits | **148.0±2.23ns** | 155.2±12.15ns | **+4.86%** | | deku_write_byte | **21.8±0.35ns** | 22.7±0.33ns | **+4.13%** | | deku_write_enum | **21.1±0.35ns** | 21.8±0.46ns | **+3.32%** | | deku_write_vec | **353.8±28.27ns** | 370.1±11.25ns | **+4.61%** | | read_all | 4.1±0.10µs | N/A | N/A | | read_all_bytes | 3.6±0.04µs | N/A | N/A |
github-actions[bot] commented 1 month ago

Benchmark for a473082

Click to view benchmark | Test | Base | PR | % | |------|--------------|------------------|---| | count | 3.8±0.03µs | N/A | N/A | | deku_read_bits | 1253.0±20.87ns | **1198.3±18.86ns** | **-4.37%** | | deku_read_byte | **3.3±0.07ns** | 3.6±0.07ns | **+9.09%** | | deku_read_enum | **2.6±0.05ns** | 3.0±0.07ns | **+15.38%** | | deku_read_vec | **33.9±0.53ns** | 37.7±0.69ns | **+11.21%** | | deku_write_bits | 152.9±3.45ns | 153.9±3.13ns | +0.65% | | deku_write_byte | **21.8±0.51ns** | 22.4±0.40ns | **+2.75%** | | deku_write_enum | **21.1±0.33ns** | 21.7±0.37ns | **+2.84%** | | deku_write_vec | **357.3±38.06ns** | 421.8±4.87ns | **+18.05%** | | read_all | 5.0±0.04µs | N/A | N/A | | read_all_bytes | 4.6±0.18µs | N/A | N/A |
github-actions[bot] commented 1 month ago

Benchmark for e1f913a

Click to view benchmark | Test | Base | PR | % | |------|--------------|------------------|---| | count | 3.8±0.04µs | N/A | N/A | | deku_read_bits | 1413.7±27.29ns | **1298.0±20.27ns** | **-8.18%** | | deku_read_byte | **3.3±0.08ns** | 3.4±0.08ns | **+3.03%** | | deku_read_enum | 2.6±0.04ns | 2.6±0.11ns | 0.00% | | deku_read_vec | **34.0±0.56ns** | 43.0±1.08ns | **+26.47%** | | deku_write_bits | 170.9±8.62ns | **165.2±3.57ns** | **-3.34%** | | deku_write_byte | **21.9±0.39ns** | 23.6±0.62ns | **+7.76%** | | deku_write_enum | **21.4±0.53ns** | 22.2±0.60ns | **+3.74%** | | deku_write_vec | 351.9±9.65ns | 355.1±4.87ns | +0.91% | | read_all | 4.1±0.07µs | N/A | N/A | | read_all_bytes | 5.0±0.03µs | N/A | N/A |
github-actions[bot] commented 1 month ago

Benchmark for 8c98ec9

Click to view benchmark | Test | Base | PR | % | |------|--------------|------------------|---| | count | 3.8±0.03µs | N/A | N/A | | deku_read_bits | **1183.6±16.25ns** | 1325.2±11.72ns | **+11.96%** | | deku_read_byte | **3.3±0.05ns** | 3.4±0.05ns | **+3.03%** | | deku_read_enum | 2.6±0.06ns | 2.6±0.08ns | 0.00% | | deku_read_vec | **33.9±0.53ns** | 39.9±0.74ns | **+17.70%** | | deku_write_bits | **150.7±2.83ns** | 167.3±12.46ns | **+11.02%** | | deku_write_byte | **21.7±0.36ns** | 22.4±0.39ns | **+3.23%** | | deku_write_enum | **21.1±0.33ns** | 21.7±0.33ns | **+2.84%** | | deku_write_vec | 350.3±4.98ns | 350.9±3.68ns | +0.17% | | read_all | 4.1±0.05µs | N/A | N/A | | read_all_bytes | 5.0±0.04µs | N/A | N/A |
github-actions[bot] commented 1 month ago

Benchmark for d2397b4

Click to view benchmark | Test | Base | PR | % | |------|--------------|------------------|---| | count | 4.6±0.04µs | N/A | N/A | | deku_read_bits | 1234.3±15.55ns | **1211.2±12.58ns** | **-1.87%** | | deku_read_byte | 3.4±0.07ns | 3.4±0.11ns | 0.00% | | deku_read_enum | 2.6±0.41ns | 2.6±0.06ns | 0.00% | | deku_read_vec | **33.5±0.56ns** | 38.9±0.46ns | **+16.12%** | | deku_write_bits | 153.5±21.23ns | 153.5±8.61ns | 0.00% | | deku_write_byte | 21.8±0.35ns | 21.9±0.37ns | +0.46% | | deku_write_enum | **21.1±0.27ns** | 21.7±0.22ns | **+2.84%** | | deku_write_vec | 404.6±5.31ns | 406.5±5.66ns | +0.47% | | read_all | 17.6±0.01µs | N/A | N/A | | read_all_bytes | 17.6±0.03µs | N/A | N/A |
github-actions[bot] commented 1 month ago

Benchmark for 9ff94ec

Click to view benchmark | Test | Base | PR | % | |------|--------------|------------------|---| | count | 3.3±0.05µs | N/A | N/A | | deku_read_bits | **1197.1±17.14ns** | 1299.6±18.08ns | **+8.56%** | | deku_read_byte | 3.4±0.04ns | 3.4±0.11ns | 0.00% | | deku_read_enum | **2.5±0.04ns** | 2.6±0.04ns | **+4.00%** | | deku_read_vec | **34.2±0.47ns** | 37.9±0.63ns | **+10.82%** | | deku_write_bits | **148.0±3.18ns** | 166.4±5.02ns | **+12.43%** | | deku_write_byte | 21.9±0.21ns | 21.8±0.38ns | -0.46% | | deku_write_enum | **21.1±0.44ns** | 21.8±0.54ns | **+3.32%** | | deku_write_vec | 347.1±6.39ns | 351.0±6.85ns | +1.12% | | read_all | 17.5±0.09µs | N/A | N/A | | read_all_bytes | 16.5±0.10µs | N/A | N/A |
wcampbell0x2a commented 1 month ago

For reference, not as perf as it was ;)

+ critcmp stable nightly stable-perf nightly-perf
group                 nightly                                nightly-perf                            stable                                 stable-perf
-----                 -------                                ------------                            ------                                 -----------
Deserialize/binrw     1.38      2.8±0.06ms        ? ?/sec    1.00      2.0±0.04ms        ? ?/sec     1.37      2.8±0.10ms        ? ?/sec    1.01      2.1±0.07ms        ? ?/sec
Deserialize/custom    1.21      2.2±0.06ms        ? ?/sec    1.00  1826.9±30.32µs        ? ?/sec     1.21      2.2±0.07ms        ? ?/sec    1.03  1873.1±56.09µs        ? ?/sec
Deserialize/deku      1.11      3.1±0.08ms        ? ?/sec    1.00      2.8±0.05ms        ? ?/sec     1.05      3.0±0.06ms        ? ?/sec    1.01      2.8±0.13ms        ? ?/sec
Serialize/binrw       2.09      5.5±0.21ms        ? ?/sec    1.00      2.7±0.06ms        ? ?/sec     2.19      5.8±0.21ms        ? ?/sec    1.00      2.7±0.10ms        ? ?/sec
Serialize/custom      1.11  1943.4±49.78µs        ? ?/sec    1.02  1790.1±137.87µs        ? ?/sec    1.14      2.0±0.04ms        ? ?/sec    1.00  1752.8±44.16µs        ? ?/sec
Serialize/deku        1.12      3.1±0.17ms        ? ?/sec    1.00      2.8±0.04ms        ? ?/sec     1.10      3.1±0.08ms        ? ?/sec    1.01      2.8±0.14ms        ? ?/sec

before: image

github-actions[bot] commented 1 month ago

Benchmark for bd2a83c

Click to view benchmark | Test | Base | PR | % | |------|--------------|------------------|---| | count | 3.4±0.05µs | N/A | N/A | | deku_read_bits | 1241.8±15.85ns | **1185.6±38.67ns** | **-4.53%** | | deku_read_byte | 15.4±0.02ns | **3.4±0.10ns** | **-77.92%** | | deku_read_enum | 15.5±0.06ns | **2.6±0.05ns** | **-83.23%** | | deku_read_vec | 41.3±0.42ns | **36.9±0.61ns** | **-10.65%** | | deku_write_bits | **161.8±2.92ns** | 169.1±3.97ns | **+4.51%** | | deku_write_byte | 22.1±0.32ns | 22.1±0.34ns | 0.00% | | deku_write_enum | **21.4±0.30ns** | 22.0±0.30ns | **+2.80%** | | deku_write_vec | 420.7±3.94ns | **342.9±4.97ns** | **-18.49%** | | read_all | 16.6±0.11µs | N/A | N/A | | read_all_bytes | 17.1±0.24µs | N/A | N/A |
wcampbell0x2a commented 1 month ago

Hmmm, my inline(never) hurt the performance of the count vs read_all. I might have to balance this..

read_all_bytes          time:   [10.605 µs 10.623 µs 10.649 µs]
read_all                time:   [10.715 µs 10.727 µs 10.741 µs]
count                   time:   [2.5363 µs 2.5426 µs 2.5526 µs]
wcampbell0x2a commented 1 month ago

Hmmm, my inline(never) hurt the performance of the count vs read_all. I might have to balance this..

read_all_bytes          time:   [10.605 µs 10.623 µs 10.649 µs]
read_all                time:   [10.715 µs 10.727 µs 10.741 µs]
count                   time:   [2.5363 µs 2.5426 µs 2.5526 µs]

Or! Just take the improvements and go to sleep ;)

github-actions[bot] commented 1 month ago

Benchmark for 5f329ba

Click to view benchmark | Test | Base | PR | % | |------|--------------|------------------|---| | count | 3.7±0.05µs | N/A | N/A | | deku_read_bits | 1304.6±18.95ns | **1274.5±11.97ns** | **-2.31%** | | deku_read_byte | **3.3±0.10ns** | 3.4±0.07ns | **+3.03%** | | deku_read_enum | **2.6±0.05ns** | 3.0±0.05ns | **+15.38%** | | deku_read_vec | 35.2±0.58ns | 35.6±0.53ns | +1.14% | | deku_write_bits | **149.8±2.65ns** | 170.6±3.72ns | **+13.89%** | | deku_write_byte | **22.1±0.30ns** | 22.4±0.22ns | **+1.36%** | | deku_write_enum | **21.4±0.43ns** | 21.7±0.20ns | **+1.40%** | | deku_write_vec | 398.3±3.11ns | **333.4±7.79ns** | **-16.29%** | | read_all | 4.4±0.07µs | N/A | N/A | | read_all_bytes | 3.9±0.03µs | N/A | N/A |