reader: Improve performance of read_all

wcampbell0x2a commented 6 months ago

Add Leftover::{Byte, Bits}, so that instead of conversion straight away into a Bitvec, we keep it as a slice instead. In the case of the read_all attribute this improves the speed, as reading until EOF doesn't keep causing our reader to convert to bits and back again all the time.
This does result in some slowdown, but with some #[inline(never)] we can keep it to a minimum. The total gain to read_all speed is worth any .8 ms slowdown I saw in testing.

Closes #439

github-actions[bot] commented 6 months ago

Benchmark for b49a65c

Click to view benchmark

| Test | Base | PR | % | |------|--------------|------------------|---| | count | 3.8±0.02µs | N/A | N/A | | deku_read_bits | 1230.7±22.08ns | 1250.3±20.97ns | +1.59% | | deku_read_byte | 3.5±0.05ns | **3.4±0.13ns** | **-2.86%** | | deku_read_enum | 3.0±0.05ns | **2.6±0.04ns** | **-13.33%** | | deku_read_vec | **34.1±0.44ns** | 37.6±0.89ns | **+10.26%** | | deku_write_bits | 161.9±3.24ns | 162.7±17.05ns | +0.49% | | deku_write_byte | **21.8±0.20ns** | 22.7±0.34ns | **+4.13%** | | deku_write_enum | **21.1±0.41ns** | 21.7±0.38ns | **+2.84%** | | deku_write_vec | 422.7±5.49ns | **362.6±7.05ns** | **-14.22%** | | read_all | 4.1±0.06µs | N/A | N/A | | read_all_bytes | 3.6±0.03µs | N/A | N/A |

github-actions[bot] commented 6 months ago

Benchmark for 7f81f7b

Click to view benchmark

| Test | Base | PR | % | |------|--------------|------------------|---| | count | 3.6±0.05µs | N/A | N/A | | deku_read_bits | **1187.3±15.54ns** | 1238.2±13.65ns | **+4.29%** | | deku_read_byte | **3.3±0.03ns** | 3.4±0.03ns | **+3.03%** | | deku_read_enum | 2.6±0.05ns | 2.6±0.05ns | 0.00% | | deku_read_vec | **35.8±0.55ns** | 37.5±0.68ns | **+4.75%** | | deku_write_bits | **150.3±1.46ns** | 153.7±3.39ns | **+2.26%** | | deku_write_byte | 22.7±0.20ns | 22.7±0.69ns | 0.00% | | deku_write_enum | 21.5±0.31ns | 21.7±0.45ns | +0.93% | | deku_write_vec | **351.2±5.78ns** | 406.7±4.96ns | **+15.80%** | | read_all | 5.2±0.04µs | N/A | N/A | | read_all_bytes | 4.8±0.04µs | N/A | N/A |

github-actions[bot] commented 6 months ago

Benchmark for 42291cc

Click to view benchmark

| Test | Base | PR | % | |------|--------------|------------------|---| | count | 3.8±0.03µs | N/A | N/A | | deku_read_bits | **1232.7±21.71ns** | 1283.2±13.27ns | **+4.10%** | | deku_read_byte | **3.3±0.02ns** | 3.4±0.25ns | **+3.03%** | | deku_read_enum | 2.6±0.04ns | 2.6±0.16ns | 0.00% | | deku_read_vec | **34.0±0.50ns** | 38.0±0.73ns | **+11.76%** | | deku_write_bits | **148.0±2.23ns** | 155.2±12.15ns | **+4.86%** | | deku_write_byte | **21.8±0.35ns** | 22.7±0.33ns | **+4.13%** | | deku_write_enum | **21.1±0.35ns** | 21.8±0.46ns | **+3.32%** | | deku_write_vec | **353.8±28.27ns** | 370.1±11.25ns | **+4.61%** | | read_all | 4.1±0.10µs | N/A | N/A | | read_all_bytes | 3.6±0.04µs | N/A | N/A |

github-actions[bot] commented 6 months ago

Benchmark for a473082

Click to view benchmark

| Test | Base | PR | % | |------|--------------|------------------|---| | count | 3.8±0.03µs | N/A | N/A | | deku_read_bits | 1253.0±20.87ns | **1198.3±18.86ns** | **-4.37%** | | deku_read_byte | **3.3±0.07ns** | 3.6±0.07ns | **+9.09%** | | deku_read_enum | **2.6±0.05ns** | 3.0±0.07ns | **+15.38%** | | deku_read_vec | **33.9±0.53ns** | 37.7±0.69ns | **+11.21%** | | deku_write_bits | 152.9±3.45ns | 153.9±3.13ns | +0.65% | | deku_write_byte | **21.8±0.51ns** | 22.4±0.40ns | **+2.75%** | | deku_write_enum | **21.1±0.33ns** | 21.7±0.37ns | **+2.84%** | | deku_write_vec | **357.3±38.06ns** | 421.8±4.87ns | **+18.05%** | | read_all | 5.0±0.04µs | N/A | N/A | | read_all_bytes | 4.6±0.18µs | N/A | N/A |

github-actions[bot] commented 6 months ago

Benchmark for e1f913a

Click to view benchmark

| Test | Base | PR | % | |------|--------------|------------------|---| | count | 3.8±0.04µs | N/A | N/A | | deku_read_bits | 1413.7±27.29ns | **1298.0±20.27ns** | **-8.18%** | | deku_read_byte | **3.3±0.08ns** | 3.4±0.08ns | **+3.03%** | | deku_read_enum | 2.6±0.04ns | 2.6±0.11ns | 0.00% | | deku_read_vec | **34.0±0.56ns** | 43.0±1.08ns | **+26.47%** | | deku_write_bits | 170.9±8.62ns | **165.2±3.57ns** | **-3.34%** | | deku_write_byte | **21.9±0.39ns** | 23.6±0.62ns | **+7.76%** | | deku_write_enum | **21.4±0.53ns** | 22.2±0.60ns | **+3.74%** | | deku_write_vec | 351.9±9.65ns | 355.1±4.87ns | +0.91% | | read_all | 4.1±0.07µs | N/A | N/A | | read_all_bytes | 5.0±0.03µs | N/A | N/A |

github-actions[bot] commented 6 months ago

Benchmark for 8c98ec9

Click to view benchmark

| Test | Base | PR | % | |------|--------------|------------------|---| | count | 3.8±0.03µs | N/A | N/A | | deku_read_bits | **1183.6±16.25ns** | 1325.2±11.72ns | **+11.96%** | | deku_read_byte | **3.3±0.05ns** | 3.4±0.05ns | **+3.03%** | | deku_read_enum | 2.6±0.06ns | 2.6±0.08ns | 0.00% | | deku_read_vec | **33.9±0.53ns** | 39.9±0.74ns | **+17.70%** | | deku_write_bits | **150.7±2.83ns** | 167.3±12.46ns | **+11.02%** | | deku_write_byte | **21.7±0.36ns** | 22.4±0.39ns | **+3.23%** | | deku_write_enum | **21.1±0.33ns** | 21.7±0.33ns | **+2.84%** | | deku_write_vec | 350.3±4.98ns | 350.9±3.68ns | +0.17% | | read_all | 4.1±0.05µs | N/A | N/A | | read_all_bytes | 5.0±0.04µs | N/A | N/A |

github-actions[bot] commented 6 months ago

Benchmark for d2397b4

Click to view benchmark

| Test | Base | PR | % | |------|--------------|------------------|---| | count | 4.6±0.04µs | N/A | N/A | | deku_read_bits | 1234.3±15.55ns | **1211.2±12.58ns** | **-1.87%** | | deku_read_byte | 3.4±0.07ns | 3.4±0.11ns | 0.00% | | deku_read_enum | 2.6±0.41ns | 2.6±0.06ns | 0.00% | | deku_read_vec | **33.5±0.56ns** | 38.9±0.46ns | **+16.12%** | | deku_write_bits | 153.5±21.23ns | 153.5±8.61ns | 0.00% | | deku_write_byte | 21.8±0.35ns | 21.9±0.37ns | +0.46% | | deku_write_enum | **21.1±0.27ns** | 21.7±0.22ns | **+2.84%** | | deku_write_vec | 404.6±5.31ns | 406.5±5.66ns | +0.47% | | read_all | 17.6±0.01µs | N/A | N/A | | read_all_bytes | 17.6±0.03µs | N/A | N/A |

github-actions[bot] commented 6 months ago

Benchmark for 9ff94ec

Click to view benchmark

| Test | Base | PR | % | |------|--------------|------------------|---| | count | 3.3±0.05µs | N/A | N/A | | deku_read_bits | **1197.1±17.14ns** | 1299.6±18.08ns | **+8.56%** | | deku_read_byte | 3.4±0.04ns | 3.4±0.11ns | 0.00% | | deku_read_enum | **2.5±0.04ns** | 2.6±0.04ns | **+4.00%** | | deku_read_vec | **34.2±0.47ns** | 37.9±0.63ns | **+10.82%** | | deku_write_bits | **148.0±3.18ns** | 166.4±5.02ns | **+12.43%** | | deku_write_byte | 21.9±0.21ns | 21.8±0.38ns | -0.46% | | deku_write_enum | **21.1±0.44ns** | 21.8±0.54ns | **+3.32%** | | deku_write_vec | 347.1±6.39ns | 351.0±6.85ns | +1.12% | | read_all | 17.5±0.09µs | N/A | N/A | | read_all_bytes | 16.5±0.10µs | N/A | N/A |

wcampbell0x2a commented 6 months ago

For reference, not as perf as it was ;)

+ critcmp stable nightly stable-perf nightly-perf
group                 nightly                                nightly-perf                            stable                                 stable-perf
-----                 -------                                ------------                            ------                                 -----------
Deserialize/binrw     1.38      2.8±0.06ms        ? ?/sec    1.00      2.0±0.04ms        ? ?/sec     1.37      2.8±0.10ms        ? ?/sec    1.01      2.1±0.07ms        ? ?/sec
Deserialize/custom    1.21      2.2±0.06ms        ? ?/sec    1.00  1826.9±30.32µs        ? ?/sec     1.21      2.2±0.07ms        ? ?/sec    1.03  1873.1±56.09µs        ? ?/sec
Deserialize/deku      1.11      3.1±0.08ms        ? ?/sec    1.00      2.8±0.05ms        ? ?/sec     1.05      3.0±0.06ms        ? ?/sec    1.01      2.8±0.13ms        ? ?/sec
Serialize/binrw       2.09      5.5±0.21ms        ? ?/sec    1.00      2.7±0.06ms        ? ?/sec     2.19      5.8±0.21ms        ? ?/sec    1.00      2.7±0.10ms        ? ?/sec
Serialize/custom      1.11  1943.4±49.78µs        ? ?/sec    1.02  1790.1±137.87µs        ? ?/sec    1.14      2.0±0.04ms        ? ?/sec    1.00  1752.8±44.16µs        ? ?/sec
Serialize/deku        1.12      3.1±0.17ms        ? ?/sec    1.00      2.8±0.04ms        ? ?/sec     1.10      3.1±0.08ms        ? ?/sec    1.01      2.8±0.14ms        ? ?/sec

before:

github-actions[bot] commented 6 months ago

Benchmark for bd2a83c

Click to view benchmark

| Test | Base | PR | % | |------|--------------|------------------|---| | count | 3.4±0.05µs | N/A | N/A | | deku_read_bits | 1241.8±15.85ns | **1185.6±38.67ns** | **-4.53%** | | deku_read_byte | 15.4±0.02ns | **3.4±0.10ns** | **-77.92%** | | deku_read_enum | 15.5±0.06ns | **2.6±0.05ns** | **-83.23%** | | deku_read_vec | 41.3±0.42ns | **36.9±0.61ns** | **-10.65%** | | deku_write_bits | **161.8±2.92ns** | 169.1±3.97ns | **+4.51%** | | deku_write_byte | 22.1±0.32ns | 22.1±0.34ns | 0.00% | | deku_write_enum | **21.4±0.30ns** | 22.0±0.30ns | **+2.80%** | | deku_write_vec | 420.7±3.94ns | **342.9±4.97ns** | **-18.49%** | | read_all | 16.6±0.11µs | N/A | N/A | | read_all_bytes | 17.1±0.24µs | N/A | N/A |

wcampbell0x2a commented 6 months ago

Hmmm, my inline(never) hurt the performance of the count vs read_all. I might have to balance this..

read_all_bytes          time:   [10.605 µs 10.623 µs 10.649 µs]
read_all                time:   [10.715 µs 10.727 µs 10.741 µs]
count                   time:   [2.5363 µs 2.5426 µs 2.5526 µs]

wcampbell0x2a commented 6 months ago

Hmmm, my inline(never) hurt the performance of the count vs read_all. I might have to balance this..
read_all_bytes          time:   [10.605 µs 10.623 µs 10.649 µs]
read_all                time:   [10.715 µs 10.727 µs 10.741 µs]
count                   time:   [2.5363 µs 2.5426 µs 2.5526 µs]

Or! Just take the improvements and go to sleep ;)

github-actions[bot] commented 6 months ago

Benchmark for 5f329ba

Click to view benchmark

| Test | Base | PR | % | |------|--------------|------------------|---| | count | 3.7±0.05µs | N/A | N/A | | deku_read_bits | 1304.6±18.95ns | **1274.5±11.97ns** | **-2.31%** | | deku_read_byte | **3.3±0.10ns** | 3.4±0.07ns | **+3.03%** | | deku_read_enum | **2.6±0.05ns** | 3.0±0.05ns | **+15.38%** | | deku_read_vec | 35.2±0.58ns | 35.6±0.53ns | +1.14% | | deku_write_bits | **149.8±2.65ns** | 170.6±3.72ns | **+13.89%** | | deku_write_byte | **22.1±0.30ns** | 22.4±0.22ns | **+1.36%** | | deku_write_enum | **21.4±0.43ns** | 21.7±0.20ns | **+1.40%** | | deku_write_vec | 398.3±3.11ns | **333.4±7.79ns** | **-16.29%** | | read_all | 4.4±0.07µs | N/A | N/A | | read_all_bytes | 3.9±0.03µs | N/A | N/A |

sharksforarms / deku