sharksforarms / deku

Declarative binary reading and writing: bit-level, symmetric, serialization/deserialization
Apache License 2.0
1.14k stars 55 forks source link

Improve deku performance #345

Closed wcampbell0x2a closed 1 year ago

wcampbell0x2a commented 1 year ago

Still want to do benchmarks and more testing, but it's done-ish

See #344

github-actions[bot] commented 1 year ago

Benchmark for d1b38cf

Click to view benchmark | Test | Base | PR | % | |------|--------------|------------------|---| | deku_read_bits | 608.6±0.50ns | 608.7±0.69ns | +0.02% | | deku_read_byte | 17.6±0.09ns | **8.4±0.04ns** | **-52.27%** | | deku_read_enum | 30.1±0.54ns | **16.7±0.12ns** | **-44.52%** | | deku_read_vec | 1402.0±11.22ns | **934.5±8.60ns** | **-33.35%** | | deku_read_vec_perf | 1404.7±10.10ns | **947.9±5.13ns** | **-32.52%** | | deku_write_bits | **108.0±0.20ns** | 111.4±0.61ns | **+3.15%** | | deku_write_byte | 66.7±0.25ns | 66.7±0.25ns | 0.00% | | deku_write_enum | **111.8±0.37ns** | 115.0±0.63ns | **+2.86%** | | deku_write_vec | 5.2±0.02µs | 5.2±0.01µs | 0.00% | | deku_write_vec_perf | 5.3±0.01µs | **5.2±0.01µs** | **-1.89%** |
github-actions[bot] commented 1 year ago

Benchmark for a443c4b

Click to view benchmark | Test | Base | PR | % | |------|--------------|------------------|---| | deku_read_bits | 608.9±0.42ns | 608.7±0.84ns | -0.03% | | deku_read_byte | 17.6±0.09ns | **8.4±0.00ns** | **-52.27%** | | deku_read_enum | 29.7±0.15ns | **16.8±0.13ns** | **-43.43%** | | deku_read_vec | 1399.1±4.71ns | **934.4±8.82ns** | **-33.21%** | | deku_read_vec_perf | 1398.2±5.90ns | **949.8±5.92ns** | **-32.07%** | | deku_write_bits | **108.2±0.24ns** | 110.9±0.41ns | **+2.50%** | | deku_write_byte | 66.7±0.34ns | **66.4±0.26ns** | **-0.45%** | | deku_write_enum | **111.7±0.38ns** | 114.9±0.44ns | **+2.86%** | | deku_write_vec | 5.2±0.01µs | 5.2±0.01µs | 0.00% | | deku_write_vec_perf | 5.3±0.01µs | **5.2±0.01µs** | **-1.89%** |
github-actions[bot] commented 1 year ago

Benchmark for 1b73cc4

Click to view benchmark | Test | Base | PR | % | |------|--------------|------------------|---| | deku_read_bits | 608.8±0.45ns | **604.6±0.44ns** | **-0.69%** | | deku_read_byte | 17.7±0.13ns | **9.7±0.16ns** | **-45.20%** | | deku_read_enum | 29.8±0.09ns | **17.9±0.16ns** | **-39.93%** | | deku_read_vec | 1404.0±6.45ns | **940.2±4.42ns** | **-33.03%** | | deku_read_vec_perf | 1403.3±8.17ns | **909.6±4.37ns** | **-35.18%** | | deku_write_bits | 112.2±0.55ns | **110.3±0.56ns** | **-1.69%** | | deku_write_byte | 68.5±6.98ns | **64.7±0.16ns** | **-5.55%** | | deku_write_enum | 111.7±0.58ns | **110.2±0.17ns** | **-1.34%** | | deku_write_vec | 5.2±0.01µs | **5.1±0.01µs** | **-1.92%** | | deku_write_vec_perf | 5.3±0.02µs | **5.1±0.01µs** | **-3.77%** |
v1gnesh commented 1 year ago

Hey @wcampbell0x2a I have a dumb question - Since in many cases, parsing is for static content (either sizes known fully at compile time or reasonable upper bounds may be known for dynamic content), would it help if memory for these were pre-allocated before getting into read?

wcampbell0x2a commented 1 year ago

Hey @wcampbell0x2a I have a dumb question - Since in many cases, parsing is for static content (either sizes known fully at compile time or reasonable upper bounds may be known for dynamic content), would it help if memory for these were pre-allocated before getting into read?

You can see some tests for allocation counts here: https://github.com/sharksforarms/deku/blob/master/tests/test_alloc.rs. Many of these allocations are quite dynamic based upon bit alignment when reading.

wcampbell0x2a commented 1 year ago

Benchmark for 1b73cc4

Click to view benchmark

This commit showed a %5 performance increase for deku deku_read_byte on my machine.

sharksforarms commented 1 year ago

Yeah not sure how much I trust the CI benchmarking, I'm not sure it's fully accurate. I'll take a look at this when I have more time, thanks for working on this!

wcampbell0x2a commented 1 year ago

This breaks the logging feature. I'll need to fix that.

wcampbell0x2a commented 1 year ago

I also need to test the old + new with both having LTO enabled. The performance might be worse with this patch.

not true