sharksforarms / deku

Declarative binary reading and writing: bit-level, symmetric, serialization/deserialization
Apache License 2.0
1.14k stars 55 forks

Specialize bytes #278

Closed wcampbell0x2a closed 2 years ago

wcampbell0x2a commented 2 years ago

Remove Size in favor of BitSize and ByteSize. This allows the Deku{Read/Write}(Endian, BitSize) and Deku{Read/Write}(Endian, ByteSize) impls to be created, and lets the ByteSize impls be faster. Most of the assumptions I made (and the perf that is gained) come from the removal of this branch for bytes:

let value = if pad == 0
    && bit_slice.len() == max_type_bits
    && bit_slice.as_raw_slice().len() * 8 == max_type_bits
{

See https://github.com/sharksforarms/deku/issues/44
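The split described above can be sketched roughly as follows. `BitSize`/`ByteSize` mirror the PR's new ctx types, but `read_u32_bytes` is a hypothetical byte-aligned fast path written for illustration, not deku's actual impl:

```rust
use std::convert::TryInto;

// Sketch of the ctx split: two distinct size types instead of one `Size` enum,
// so impls can dispatch on byte-aligned reads and skip bit-level handling.
#[derive(Debug, Copy, Clone, PartialEq)]
pub struct BitSize(pub usize);

#[derive(Debug, Copy, Clone, PartialEq)]
pub struct ByteSize(pub usize);

// Hypothetical byte-aligned fast path: convert the slice directly,
// skipping the padding/alignment branch quoted above.
fn read_u32_bytes(input: &[u8]) -> Result<u32, String> {
    let bytes: [u8; 4] = input
        .get(..4)
        .ok_or_else(|| "not enough data".to_string())?
        .try_into()
        .map_err(|_| "could not convert slice to array".to_string())?;
    Ok(u32::from_le_bytes(bytes))
}

fn main() {
    assert_eq!(read_u32_bytes(&[0xAB, 0xCD, 0x00, 0x00]), Ok(0xCDAB));
}
```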

I added some benchmarks to get a good idea of what perf this allows. The results look pretty good, but I'm on my old ThinkPad. These results are vs. the benchmarks run from master.

deku_read_vec           time:   [744.39 ns 751.13 ns 759.58 ns]
                        change: [-23.681% -21.358% -19.142%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

See https://github.com/sharksforarms/deku/issues/25

wcampbell0x2a commented 2 years ago

I'm getting errors on the following lines:

failures:
    src/attributes.rs - attributes (line 1130)
    src/attributes.rs - attributes (line 270)

I'm not sure why these tests fail. Example:

  # use deku::prelude::*;
  # use std::convert::{TryInto, TryFrom};
  # #[derive(PartialEq, Debug, DekuRead, DekuWrite)]
  #[deku(type = "u32", bytes = "2")]
  enum DekuTest {
      #[deku(id = "0xBEEF")]
      VariantA(u8),
  }

  let data: Vec<u8> = vec![0xEF, 0xBE, 0xFF];

  let value = DekuTest::try_from(data.as_ref()).unwrap();

  assert_eq!(
      DekuTest::VariantA(0xFF),
      value
  );

  let value: Vec<u8> = value.try_into().unwrap();
  assert_eq!(data, value);

Example:

  # use deku::prelude::*;
  # use std::convert::{TryInto, TryFrom};
  # #[derive(Debug, PartialEq, DekuRead, DekuWrite)]
  struct DekuTest {
      #[deku(bytes = 2)]
      field_a: u32,
      field_b: u8, // defaults to size_of<u8>
  }

  let data: Vec<u8> = vec![0xAB, 0xCD, 0xFF];

  let value = DekuTest::try_from(data.as_ref()).unwrap();

  assert_eq!(
      DekuTest {
         field_a: 0xCDAB,
         field_b: 0xFF,
      },
      value
  );

  let value: Vec<u8> = value.try_into().unwrap();
  assert_eq!(data, value);
wcampbell0x2a commented 2 years ago

@sharksforarms @constfold lmk what you think

constfold commented 2 years ago
failures:
   src/attributes.rs - attributes (line 1130)
   src/attributes.rs - attributes (line 270)

Both error messages say:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Parse("error parsing from slice: could not convert slice to array")', src/attributes.rs:15:47

It looks the same as #224, caused by an alignment issue (might be the one I mentioned above).
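The panic message can be reproduced outside deku; it's the standard error from a fallible slice-to-array conversion (a minimal sketch, not deku's code):

```rust
use std::convert::TryInto;

fn main() {
    // The doctest reads only 2 bytes for a u32, so the slice-to-array
    // conversion backing `u32::from_le_bytes` fails with
    // TryFromSliceError: "could not convert slice to array".
    let short: &[u8] = &[0xEF, 0xBE];
    let attempt: Result<[u8; 4], _> = short.try_into();
    assert!(attempt.is_err());
}
```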

sharksforarms commented 2 years ago

Hey @wcampbell0x2a, nice work! And :wave: @constfold, nice to hear from you too. This is looking great (thanks for the benchmarks!)

I don't have any comments apart from the test cases; I can offer my understanding of why they're failing.

#[deku(type = "u32", bytes = "2")]
enum DekuTest {
    #[deku(id = "0xBEEF")]
    VariantA(u8),
}

let data: Vec<u8> = vec![0xEF, 0xBE, 0xFF];
let value = DekuTest::try_from(data.as_ref()).unwrap();

So, the issue with these tests is that the changes no longer consider padding when reading:

Original code:

                let value = if pad == 0
                    && bit_slice.len() == max_type_bits
                    && bit_slice.as_raw_slice().len() * 8 == max_type_bits
                {
                    // if everything is aligned, just read the value

This section of code is an optimisation for when we have exactly what we want.

With this PR, the above test will read 2 bytes and try to convert that slice to a u32, which fails (a u32 needs 4 bytes).

To fix this, we'd need to bring padding back in some form. I'm thinking it may also be possible to pad using arithmetic instead of allocation. Maybe a later optimisation.
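One hedged sketch of bringing padding back on the byte path (illustrative only; `read_u32_padded_le` is a hypothetical helper, not the PR's code): zero-fill the unread high bytes before converting.

```rust
// Hypothetical padded read: zero padding stands in for the bits the
// field doesn't occupy, so `bytes = 2` on a u32 no longer fails the
// slice-to-array conversion.
fn read_u32_padded_le(input: &[u8], byte_size: usize) -> Result<u32, String> {
    if byte_size > 4 || input.len() < byte_size {
        return Err("not enough data".to_string());
    }
    let mut buf = [0u8; 4]; // the unread high bytes stay zero
    buf[..byte_size].copy_from_slice(&input[..byte_size]);
    Ok(u32::from_le_bytes(buf))
}

fn main() {
    // Matches the failing doctest: id 0xBEEF read from [0xEF, 0xBE].
    assert_eq!(read_u32_padded_le(&[0xEF, 0xBE, 0xFF], 2), Ok(0xBEEF));
}
```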

wcampbell0x2a commented 2 years ago

So, the issue with these tests is that the changes no longer consider padding when reading:

Could we add a Padding struct for this, for when bytes is smaller than the type's size? Are there other cases in which this would happen for a field in a struct, as well as this enum usage? As I see it, this also wouldn't happen for type = "u8".

Just trying to reduce as many branches as I can for performance.

sharksforarms commented 2 years ago

So, the issue with these tests is that the changes no longer consider padding when reading:

Could we add a Padding struct for this, for when bytes is smaller than the type's size? Are there other cases in which this would happen for a field in a struct, as well as this enum usage? As I see it, this also wouldn't happen for type = "u8".

Just trying to reduce as many branches as I can for performance.

If we can do that at compile time, that'd be great; I'm not sure we can, though... Also, you're right about u8: this is a simple case which can be specialized (and needs to be, else rustc complains about shift overflow). Here's some food for thought: https://github.com/wcampbell0x2a/deku/pull/2
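If the pad amount were fixed at derive time, a const-generic shape might let rustc fold the bounds checks away. A purely speculative sketch (`read_uint_le` is hypothetical, not from the PR):

```rust
// BYTES is known at macro-expansion time, so it could be emitted as a
// const generic; the branch on size then disappears at monomorphization.
fn read_uint_le<const BYTES: usize>(input: &[u8]) -> Result<u32, String> {
    assert!(BYTES <= 4, "BYTES must fit in a u32"); // checked per instantiation
    let mut buf = [0u8; 4];
    buf[..BYTES].copy_from_slice(input.get(..BYTES).ok_or("not enough data")?);
    Ok(u32::from_le_bytes(buf))
}

fn main() {
    assert_eq!(read_uint_le::<2>(&[0xEF, 0xBE, 0xFF]), Ok(0xBEEF));
    // The u8 special case mentioned above: BYTES = 1, no shifting involved.
    assert_eq!(read_uint_le::<1>(&[0x2A]), Ok(0x2A));
}
```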

wcampbell0x2a commented 2 years ago

Current benchmarking :chart_with_upwards_trend:

deku_read_byte          time:   [8.3902 ns 8.4387 ns 8.4959 ns]
                        change: [-17.300% -16.110% -14.983%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) low mild
  8 (8.00%) high mild
  9 (9.00%) high severe

deku_write_byte         time:   [63.329 ns 63.648 ns 64.012 ns]
                        change: [-7.6643% -5.4937% -3.9645%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 23 outliers among 100 measurements (23.00%)
  1 (1.00%) low severe
  8 (8.00%) low mild
  6 (6.00%) high mild
  8 (8.00%) high severe

deku_read_bits          time:   [500.82 ns 504.55 ns 509.19 ns]
                        change: [-0.9629% +1.7699% +4.4820%] (p = 0.21 > 0.05)
                        No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
  4 (4.00%) high mild
  14 (14.00%) high severe

deku_write_bits         time:   [95.921 ns 96.374 ns 96.879 ns]
                        change: [-11.428% -9.0292% -7.2123%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  7 (7.00%) high severe

deku_read_enum          time:   [14.375 ns 14.418 ns 14.466 ns]
                        change: [-21.776% -19.593% -17.363%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  7 (7.00%) high severe

deku_write_enum         time:   [101.08 ns 101.55 ns 102.10 ns]
                        change: [-10.996% -8.9539% -7.4107%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  7 (7.00%) high mild
  11 (11.00%) high severe

deku_read_vec           time:   [777.98 ns 779.94 ns 782.09 ns]
                        change: [-13.747% -12.440% -10.565%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  2 (2.00%) high severe

deku_write_vec          time:   [4.2898 µs 4.3064 µs 4.3268 µs]
                        change: [-14.469% -12.650% -10.762%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  12 (12.00%) high severe

deku_read_vec_perf      time:   [684.70 ns 687.68 ns 690.87 ns]

deku_write_vec_perf     time:   [4.6496 µs 4.6687 µs 4.6894 µs]
wcampbell0x2a commented 2 years ago

One of the CI failures is from

warning: you are deriving `PartialEq` and can implement `Eq`
  --> src/ctx.rs:76:30
   |
76 | #[derive(Debug, Copy, Clone, PartialEq, Eq, Ord, PartialOrd)]
   |                              ^^^^^^^^^ help: consider deriving `Eq` as well: `PartialEq, Eq`
77 | pub enum Limit<T, Predicate: FnMut(&T) -> bool> {

which looks to be a duplicate of: https://github.com/rust-lang/rust-clippy/issues/9413

sharksforarms commented 2 years ago

Thanks! There were a couple more TODOs I added regarding code duplication that should be addressed before merging

wcampbell0x2a commented 2 years ago

Thanks! There were a couple more TODOs I added regarding code duplication that should be addressed before merging

I think I fixed those now.

sharksforarms commented 2 years ago

Thanks! Release pending :)