Some thoughts on improving performance within deku. `read` has been changed to just return the amount of bits read instead of the remaining bits. So far I've only implemented the new read for `u8` for some initial testing. I haven't done much benchmarking or implemented it for other types; I just wanted to see if anyone had objections before I put some time into that (when I get more time... someday ;) )

I'm assuming this could be done for the `write` primitives also, but I haven't tried that.
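For what the write side could look like, here's a minimal self-contained sketch of the same idea (toy trait name, plain `Vec<u8>` output instead of deku's bit-level writer; none of this is deku's actual API):

```rust
/// Hypothetical write-side analogue: return the amount of bits written
/// instead of threading the remaining output through the return value.
pub trait NewDekuWrite {
    /// Append this value to `output`, returning the amount of bits written.
    fn write_new(&self, output: &mut Vec<u8>) -> usize;
}

impl NewDekuWrite for u8 {
    fn write_new(&self, output: &mut Vec<u8>) -> usize {
        output.push(*self);
        8 // one full byte written
    }
}

fn main() {
    let mut out = Vec::new();
    let written = 0xABu8.write_new(&mut out);
    assert_eq!(written, 8);
    assert_eq!(out, vec![0xAB]);
}
```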
```rust
/// "Reader" trait: read bits and construct type
pub trait NewDekuRead<'a, Ctx = ()> {
    /// Read bits and construct type
    /// * **input** - Input as bits
    /// * **ctx** - A context required by context-sensitive reading. A unit type `()` means no
    ///   context needed.
    ///
    /// Returns the amount of bits read in addition to Self.
    fn read_new(
        input: &'a BitSlice<u8, Msb0>,
        ctx: Ctx,
    ) -> Result<(usize, Self), DekuError>
    where
        Self: Sized;
}

/// "Reader" trait: implemented on DekuRead struct and enum containers. A `container` is a type
/// which doesn't need any context information.
pub trait NewDekuContainerRead<'a>: NewDekuRead<'a, ()> {
    /// Read bytes and construct type
    /// * **input** - Input given as data and bit offset
    ///
    /// Returns the remaining bytes and bit offset after parsing in addition to Self.
    fn from_bytes(input: (&'a [u8], usize)) -> Result<((usize, usize), Self), DekuError>
    where
        Self: Sized;
}
```
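If `from_bytes` reports where parsing stopped as a (byte, bit-offset) pair, the conversion from a total bit count is just division and remainder. A self-contained sketch of that arithmetic (the function name is made up; deku's types are omitted):

```rust
// Sketch of how a container's `from_bytes` could translate the total number
// of bits consumed into a (byte index, bit offset) pair. `read_new` and the
// deku types are not included; this only shows the offset arithmetic.
fn bits_to_byte_offset(bits_read: usize, start_bit_offset: usize) -> (usize, usize) {
    let total = start_bit_offset + bits_read;
    (total / 8, total % 8)
}

fn main() {
    // After reading two u8 fields (16 bits) starting at bit offset 0,
    // parsing resumes at byte 2, bit 0.
    assert_eq!(bits_to_byte_offset(16, 0), (2, 0));
    // Starting mid-byte: a 4-bit offset plus 6 bits read ends at byte 1, bit 2.
    assert_eq!(bits_to_byte_offset(6, 4), (1, 2));
}
```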
```rust
// specialize u8 for ByteSize
impl NewDekuRead<'_, ()> for u8 {
    fn read_new(
        input: &BitSlice<u8, Msb0>,
        _: (),
    ) -> Result<(usize, Self), DekuError> {
        const MAX_TYPE_BITS: usize = BitSize::of::<u8>().0;
        if input.len() < MAX_TYPE_BITS {
            return Err(DekuError::Incomplete(NeedSize::new(MAX_TYPE_BITS)));
        }
        // PANIC: we already checked input.len() >= MAX_TYPE_BITS above, so the
        // slice below cannot panic
        let value = input[..MAX_TYPE_BITS].load::<u8>();
        Ok((MAX_TYPE_BITS, value))
    }
}
```
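To make the pattern concrete without pulling in `bitvec`, here's a self-contained toy version over a plain byte slice: the reader returns how much it consumed instead of a sub-slice of the remaining input (toy names, not deku's API):

```rust
// Toy illustration of the return-amount-read pattern, using a plain byte
// slice instead of `BitSlice`: on success the caller gets (bits read, value)
// and advances its own cursor; on failure it learns how many bits were needed.
#[derive(Debug, PartialEq)]
struct Incomplete {
    needed_bits: usize,
}

fn read_u8(input: &[u8]) -> Result<(usize, u8), Incomplete> {
    const BITS: usize = 8;
    if input.len() * 8 < BITS {
        return Err(Incomplete { needed_bits: BITS });
    }
    Ok((BITS, input[0]))
}

fn main() {
    assert_eq!(read_u8(&[0xAB, 0xCD]), Ok((8, 0xAB)));
    assert_eq!(read_u8(&[]), Err(Incomplete { needed_bits: 8 }));
}
```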
The generated code:
```rust
#[derive(Debug, DekuRead)]
pub struct Test {
    a: u8,
    b: u8,
}

impl NewDekuRead<'_, ()> for Test {
    fn read_new(
        __deku_input_bits: &'_ ::deku::bitvec::BitSlice<u8, ::deku::bitvec::Msb0>,
        _: (),
    ) -> Result<(usize, Self), ::deku::DekuError> {
        use core::convert::TryFrom;
        let mut __deku_read = 0;
        let mut __deku_rest = __deku_input_bits;
        let __deku_a = {
            //assert!(__deku_read < __deku_input_bits.len());
            let (__deku_field_read, __deku_value) =
                <u8 as NewDekuRead<'_, _>>::read_new(&__deku_rest, ())?;
            let __deku_value: u8 = Result::<_, ::deku::DekuError>::Ok(__deku_value)?;
            __deku_rest = &__deku_rest[__deku_field_read..];
            __deku_read += __deku_field_read;
            __deku_value
        };
        let __deku_b = {
            //assert!(__deku_read < __deku_input_bits.len());
            let (__deku_field_read, __deku_value) =
                <u8 as NewDekuRead<'_, _>>::read_new(&__deku_rest, ())?;
            let __deku_value: u8 = Result::<_, ::deku::DekuError>::Ok(__deku_value)?;
            __deku_rest = &__deku_rest[__deku_field_read..];
            __deku_read += __deku_field_read;
            __deku_value
        };
        let __deku_value = Self {
            a: __deku_a,
            b: __deku_b,
        };
        Ok((__deku_read, __deku_value))
    }
}
```
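The shape of that expansion, written by hand over a plain byte slice (toy names, bytes instead of bits): each field read advances the remaining input and accumulates the total amount consumed, which is then returned alongside the constructed value:

```rust
// Hand-written equivalent of the macro expansion's control flow: per field,
// read, advance the rest of the input by the amount consumed, and add that
// amount to a running total returned with the final struct.
struct Test {
    a: u8,
    b: u8,
}

fn read_u8(input: &[u8]) -> Option<(usize, u8)> {
    input.first().map(|&b| (1, b))
}

fn read_test(input: &[u8]) -> Option<(usize, Test)> {
    let mut read = 0;
    let mut rest = input;
    let (n, a) = read_u8(rest)?;
    rest = &rest[n..];
    read += n;
    let (n, b) = read_u8(rest)?;
    read += n;
    Some((read, Test { a, b }))
}

fn main() {
    let (read, t) = read_test(&[1, 2, 3]).unwrap();
    assert_eq!(read, 2);
    assert_eq!((t.a, t.b), (1, 2));
}
```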
The benchmarks look pretty good, at least for `u8`:
```
deku read new           time:   [10.862 ns 10.994 ns 11.150 ns]
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe

deku                    time:   [23.186 ns 23.337 ns 23.510 ns]
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

manual                  time:   [4.5398 ns 4.5656 ns 4.6002 ns]
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe
```
I pushed the benchmark here: https://github.com/wcampbell0x2a/deku-perf