tact1m4n3 / crsf-rs

A work in progress crsf protocol packet parser
MIT License
10 stars 7 forks source link

Faster parsing of RcChannels #15

Closed peterkrull closed 4 months ago

peterkrull commented 4 months ago

Hey! I was just in the process of doing my own crsf library when I saw that this already existed, so I thought I would try to contribute instead.

While the bitfield method of parsing the bytes for RcChannels is nice in that the bulk of the parsing is handled by the library, it also leaves a lot of performance on the table. I have the following parsing code, which is quite efficient, and can never panic, since all indexing can be verified by the compiler.

pub fn parse_rc_channels(slice: &[u8]) -> Option<RcChannels> {

    // Ensure fixed-size array to allow compiler to check
    // that all literal indexes are within bounds (cant panic)
    let data: [u8; 22] = slice.try_into().ok()?;

    // Convert u8 to u16 to make room for bit shifting
    let data: [u16; 22] = core::array::from_fn(|i| data[i] as u16);    

    // Initialize all channels to 0x07FF (11 high bits)
    let mut ch = [0x07FF; 16];

    ch[0] &= data[0] | data[1] << 8;
    ch[1] &= data[1] >> 3 | data[2] << 5;
    ch[2] &= data[2] >> 6 | data[3] << 2 | data[4] << 10;
    ch[3] &= data[4] >> 1 | data[5] << 7;
    ch[4] &= data[5] >> 4 | data[6] << 4;
    ch[5] &= data[6] >> 7 | data[7] << 1 | data[8] << 9;
    ch[6] &= data[8] >> 2 | data[9] << 6;
    ch[7] &= data[9] >> 5 | data[10] << 3;

    ch[8] &= data[11] | data[12] << 8;
    ch[9] &= data[12] >> 3 | data[13] << 5;
    ch[10] &= data[13] >> 6 | data[14] << 2 | data[15] << 10;
    ch[11] &= data[15] >> 1 | data[16] << 7;
    ch[12] &= data[16] >> 4 | data[17] << 4;
    ch[13] &= data[17] >> 7 | data[18] << 1 | data[19] << 9;
    ch[14] &= data[19] >> 2 | data[20] << 6;
    ch[15] &= data[20] >> 5 | data[21] << 3;

    Some(RcChannels(ch))
}

I tried running some benchmarks on an stm32f405, with the code above against two variants of the bitfield parser. One with RcChannelsPacked<&[u8]> so the parsing includes bounds checking and a panic path, and one with RcChannelsPacked<[u8; 22]> which cannot panic. These are the results for parsing 100_000 slices:

Benchmark 1: 0.175008 s (The one I propose)
Benchmark 2: 4.306767 s (RcChannelsPacked<&[u8]>)
Benchmark 3: 3.401365 s (RcChannelsPacked<[u8; 22]>)

I also ran a similar benchmark on a PC with criterion and the results are equally convincing. I have yet to implement the inverse "dump" method for this style, but it should be even less ugly than the parsing code. I will gladly make a PR if you agree to do it this way.

tact1m4n3 commented 4 months ago

Hey! Sounds great :))