tact1m4n3 / crsf-rs

A work in progress crsf protocol packet parser
MIT License
12 stars 8 forks source link

Unneccesarily recalculating the Crc table #16

Closed peterkrull closed 5 months ago

peterkrull commented 5 months ago

Currently the Crc<u8> lookup-table is being re-calculated every time we want to calculate a checksum. This is quite inefficient compared to keeping a stored copy of the table, since it will not change. The following numbers are for the time taken to run 100_000 Crc calculations on an stm32f405. The (runtime) approach is the one currently used. The one marked (static) uses a statically allocated table stored in flash, and (owned) uses a table owned by the parser, so it is stored in ram.

Benchmark 1 (runtime): 8.656397 s
Benchmark 2 (static): 0.294063  s 
Benchmark 3 (owned): 0.22212 s

So if my calculations are correct, that could be roughly 8.4% of all processing time being used just to recalculate the Crc table, assuming we are receiving packets at 1000 Hz.

The functions I am benchmarking are the following:

pub fn calculate_checksum_runtime(data: &[u8]) -> u8 {
    let crc8_alg: Crc<u8> = Crc::<u8>::new(&CRC_8_DVB_S2);
    crc8_alg.checksum(data)
}

// Use link_section to force static into ram
// #[link_section = ".data"]
static CRC8_ALG: Crc<u8> = Crc::<u8>::new(&CRC_8_DVB_S2);
pub fn calculate_checksum_static(data: &[u8]) -> u8 {
    CRC8_ALG.checksum(data)
}

pub struct OwnedCrc {
    pub crc: Crc<u8>,
}

impl OwnedCrc {
    pub fn new() -> Self {
        OwnedCrc {
            crc: Crc::<u8>::new(&CRC_8_DVB_S2),
        }
    }

    pub fn calculate_checksum_owned(&self, data: &[u8]) -> u8 {
        self.crc.checksum(data)
    }
}

I think having a pre-calculated table is preferable for the vast majority of use cases. Using an owned table is the simples, and guarantees the table is placed in RAM, whereas the static method should mean multiple instances of the parser will use the same table. Optionally we could tell the linker to put it in RAM. We could also use the Digest from Crc to be able to calculate the checksum on a per-byte basis as we process them in the parser, instead of doing one large 'burst' calculation at the end.

tact1m4n3 commented 5 months ago

Yeah. We should fix this. But I don't know how we should go about implementing the owned version(where should we store it). The static version seems to be much more straight forward to implement(we can just make the variable static).

tact1m4n3 commented 5 months ago

To use Digest, I think we should just move everything in the parse function to the PacketReader/PacketParser which is something I was thinking of doing anyway. But the problem with this is that we can't return a RawPacket anymore(without going through the overhead of parsing). I know @anti-social wanted this feature. What's your take on this?

anti-social commented 5 months ago

Nice catch. When I benchmarked parsing I also saw that calculating checksum overlaps all the other stuff.

peterkrull commented 5 months ago

The static solution is definitely the simplest, and is perfectly fine until we might come up with something better. Of course it means that the table is stored, even if a crsf parser is never constructed at runtime, but I would say it is good enough to do it like that for now.

anti-social commented 5 months ago

I don't think 256 bytes are real problem on modern MCUs. And for me it looks unlikely that a firmware with a crsf parser won't parse crsf packets.

anti-social commented 5 months ago

If it is a bottleneck for someone we can just store a reference to a crc instance into RawPacket to use it later. RawPacket already borrows parser's buffer.