sharksforarms / deku

Declarative binary reading and writing: bit-level, symmetric, serialization/deserialization
Apache License 2.0

Reading a vector of arbitrary length that has an end of vector magic number #249

Closed ciarant closed 2 years ago

ciarant commented 2 years ago

Hi there. I'm trying to parse a legacy binary file structure, part of which looks a little like the code shown below. There's a Container that contains a vector of Things. There's no count of the number of Things. Instead there's a magic number (0xFE) to mark the end of the things vector.

Any suggestions on how best to handle this kind of scenario with deku?

Many thanks 🙇

let test_data: &[u8] = [
  0x01, // container.id
  0x04, 0x0A, 0x0B, // container.things[0]
  0x05, 0x0A, 0x0B, // container.things[1]
  0xFE, // this marks the end of the 'things'
  0x01, 0x02, // container.checksum
].as_ref();

#[derive(Debug, PartialEq, DekuRead)]
struct Container {
    id: u8,
    #[deku(until = "|thing: &Thing| thing.id == 0xFE")]
    things: Vec<Thing>,
    checksum: u16,
}

#[derive(Debug, PartialEq, DekuRead)]
struct Thing {
    id: u8, // value of 0xFE here means this is not a Thing and the previous one was the last Thing
    value: u16,
}
sharksforarms commented 2 years ago

Hey there! Thanks for asking this question. At the moment, there isn't an easy way to "peek" forward.
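To illustrate why "until" on its own doesn't fit: as far as I understand it, "until" tests each element only after it has been read in full, and keeps that element in the vector. With a 3-byte Thing, the 0xFE marker would therefore be parsed as a Thing that swallows the two checksum bytes. Here is a rough std-only sketch of that behaviour (no deku involved; read_then_test is a made-up name, and little-endian u16 is an assumption, since the byte order isn't stated above):

```rust
/// Hypothetical stand-in for "read one element, then test it".
/// Returns the parsed (id, value) pairs and whatever input is left over.
fn read_then_test(mut rest: &[u8]) -> Option<(Vec<(u8, u16)>, &[u8])> {
    let mut things = Vec::new();
    loop {
        // Every element is 3 bytes: id (u8) followed by value (u16).
        if rest.len() < 3 {
            return None;
        }
        let id = rest[0];
        // Assumption: little-endian; the original format may differ.
        let value = u16::from_le_bytes([rest[1], rest[2]]);
        rest = &rest[3..];
        things.push((id, value));
        // The test happens only after the whole element was consumed,
        // so the "marker" element has already eaten two extra bytes.
        if id == 0xFE {
            return Some((things, rest));
        }
    }
}
```

Run on the bytes that follow container.id in the test data, this yields three elements (the last one being the marker plus the checksum bytes) and leaves nothing for the checksum field.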

I would approach this by writing your own "reader" function (the escape hatch for when no attribute can express what you want).

Something like this maybe:

use deku::bitvec::*;
use deku::prelude::*;

#[derive(Debug, PartialEq, DekuRead)]
struct Container {
    id: u8,
    #[deku(reader = "read_things(deku::rest)")]
    things: Vec<Thing>,
    checksum: u16,
}

#[derive(Debug, PartialEq, DekuRead)]
struct Thing {
    id: u8, // value of 0xFE here means this is not a Thing and the previous one was the last Thing
    value: u16,
}

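// Custom reader: collects Things until the 0xFE end marker, then returns
// the remaining input together with the parsed Things.
fn read_things(rest: &BitSlice<Msb0, u8>) -> Result<(&BitSlice<Msb0, u8>, Vec<Thing>), DekuError> {
    let mut things = Vec::new();
    let mut rest = rest;
    loop {
        // Peek at the next byte; `rest` itself is not advanced by this read.
        let (next_rest, peek) = u8::read(rest, ())?;
        if peek == 0xFE {
            // End marker: consume it so `checksum` is read from what follows.
            rest = next_rest;
            break;
        }
        // Not the marker: re-read from the same position as a full Thing.
        let (next_rest, thing) = Thing::read(rest, ())?;
        things.push(thing);
        rest = next_rest;
    }
    Ok((rest, things))
}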

fn main() {
    let test_data: &[u8] = [
        0x01, // container.id
        0x04, 0x0A, 0x0B, // container.things[0]
        0x05, 0x0A, 0x0B, // container.things[1]
        0xFE, // this marks the end of the 'things'
        0x01, 0x02, // container.checksum
    ]
    .as_ref();

    dbg!(Container::from_bytes((test_data, 0)));
}
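For anyone who wants to see the peek-then-read idea in isolation, here is a rough std-only sketch of the same loop over a plain byte slice. It is not deku code: parse_container and END_OF_THINGS are names made up for illustration, and the u16 fields are assumed to be little-endian, which the thread doesn't specify.

```rust
#[derive(Debug, PartialEq)]
struct Thing {
    id: u8,
    value: u16,
}

#[derive(Debug, PartialEq)]
struct Container {
    id: u8,
    things: Vec<Thing>,
    checksum: u16,
}

// Marker byte that terminates the list of Things.
const END_OF_THINGS: u8 = 0xFE;

// Hypothetical hand-written parser; returns None on truncated input.
fn parse_container(data: &[u8]) -> Option<Container> {
    let (&id, mut rest) = data.split_first()?;
    let mut things = Vec::new();
    loop {
        // Peek: look at the next byte without advancing `rest`.
        let &next = rest.first()?;
        if next == END_OF_THINGS {
            // Consume the marker and stop collecting Things.
            rest = &rest[1..];
            break;
        }
        // Not the marker, so the byte is a Thing id followed by a u16.
        if rest.len() < 3 {
            return None;
        }
        // Assumption: little-endian; swap for from_be_bytes if needed.
        let value = u16::from_le_bytes([rest[1], rest[2]]);
        things.push(Thing { id: next, value });
        rest = &rest[3..];
    }
    if rest.len() < 2 {
        return None;
    }
    let checksum = u16::from_le_bytes([rest[0], rest[1]]);
    Some(Container { id, things, checksum })
}
```

Calling parse_container on the ten test bytes above should give id 1, two Things and the checksum taken from the final two bytes. One caveat that applies to the deku reader as well: a real Thing whose id happens to be 0xFE can't be told apart from the end marker.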
ciarant commented 2 years ago

Wow! That worked a treat and, I think, it will help me out of a couple of other similar-ish situations.

Thank you so much (especially so on a Sunday).