sharksforarms / deku

Declarative binary reading and writing: bit-level, symmetric, serialization/deserialization
Apache License 2.0
1.05k stars 54 forks source link

Add read_all attribute #387

Closed wcampbell0x2a closed 5 months ago

wcampbell0x2a commented 6 months ago

Add read_all attribute to read until the reader.end() returns true

Fix couple of doc bugs

Closes #230

github-actions[bot] commented 6 months ago

Benchmark for 2bdf2be

Click to view benchmark | Test | Base | PR | % | |------|--------------|------------------|---| | deku_read_bits | **1216.7±17.08ns** | 1283.8±16.95ns | **+5.51%** | | deku_read_byte | **20.8±0.23ns** | 21.1±0.19ns | **+1.44%** | | deku_read_enum | **9.4±0.13ns** | 9.6±0.17ns | **+2.13%** | | deku_read_vec | 58.1±0.69ns | 58.1±1.76ns | 0.00% | | deku_write_bits | 129.3±3.32ns | 128.0±1.79ns | -1.01% | | deku_write_byte | 132.5±6.87ns | **124.8±4.70ns** | **-5.81%** | | deku_write_enum | 92.6±5.69ns | 93.1±2.33ns | +0.54% | | deku_write_vec | 3.1±0.10µs | 3.1±0.05µs | 0.00% |
duskmoon314 commented 6 months ago

Looks great! Thanks for your work.

github-actions[bot] commented 6 months ago

Benchmark for 999f4fc

Click to view benchmark | Test | Base | PR | % | |------|--------------|------------------|---| | deku_read_bits | **1205.9±104.68ns** | 1265.7±15.93ns | **+4.96%** | | deku_read_byte | 21.6±1.32ns | **21.1±0.23ns** | **-2.31%** | | deku_read_enum | 9.5±0.23ns | 9.6±0.13ns | +1.05% | | deku_read_vec | 58.0±0.55ns | 57.9±0.30ns | -0.17% | | deku_write_bits | 132.7±3.80ns | **128.2±1.73ns** | **-3.39%** | | deku_write_byte | **136.6±7.13ns** | 145.2±2.48ns | **+6.30%** | | deku_write_enum | **94.8±3.78ns** | 105.3±5.29ns | **+11.08%** | | deku_write_vec | **3.1±0.04µs** | 4.4±0.07µs | **+41.94%** |
passcod commented 6 months ago

Sometimes it's useful to have both read_all and until. Here's a workaround I made for this purpose, based on this branch:

#[derive(Debug)]
struct VecTilEnd<T>(Vec<T>);
impl<'a, T, Ctx, Predicate> DekuReader<'a, (deku::ctx::Limit<T, Predicate>, Ctx)> for VecTilEnd<T>
where
    T: DekuReader<'a, Ctx>,
    Ctx: Copy,
    Predicate: FnMut(&T) -> bool,
{
    fn from_reader_with_ctx<R: std::io::Read>(
        reader: &mut deku::reader::Reader<R>,
        (limit, inner_ctx): (deku::ctx::Limit<T, Predicate>, Ctx),
    ) -> Result<Self, DekuError>
    where
        Self: Sized,
    {
        let deku::ctx::Limit::Until(mut predicate, _) = limit else {
            return Err(DekuError::Parse("`until` is required here".to_string()));
        };

        let mut res = Vec::new();
        let start_read = reader.bits_read;

        loop {
            if reader.end() {
                break;
            }
            let val = <T>::from_reader_with_ctx(reader, inner_ctx)?;
            res.push(val);
            if predicate(res.last().unwrap()) {
                break;
            }
        }

        Ok(Self(res))
    }
}

#[derive(Debug, DekuRead)]
struct MaybeProperty {
    #[deku(until = "|b: &u8| *b != b'\\n'")]
    ws: VecTilEnd<u8>,

    #[deku(cond = "ws.0.last().map_or(false, |c| c != &b'[')")]
    property: Option<Property>,
}

(For this very silly thing: https://gist.github.com/passcod/9148782657f3f40df75ff85b8dd77c2a)

sharksforarms commented 5 months ago

@passcod could you clarify your use case with some examples? Trying to understand what you're trying to achieve. I attempted something here, but I didn't need VecTilEnd and could use a Vec to get the same result. I suspect I'm missing something with the "til end" portion.

#[derive(Debug, PartialEq, Eq, DekuRead)]
struct MaybePropertyVecTilEnd {
    #[deku(until = "|b: &u8| *b != b'\\n'")]
    ws: VecTilEnd<u8>,

    field: u8,
}

#[derive(Debug, PartialEq, Eq, DekuRead)]
struct MaybePropertyVec {
    #[deku(until = "|b: &u8| *b != b'\\n'")]
    ws: Vec<u8>,

    field: u8,
}

fn main() {
    {
        let mut input = Cursor::new(b"\n\n\n[a");
        let (_, prop) = MaybePropertyVecTilEnd::from_reader((&mut input, 0)).unwrap();
        assert_eq!(
            prop,
            MaybePropertyVecTilEnd {
                ws: VecTilEnd(b"\n\n\n[".to_vec()),
                field: 0x61,
            }
        );
    }

    {
        let mut input = Cursor::new(b"\n\n\n[a");
        let (_b, prop) = MaybePropertyVec::from_reader((&mut input, 0)).unwrap();
        assert_eq!(
            prop,
            MaybePropertyVec {
                ws: b"\n\n\n[".to_vec(), // same as VecTilEnd?
                field: 0x61,
            }
        );
    }
}
passcod commented 5 months ago

I've added the input file to the gist.

The way I reasoned about it is that I needed to read ws (which is a few layers deep in the structure) "until either a match or EOF".


[Section]
Key=value

For example this is:

"\n[Section]\nKey=value\n"

and so I read (in the full example in gist):

Unit::head_ws:      (read until '['):       \n[
Section::name:      (read until ']'):       Section]
MaybeProperty::ws:  (read until not '\n'):  \nK
Property::key:      (read until '='):       ey=
Property::value:    (read until '\n'):      value\n
MaybeProperty::ws:  (read until not '\n'):  !!! need one more byte !!!
(error!)

With VecTilEnd it continues instead:

MaybeProperty::ws:      (read until not '\n' or EOF):  *EOF* -> empty vec
MaybeProperty::property (not read because ws is empty -> None)
Section::properties     (closes out because MaybeProperty::property is None)
Unit::sections          (closes out because Section is done)
(successful parse!)

Now, deku isn't really adapted for this particular usecase, but it does seem like "read until either EOF or a delimiter" would be useful in general.

sempervictus commented 5 months ago

handy, thank you - works for me.

sharksforarms commented 5 months ago

@wcampbell0x2a what do you think of the above?

Could be neat to have an indicator of "eof" which could be used in attributes such as:

#[deku(until = "|v: &u8| *v == 0 || deku::eof")]
data: Vec<u8>

I'm ok with adding a read_all attribute to facilitate this use case, but thinking about others such as the one mentioned by @\passcod

passcod commented 5 months ago

I was thinking about this and I'm not sure (out of ignorance; haven't looked) if it would work with deku internals, but something like

until_next = |b: Option<T>| { ... }

that would 'peek' at the next byte/bit/item (where None is EOF or 'no more data available') and stop at the current if it returns true. That could cover both the EOF case and some part of #100.

wcampbell0x2a commented 5 months ago

@wcampbell0x2a what do you think of the above?

Could be neat to have an indicator of "eof" which could be used in attributes such as:

#[deku(until = "|v: &u8| *v == 0 || deku::eof")]
data: Vec<u8>

I'm ok with adding a read_all attribute to facilitate this use case, but thinking about others such as the one mentioned by @\passcod

How would this convert into a Limit(_) ?

wcampbell0x2a commented 5 months ago

@sharksforarms I added a Add-To-Changelog label, which is something I usually use to keep track of issues/MRs that are merged that don't have a changelog entry before releasing. I you don't need/use them lmk