jam1garner / binrw

A Rust crate for helping parse and rebuild binary data using ✨macro magic✨.
https://binrw.rs
MIT License
585 stars 35 forks source link

Consider expanding the checksum examples to work with ```#[derive(PartialEq, Debug)]``` #243

Closed hak8or closed 9 months ago

hak8or commented 9 months ago

I come here from the land of Serde, not realizing Serde doesn't seem to work well at all (from my very newbie Rust understanding) for structs being being SERDES'd into actual raw binary data. This library looks fantastic, especially being able to integrate custom logic in the serialization and deserialization (checksums, versions, magic bytes, etc), but I am getting lost with an error as I am expanding the checksum logic.

I have this minimal example;=, which as-is compiles fine;

#[binwrite]
#[binread]
#[brw(little, magic = b"Foo")]
#[br(stream = r, map_stream = csum_wrapper_r)]
#[bw(stream = w, map_stream = csum_wrapper_w)]
pub struct Foo {
    member0: u8,
    member1: u8,

    #[derivative(PartialEq="ignore")]
    #[bw(calc = 1)]
    #[br(temp, assert(csum == 1, "bad checksum, got:{}", csum))]
    csum : u8
}

#[cfg(test)]
mod tests {
    use binrw::{binread, BinRead, BinWrite, binwrite};
    use std::io::Cursor;
    use super::Foo;

#[test]
    fn header_serdes() {
        let f = Foo {
            member0: 1,
            member1: 2
        };
        let mut cursor = Cursor::new(Vec::new());
        f.write(&mut cursor).unwrap();

        cursor.set_position(0);

        let f_readback = Foo::read(&mut cursor).expect("aa");
        // assert_eq!(f, f_readback);
    }
}

Now lets try to un-comment the assert_eq line in the test, but turns out we need the comparison and display traits, cool, so lets add them, giving us this;

#[derive(PartialEq, Debug)]
#[binwrite]
#[binread]
#[brw(little, magic = b"Foo")]
#[br(stream = r, map_stream = csum_wrapper_r)]
#[bw(stream = w, map_stream = csum_wrapper_w)]
pub struct Foo {
    member0: u8,
    member1:u8,

    #[derivative(PartialEq="ignore")]
    #[bw(calc = 1)]
    #[br(temp, assert(csum == 1, "bad checksum, got:{}", csum))]
    csum : u8
}

#[cfg(test)]
mod tests {
    use binrw::{binread, BinRead, BinWrite, binwrite};
    use std::io::Cursor;
    use super::Foo;

#[test]
    fn header_serdes() {
        let f = Foo {
            member0: 1,
            member1: 2
        };
        let mut cursor = Cursor::new(Vec::new());
        f.write(&mut cursor).unwrap();

        cursor.set_position(0);

        let f_readback = Foo::read(&mut cursor).expect("aa");
        assert_eq!(f, f_readback);
    }
}

which now sadly gives me the following error;

error[E0609]: no field `csum` on type `&Foo`
  --> Store/src/simple_demo.rs:37:5
   |
37 |     csum : u8
   |     ^^^^ unknown field
   |
   = note: available fields are: `member0`, `member1`

So I assume this is because the partialeq and debug derivations interact with #br(temp, ...) in an unexpected order. I tried "hiding" the csum member of the struct using the https://crates.io/crates/derivative crate, but sadly no dice.

Am I just being obtuse and missing something obvious from the docs, or this related to issue #88? Is there a workaround which can be used beyond having to manually create the impl's needed for PartialEq and Debug? I am worried needing to manually impl them as that means all other potential derives would also need to be hand rolled (for example if I want to use it with some json or similar crates.

kitlith commented 9 months ago

This sounds like a macro evaluation order issue. Ret moving the #[derive(...)] underneath the binrw directives (which, when using both, can be replaced with #[binrw])

If I'm correct, what's happening is that the derive macros are running first with the original struct definition, including the temporary fields. Then the binrw attribute macros run, which remove the temporary fields. This leaves the derive output referencing fields that no longer exist.

By moving the binrw attribute macro(s) before the derive, you make that the outermost macro which runs first, which lets the derive macros run after the temporary field has been removed.

hak8or commented 9 months ago

Thank you so much for your help, I am still getting used to macro evaluation in rust so that explanation is exactly what I needed in the end.

To help others in the future and if they come here via google or similar, here is a full minimal example based on the map stream example from the fantastic docs of this project, the above explanation, and some further tinkering I did. This is what I am basing my other structs of in my project, which has a magic sequence of bytes at the start, then two simple variables, and a checksum at the end.

First up is the use's and something which can be used for the map_stream attribute. It handles hashing the contents while also ensuring we don't attempt to use the checksum itself to read the checksum.

use std::hash::Hasher;
use std::io::{Read, Seek, SeekFrom, Write};
use binrw::binrw;
use xxhash_rust::xxh3::Xxh3;
use crate::meta_types::ChunkHash;

struct Checksumer<T> {
    inner: T,
    hash: Xxh3,
    current_position: usize
}
impl<T> Checksumer<T> {
    const HARDCODE_MAGIC: [u8; 3] = *b"Foo";
    const HARDCODE_HASH_OFFSET: usize = Self::HARDCODE_MAGIC.len() + 2;
}

impl<T: Seek> Checksumer<T> {
    fn new(inner: T) -> Self {
        let mut h_tmp = Xxh3::new();
        h_tmp.write(&Self::HARDCODE_MAGIC);
        Self {
            inner,
            hash: h_tmp,
            current_position: 0
        }
    }

    fn finalize_hash(&self) -> ChunkHash {
        ChunkHash::from_u128(self.hash.digest128())
    }
}

impl<T: Read> Read for Checksumer<T> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        let rc = self.inner.read(buf);
        if self.current_position < Self::HARDCODE_HASH_OFFSET {
            self.hash.write(buf);
        }
        self.current_position += buf.len();
        rc
    }
}

impl<T: Seek> Seek for Checksumer<T> {
    fn seek(&mut self, pos: SeekFrom) -> std::io::Result<u64> {
        let rc = self.inner.seek(pos)?;
        self.current_position = self.inner.stream_position()? as usize;
        Ok(rc)
    }
}

impl<T: Write> Write for Checksumer<T> {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        self.hash.write(buf);
        self.current_position += buf.len();
        self.inner.write(buf)
    }

    fn flush(&mut self) -> std::io::Result<()> { self.inner.flush()}
}

Next up is the struct itself we care about.

#[derive(Copy, Clone, PartialEq, BinRead, BinWrite)]
pub struct ChunkHash {
    h: [u8; 16]
}
impl ChunkHash {
    pub fn from_u128(val: u128) -> Self {
        Self { h: val.to_be_bytes() }
    }
    pub fn to_u128(self) -> u128 {
        u128::from_be_bytes(self.h)
    }
}
impl fmt::Display for ChunkHash {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:#016x?}", self.to_u128())
    }
}
impl fmt::Debug for ChunkHash {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:#016x?}", self.to_u128())
    }
}

#[binrw]
#[derive(PartialEq, Debug)]
#[brw(
    little, 
    magic = b"Foo",
    stream = r, map_stream = Checksumer::new
)]
pub struct Foo {
    member0: u8,
    member1: u8,

    #[br(
        temp,
        assert(
            checksum == r.finalize_hash(),
            "bad checksum: read:{:x?} != calculated:{:x?}", checksum, r.finalize_hash()
        )
    )]
    #[bw(calc(r.finalize_hash()))]
    checksum : ChunkHash
}

And lastly, a few tests to confirm the above works.

#[cfg(test)]
mod tests {
    use binrw::{BinRead, BinWrite};
    use xxhash_rust::const_xxh3::xxh3_128;
    use std::io::{Cursor, Seek};
    use super::Foo;

    #[test]
    fn raw_compare(){
        let v_raw_refrence = [
            b"F"[0], b"o"[0], b"o"[0],
            0xDE,
            0xAD,
            0x0B, 0x2B, 0xC3, 0x40, 0xFE, 0x30, 0x3E, 0x74,
            0x98, 0xE2, 0x61, 0x40, 0x93, 0xA1, 0xA7, 0xFB
        ];
        assert_eq!(v_raw_refrence.len(), 3 + 1 + 1 + 16);
        assert_eq!(v_raw_refrence[..3], [0x46, 0x6F, 0x6F]);
        assert_eq!(v_raw_refrence[..5], [0x46, 0x6F, 0x6F, 0xDE, 0xAD]);

        // Check the hashes match
        assert_eq!(v_raw_refrence[5..], [
            0x0B, 0x2B, 0xC3, 0x40, 0xFE, 0x30, 0x3E, 0x74, 
            0x98, 0xE2, 0x61, 0x40, 0x93, 0xA1, 0xA7, 0xFB
        ]);
        assert_eq!(xxh3_128(&v_raw_refrence[..5]).to_be_bytes(), v_raw_refrence[5..]);

        // Check the serialization produces the expected binary blob, including
        // the checksum!
        let f_raw = Foo {
            member0: 0xDE,
            member1: 0xAD
        };
        let mut cursor = Cursor::new(Vec::new());
        f_raw.write(&mut cursor).unwrap();
        assert_eq!(cursor.clone().into_inner(), v_raw_refrence);

        // Deserialize the data and verify it matches.
        cursor.seek(std::io::SeekFrom::Start(0)).unwrap();
        let f_raw_readback = Foo::read(&mut cursor);
        assert_eq!(f_raw_readback.unwrap(), f_raw);
    }
}

All of the snippets above are released in public domain.


I remember seeing in other github issues you saying that you intentionally had some of the documents written to be more of a guide than something which can be copy pasted, so I posted what I found was needed to "fill in the gaps" in the map_stream example of the docs. If you like, I can issue a pull request to the docs, based on the snippet above. If you would prefer to avoid doing that, please feel free to close this issue as resolved.

Thank you again for your help, writing this library, and maintaining this library to this day.