hecatia-elegua / bilge

Use bitsized types as if they were a feature of rust.
Apache License 2.0

Ability to read from a bit-oriented stream (such as bitvec?) #73

Closed kitlith closed 1 year ago

kitlith commented 1 year ago

I have a reverse engineered structure that I want to read that starts with two bits indicating the mode, then differently sized structures depending on that mode, then read a fixed structure (whose location depends on the mode, because it comes immediately afterward.)

In the worst case for my structure, it is 136 bits (out of a possible total of 144) before the fields in the struct re-align with a byte boundary, and the writers of this library seem to be very dubious about the use of bitfields that are greater than 128 bits in size, so I figure that this could be a possible compromise?

(For those who suggest alternatives: I tried deku, I really wanted it to work, but it doesn't support the bit order I need. I'm thinking of either using modular-bitfield and mourning the loss of my enum, or trying to fix deku to do what I need. Probably the former.)

hecatia-elegua commented 1 year ago

First of all, that's an interesting idea, though would we want to integrate this via a feature gate? Your title sounds like it could be solved with https://github.com/hecatia-elegua/bilge#custom--bits-derives, but the comments sound like you want a struct >128 bits, so with another underlying/base type.

I'm dubious about using arrays as a base type, since it complicates the code. If you could share the layout, or... maybe a slight variation of it, without names if you want, that would be useful.

kitlith commented 1 year ago

I'm saying I can avoid having bitfields that are greater than 128 bits if I can read them from an unaligned bitstream, in sequence. Otherwise, there is one particular case where the field offset doesn't align with a byte (so I can split the bit field) until 136 bits out, and otherwise it gets a bit weird since the variations are two different sizes. (hm, I suppose the other workaround would be to have overlapping bit fields so to speak, where the last few bits of one and the first few bits of the other are ignored.)
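The "overlapping bit fields" workaround mentioned in the parenthetical can be sketched without any crates. This is only an illustration, assuming an LSB-first, little-endian layout: each field is loaded from a byte-aligned window of at most 16 bytes, and the bits that belong to neighbouring fields are shifted and masked away. The name `load_window` is hypothetical and is not part of bilge or bitvec.

```rust
/// Hypothetical helper: extracts `width` bits starting at absolute bit
/// position `bit_offset`, assuming bits are numbered LSB-first within each
/// byte and bytes are in little-endian order.
/// Returns None if the window exceeds 128 bits or runs past the buffer.
pub fn load_window(bytes: &[u8], bit_offset: usize, width: usize) -> Option<u128> {
    let byte_start = bit_offset / 8;
    let skip = bit_offset % 8; // bits of the previous field sharing the first byte
    if skip + width > 128 {
        return None;
    }
    let needed = (skip + width + 7) / 8; // bytes covered by the window
    if byte_start + needed > bytes.len() {
        return None;
    }
    // Copy the window into a zero-padded 16-byte buffer and load it as one integer.
    let mut buf = [0u8; 16];
    buf[..needed].copy_from_slice(&bytes[byte_start..byte_start + needed]);
    let raw = u128::from_le_bytes(buf) >> skip;
    // Mask away the bits of the following field.
    let mask = if width == 128 { u128::MAX } else { (1u128 << width) - 1 };
    Some(raw & mask)
}
```

With this approach, a 125-bit payload at bit 2 and a 17-bit trailer at bit 127 each fit into a 128-bit load, even though the two windows share byte 15.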

Layout is mostly described in terms of deku here: https://gist.github.com/kitlith/24f4b5b2fed6cb98ef8f1daf382f6495#file-main-rs-L207

It is, however, only describing the first 125-127 bits at the moment, and is missing an additional 11 bit field and 6 bit field at the end, (correction: the QuatOutput structure just above includes the relevant bits) bringing the total to 142-144 bits. If I could read from an unaligned bitstream, that could be expressed in terms of reading a 2 bit mode, reading 123 or 125 bits into 1 of 3 structures, then reading the 17 bit structure dealing with timestamps.
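The sequential read described above (2-bit mode, then a 123- or 125-bit payload, then the 17-bit timestamp structure) might look roughly like the following crate-free sketch. `BitReader`, `Record`, `payload_bits`, and `read_record` are hypothetical names, the LSB-first bit order is an assumption, and the mapping from mode value to payload width is a placeholder: the thread only says there are three structures of 123 or 125 bits, not which mode selects which.

```rust
/// Hypothetical cursor over a byte slice, read least-significant bit first.
pub struct BitReader<'a> {
    bytes: &'a [u8],
    pos: usize, // absolute position in bits
}

impl<'a> BitReader<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        Self { bytes, pos: 0 }
    }

    /// Reads `n` bits (at most 128) into the low bits of a u128,
    /// or returns None if fewer than `n` bits remain.
    pub fn read(&mut self, n: usize) -> Option<u128> {
        if n > 128 || self.pos + n > self.bytes.len() * 8 {
            return None;
        }
        let mut value = 0u128;
        for i in 0..n {
            let idx = self.pos + i;
            let bit = (self.bytes[idx / 8] >> (idx % 8)) & 1;
            value |= (bit as u128) << i;
        }
        self.pos += n;
        Some(value)
    }

    pub fn position(&self) -> usize {
        self.pos
    }
}

/// Placeholder mapping (an assumption, not taken from the gist):
/// modes 0 and 1 carry 123 bits, mode 2 carries 125 bits, mode 3 is rejected.
fn payload_bits(mode: u8) -> Option<usize> {
    match mode {
        0 | 1 => Some(123),
        2 => Some(125),
        _ => None,
    }
}

pub struct Record {
    pub mode: u8,
    pub payload: u128, // raw bits; a bilge struct could be built from these
    pub trailer: u32,  // the 17-bit timestamp structure, still raw
}

/// Reads one record and returns it with the number of bits consumed.
pub fn read_record(bytes: &[u8]) -> Option<(Record, usize)> {
    let mut r = BitReader::new(bytes);
    let mode = r.read(2)? as u8;
    let payload = r.read(payload_bits(mode)?)?;
    let trailer = r.read(17)? as u32;
    Some((Record { mode, payload, trailer }, r.position()))
}
```

No individual read exceeds 128 bits, so each piece could be handed to a bitfield type of at most 128 bits even though the whole record spans 142 or 144 bits.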

I'll take a look at custom derives, though now that I'm thinking about this a bit more, I could probably just write a generic function that uses the Bitsized trait.

kitlith commented 1 year ago

yep, generic function seems to work:

use bilge::prelude::*;
use bitvec::prelude::*;
use bitvec::macros::internal::funty::Integral;

pub trait BilgeBitvecExt: Bitsized + Sized {
    fn read<S: BitStore, B: BitOrder>(data: &BitSlice<S, B>) -> (Self, &BitSlice<S, B>)
    where
        BitSlice<S, B>: BitField,
        Self: From<<Self as Bitsized>::ArbitraryInt>,
        <Self as Bitsized>::ArbitraryInt: Number,
        <Self::ArbitraryInt as Number>::UnderlyingType: Integral,
    {
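        assert!(data.len() >= Self::BITS);
        // Split first so the returned remainder comes from the full input,
        // not from the truncated prefix (which would always be empty).
        let (bits, rest) = data.split_at(Self::BITS);
        let underlying = BitField::load_le(bits);
        let storage = Self::ArbitraryInt::new(underlying);
        (storage.into(), rest)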
    }

    fn write_vec<S: BitStore, B: BitOrder>(self, data: &mut BitVec<S, B>)
    where
        BitVec<S, B>: BitField,
        Self::ArbitraryInt: Number + From<Self>,
        <Self::ArbitraryInt as Number>::UnderlyingType: Integral + BitView,
    {
        let mut bits = bitvec!(S, B; 0; Self::BITS);
        bits.store_le(<Self::ArbitraryInt>::from(self).value());
        data.extend_from_bitslice(&bits);
    }

    fn write_slice<S: BitStore, B: BitOrder>(self, data: &mut BitSlice<S, B>) -> &mut BitSlice<S, B>
    where
        BitSlice<S, B>: BitField,
        Self::ArbitraryInt: Number + From<Self>,
        <Self::ArbitraryInt as Number>::UnderlyingType: Integral + BitView,
    {
        assert!(data.len() >= Self::BITS);
        let bits = &mut data[..Self::BITS];
        bits.store_le(<Self::ArbitraryInt>::from(self).value());
        &mut data[Self::BITS..]
    }
}

impl<T: Bitsized + Sized> BilgeBitvecExt for T {}

#[bitsize(15)]
#[derive(FromBits, Clone, DebugBits, PartialEq)]
struct Test {
    a: u5,
    b: u5,
    c: u5
}

fn main() {
    let test = Test::new(u5::new(0b00001), u5::new(0b00011), u5::new(0b00111));
    let mut sink = BitVec::<u8, Lsb0>::new();
    test.clone().write_vec(&mut sink);
    sink.as_raw_slice().iter().for_each(|e| println!("0b{:08b}", e));
    assert_eq!(sink.as_raw_slice(), &[0b01100001, 0b00011100]);
    let res = Test::read(&sink).0;
    assert_eq!(res, test);
}

(EDIT: moved the generic functions to an extension trait) (EDIT: turned write into write_vec, added write_slice)
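For readers checking the bytes asserted in `main`, the arithmetic can be reproduced with plain integers. This is a sketch rather than bilge's implementation; `pack_test` is a hypothetical helper that places the first field in the lowest bits, which is the layout the assertion above implies.

```rust
/// Hypothetical helper mirroring the 15-bit `Test` layout:
/// `a` in bits 0..5, `b` in bits 5..10, `c` in bits 10..15.
pub fn pack_test(a: u8, b: u8, c: u8) -> u16 {
    ((a as u16) & 0x1f) | (((b as u16) & 0x1f) << 5) | (((c as u16) & 0x1f) << 10)
}

/// Little-endian bytes of the packed value, as an Lsb0 `u8` sink would hold them.
pub fn pack_test_bytes(a: u8, b: u8, c: u8) -> [u8; 2] {
    pack_test(a, b, c).to_le_bytes()
}
```

With a = 0b00001, b = 0b00011, c = 0b00111 the packed value is 1 + (3 << 5) + (7 << 10) = 7265 = 0b0001_1100_0110_0001, whose little-endian bytes are 0b01100001 and 0b00011100.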

kitlith commented 1 year ago

I should probably note that the ideas behind the above extension trait probably only work correctly for two cases: Msb0 with big endian byte order, and Lsb0 with little endian byte order. Technically, it is probably also possible to have Msb0 with little endian byte order and Lsb0 with big endian byte order, and in this case the generic function will serialize everything into an integer, then split it up along byte boundaries, whereas you might have intended it to split along byte boundaries on a per-field basis. That would require a derive macro, as you'd need to write each field individually, I think.
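To make the two consistent combinations concrete, here is a crate-free sketch that serializes a whole integer either LSB-first or MSB-first. `pack_lsb_first` and `pack_msb_first` are hypothetical helpers; they are assumed to correspond roughly to Lsb0 with `store_le` and Msb0 with `store_be`, and they do not model the mixed combinations.

```rust
/// Writes the low `nbits` (at most 128) of `value`, least significant bit
/// first, filling each byte from bit 0 upward.
pub fn pack_lsb_first(value: u128, nbits: usize) -> Vec<u8> {
    let mut out = vec![0u8; (nbits + 7) / 8];
    for i in 0..nbits {
        let bit = ((value >> i) & 1) as u8;
        out[i / 8] |= bit << (i % 8);
    }
    out
}

/// Writes the low `nbits` (at most 128) of `value`, most significant bit
/// first, filling each byte from bit 7 downward.
pub fn pack_msb_first(value: u128, nbits: usize) -> Vec<u8> {
    let mut out = vec![0u8; (nbits + 7) / 8];
    for i in 0..nbits {
        let bit = ((value >> (nbits - 1 - i)) & 1) as u8;
        out[i / 8] |= bit << (7 - i % 8);
    }
    out
}
```

In both functions every field stays contiguous in the bit stream, because the bit numbering inside a byte runs in the same direction as the byte order. In the mixed combinations the two directions disagree, so a field that crosses a byte boundary ends up in two non-adjacent pieces, which is why per-field handling would be needed there.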