Writing into a dynamic container

nickbclifford commented 3 years ago

First of all, I just wanted to say that I really like scroll! The context system and type-based parsing and everything works perfectly for my use case.

However, there is one thing that I can't seem to figure out how to do - writing types into a container like a Vec where the full size of the type is unknown.

Consider this (admittedly contrived) example:

pub struct MyType;
impl TryIntoCtx<'_> for MyType {
    type Error = scroll::Error;

    fn try_into_ctx(self, into: &mut [u8], _: ()) -> Result<usize, Self::Error> {
        let offset = &mut 0;

        if /* something that isn't statically determined */
        {
            into.gwrite_with(1u16, scroll::LE)?;
        } else {
            into.gwrite_with(1u32, scroll::LE)?;
        }

        Ok(*offset)
    }
}

If I pre-allocated a let mut buf = [0u8; 4] to cover the u32 case, then I'd be wasting space on the u16 case.

What I would love to do is something like this:

let mut buf = vec![];
buf.gwrite(MyType, 0)?;

where the Vec<u8> only allocates as-needed.

Is this something that's possible, potentially with a new container trait?

m4b commented 3 years ago

Hello, glad you are liking it!

So for this usecase, I believe you can use the IOWrite trait in combination with a Cursor on your vec, the iowrite method has an example: https://docs.rs/scroll/0.10.2/scroll/trait.IOwrite.html#method.iowrite

use scroll::IOwrite;
use std::io::Cursor;

let mut bytes = vec![];
let mut cursor = Cursor::new(&mut bytes[..]);
cursor.iowrite(YourType).unwrap();

this should expand your container dynamically (well, Cursor does at least); let me know if that doesn't work for your needs, and I'm sure we can figure something out :)

nickbclifford commented 3 years ago

So this solution would require IntoCtx implementations, right? My code is currently based on TryIntoCtx, so it would be nice if there was some sort of compatibility there.

Also, I anticipate that the root types I'll be writing will be larger than 256 bytes, which the docs say will be an issue.

nickbclifford commented 3 years ago

Apologies for the repeated ping, but I'd like to clarify my usecases:

I need panic-free parsing, so I'm currently using TryIntoCtx as it includes fallibility
I don't know the total bytesize of my types ahead of time, which is why a simple gwrite might not work for me without allocating extremely large buffers
Here's an example of the large types that I'm using - that tables field is definitely larger than 256 bytes.

m4b commented 3 years ago

Do you know the number of types at all, or is it something null-terminated ?

here is something I've done in the past, reading out nelems elements based on some value I might parse out inside some binary data:

let mut container = Vec::<N>::with_capacity(nelems as usize);
for i in 0..nelems {
    container.push(from.gread_with(&mut offset, ctx).map_err(|_| {
        scroll::Error::Custom(format!("Could not read out {}th container element", i))
    })?);
}

nickbclifford commented 3 years ago

I've already got reading working; the issue is with buffers at write-time.

m4b commented 3 years ago

If you don't know the number of types at all, I'm not sure what that means, other than it being "null terminated" somehow, e.g., like a list of elements of size K with some sigil to demarcate their end (which you see a lot in some systems); i'd recommend a while loop in that case, pushing items onto a vec that you parse out, maybe something like:

loop {
  if data[offset] == SIGIL {
    return (StuffWithContainer, offset);
  }
  container.push(data.gread_with(&mut offset, ctx))?;
}

m4b commented 3 years ago

oh haha, gosh, just noticed you're writing. So in the case of writing you also don't know how many types you have?

A quick hack might be:

let mut scratch = [u8; size_of::<YourType>()];
let mut buffer = Vec::with_capacity(ntypes * sizeof_type);
for t in types {
  scratch.pwrite(0, t);
  buffer.extend_from_slice(&t);
}

though that is technically an extra copy into the buffer that isn't necessary

m4b commented 3 years ago

Actually i think you can even just do:

let mut buffer = Vec::with_capacity(ntypes * sizeof_type);
buffer.resize(ntypes*sizeof_type, 0);
let mut offset = 0;
for t in types {
  buffer.gwrite(t, &mut offset)?;
}

and this should not write past capacity and len of buffer.

m4b commented 3 years ago

(admittedly, I don't fully understand your usecase quite yet)

It almost sounds like you're writing a list of union types into some bag of bytes?

nickbclifford commented 3 years ago

No worries, I understand that this is a pretty complicated usecase :smile:

Essentially, that Tables type contains a lot of Vecs, which in turn contain types that don't have a statically known byte size. Specifically, there is an index that may be 2 bytes or 4 bytes long:

#[derive(Debug, Copy, Clone, Eq, PartialEq)]
pub struct Simple<T>(pub usize, PhantomData<T>);

impl<'a, T: 'a + HasKind> TryIntoCtx<Sizes<'a>> for Simple<T> {
    type Error = scroll::Error;

    fn try_into_ctx(self, into: &mut [u8], sizes: Sizes<'a>) -> Result<usize, Self::Error> {
        let offset = &mut 0;

        if *sizes.tables.get(&T::kind()).unwrap_or(&0) < (1 << 16) {
            into.gwrite_with(self.0 as u16, offset, scroll::LE)?;
        } else {
            into.gwrite_with(self.0 as u32, offset, scroll::LE)?;
        }

        Ok(*offset)
    }
}

I also have lots of types (in here if you're curious) that can have significant variation in bytesize depending on their specific values.

Unless I'm misunderstanding something, it would be impractical to find the exact bytesize to pre-allocate a Vec to use as a buffer, so what I'm asking for is something that could dynamically allocate along with gwrite so that I wouldn't be wasting space or guessing a buffer size.

m4b commented 3 years ago

Yea; this is tricky for sure; if i'm understanding (and i was going to ask), the list of values, in this case either u16s or u32s, get their actual binary encoded size from some other table basically (it's size context, as it were).

(I was going to ask if the size wasn't in the byte stream, e.g., just before the type, how you are able to read out values encoded this way).

So I think the tension is here:

if *sizes.tables.get(&T::kind()).unwrap_or(&0) < (1 << 16)

this tells us at TryIntoCtx "time" the size we need, and then the write varies.

However, the tension is that TryIntoCtx receives a &mut [u8] which can't vary in size, or be dynamically reallocated if there isn't enough space. The assumption is that the trait is given enough space to write into generally (otherwise it's an out of bounds error).

The way I see it, the only way to do this dynamically (and there may be some other obstruction making this hard/impossible that i don't see) is to hoist the if *sizes.tables.get(&T::kind()).unwrap_or(&0) < (1 << 16) out of the write (or keep it there too), and use it to dynamically size the container just before calling gwrite on this "dynamically sized type".

Alternatively, and I don't know how difficult it will be, but you may have some success implementing a "two pass" system where you first scan all values for their sizes, sum them, and then allocate; then size the buffer accordingly, then write into it as usual with gwrite(t, &mut offset). The perf hit (and I don't know the sizes of these things) may not be that bad, since what would have previously been > 1 malloc calls for dynamic resizing is now exactly 1.

Sorry if I'm not super helpful here, it's hard to say!

Lastly, and I'm not saying you should give up or that it can't be made to work, keep in mind scroll was primarily made for simpler usecases, mainly binary pods with length + types style records you'd find, so it can become uncomfortable when doing complex parsing :)

I'm also a bit sad the reference write PR never landed; if your types are particularly large, which I've dealt with when pwriting C types occsionally, they can cause stack overflows since the type is moved (memcopied) into the stack frame which if your ulimit isn't high enough for larger types can cause these issues (your > 256 byte type reminded me of this).

nickbclifford commented 3 years ago

I was thinking about this earlier and thought about possible changes that, while being an API break, would make this fairly simple: changing the destination type for TryIntoCtx from a &mut [u8] to just impl Pwrite, then implementing a dynamic container potentially along these lines:

pub struct GrowableContainer {
    buf: Vec<u8>,
    increment: usize
}

// PoC, ignoring supertraits and constraints for now
impl<Ctx, E> Pwrite<Ctx, E> for GrowableContainer {
    fn pwrite_with(&mut self, value: impl TryIntoCtx<Ctx, Error = E>, offset: usize, ctx: Ctx) -> Result<usize, E> {
        if buf.len() < offset {
            self.buf.extend(std::iter::repeat(0u8).take(self.increment));
        }

        value.try_into_ctx(&mut self, ctx)
    }
}

nickbclifford commented 3 years ago

Regardless, thanks for the help! I kind of had the feeling that I was abusing scroll a bit anyway, so I really appreciate your willingness to understand my project and help out!

m4b commented 3 years ago

At one point in the distant and foggy past there was support for custom containers, even non-byte based containers iirc; I or someone else removed because I don't think anyone used it, and it drastically simplified things. It also can cause issues with trait resolution, and iirc, forces the user to manually specify which implementation to dispatch to, which was/is a huge ergonomic hit and made the crate not very fun/useful to use imho.

It would be interesting to see your proposal more fleshed out, but be aware that if it's a large change:

I have limited time to review (but I would review :))
if it's a breaking change that makes it significantly less likely for it to merge (unless the benefits are substantial!)
Hacking on scroll at that level can be a bit unpleasant because it's hyper generic and very easy to make all sorts of things fail that are far away (most of the tests are just there to ensure various api level uses will compile, not even testing)), so I don't want to invite you into pain unless you understand what you'd be getting into haha

Anyway, I'll think about your issue a bit more; just glancing at your suggestion, I'm wondering if you could create a custom container and implement MeasureWith on it:

    fn pwrite_with<N: TryIntoCtx<Ctx, <Self as Index<RangeFrom<usize>>>::Output, Error = E>>(
        &mut self,
        n: N,
        offset: usize,
        ctx: Ctx,
    ) -> result::Result<usize, E> {
        let len = self.measure_with(&ctx);

this is a hook just before pwrite is called; it takes &self, but if you implemented e.g., interior mutability for your container type, I wonder if you could perform the dynamic allocation there (the ctx is passed to you).

I do implement MeasureWith<T: AsRef<u8>> though so you wouldn't be able to implement AsRef<u8> for your container.

(You also need to implement Index, IndexMut, etc.). Just a random crazy thought :)

m4b commented 3 years ago

Well I nerdsniped myself; my last suggestion with custom container won't work I think, for number of reasons, but I think you're right that the hooks have to be in a pwrite-ish method itself; it may be possible to add another pwrite method or perhaps even a new trait that could do this, but i'm not sure what it would look like; my brain is fuzzy for now :)

nickbclifford commented 3 years ago

Here's a toy implementation of something growable, let me know what you think:

use std::ops::{Index, IndexMut, RangeFrom};
use scroll::ctx::MeasureWith;
use scroll::Pwrite;

struct GrowableContainer {
    buffer: Vec<u8>,
    increment: usize
}

impl<T> MeasureWith<T> for GrowableContainer {
    fn measure_with(&self, _: &T) -> usize {
        usize::MAX
    }
}

impl Index<usize> for GrowableContainer {
    type Output = u8;

    fn index(&self, index: usize) -> &Self::Output {
        self.buffer.index(index)
    }
}
impl Index<RangeFrom<usize>> for GrowableContainer {
    type Output = [u8];

    fn index(&self, index: RangeFrom<usize>) -> &Self::Output {
        self.buffer.index(index)
    }
}
impl IndexMut<RangeFrom<usize>> for GrowableContainer {
    fn index_mut(&mut self, index: RangeFrom<usize>) -> &mut Self::Output {
        let mut diff = index.start as isize - self.buffer.len() as isize;
        while diff >= 0 {
            self.buffer.extend(std::iter::repeat(0u8).take(self.increment));
            diff -= self.increment as isize;
        }

        self.buffer.index_mut(index)
    }
}

fn main() {
    let mut buf = GrowableContainer { buffer: vec![], increment: 16 };

    let mut offset = 0;
    buf.gwrite_with(7u32, &mut offset, scroll::LE).unwrap();
    buf.gwrite_with(0x1234u32, &mut offset, scroll::LE).unwrap();

    println!("{:x?}", &buf.buffer[..offset]);
}

nickbclifford commented 3 years ago

I also wonder if you could remove the supertraits on the Pwrite definition and move the default implementations into the impl<Ctx: Copy, E: From<error::Error>, R: ?Sized + Index<usize> + IndexMut<RangeFrom<usize>> + MeasureWith<Ctx>> Pwrite<Ctx, E> for R block, which would allow for easier custom container implementations without being a full API break (I believe).

nickbclifford commented 3 years ago

I guess I nerd-sniped myself too, because I just wrote a full implementation :laughing:

I pushed my first pass into a fork, which includes the relaxed Pwrite definition. If you could take a quick look and let me know if this is something you'd like to see, I'd be happy to add docs and create a PR!

m4b commented 3 years ago

It looks like the CI is passing??? So on first blush this looks quite excellent but I don’t want to be too hasty.

it is a breaking change (the trait gets an associated type) but that’s fine if do decide to go ahead with it.

but the solution appears quite elegant (before I heap too much praise I’ll have to review and test more, their could be some hidden gotcha here). But otherwise this is quite cool!

m4b commented 3 years ago

My major worry is that trait dispatch could become ambiguous but it looks ok. I remember having issues with this in the long ago past but hard to remember

m4b / scroll

Writing into a dynamic container #78