Closed nickbclifford closed 2 years ago
Hello, glad you are liking it!
So for this usecase, I believe you can use the IOWrite trait in combination with a Cursor on your vec, the iowrite method has an example: https://docs.rs/scroll/0.10.2/scroll/trait.IOwrite.html#method.iowrite
use scroll::IOwrite;
use std::io::Cursor;
let mut bytes = vec![];
let mut cursor = Cursor::new(&mut bytes[..]);
cursor.iowrite(YourType).unwrap();
this should expand your container dynamically (well, Cursor does at least); let me know if that doesn't work for your needs, and I'm sure we can figure something out :)
So this solution would require IntoCtx
implementations, right? My code is currently based on TryIntoCtx
, so it would be nice if there was some sort of compatibility there.
Also, I anticipate that the root types I'll be writing will be larger than 256 bytes, which the docs say will be an issue.
Apologies for the repeated ping, but I'd like to clarify my usecases:
TryIntoCtx
as it includes fallibilitygwrite
might not work for me without allocating extremely large bufferstables
field is definitely larger than 256 bytes.Do you know the number of types at all, or is it something null-terminated ?
here is something I've done in the past, reading out nelems
elements based on some value I might parse out inside some binary data:
let mut container = Vec::<N>::with_capacity(nelems as usize);
for i in 0..nelems {
container.push(from.gread_with(&mut offset, ctx).map_err(|_| {
scroll::Error::Custom(format!("Could not read out {}th container element", i))
})?);
}
I've already got reading working; the issue is with buffers at write-time.
If you don't know the number of types at all, I'm not sure what that means, other than it being "null terminated" somehow, e.g., like a list of elements of size K with some sigil to demarcate their end (which you see a lot in some systems); i'd recommend a while loop in that case, pushing items onto a vec that you parse out, maybe something like:
loop {
if data[offset] == SIGIL {
return (StuffWithContainer, offset);
}
container.push(data.gread_with(&mut offset, ctx))?;
}
oh haha, gosh, just noticed you're writing. So in the case of writing you also don't know how many types you have?
A quick hack might be:
let mut scratch = [u8; size_of::<YourType>()];
let mut buffer = Vec::with_capacity(ntypes * sizeof_type);
for t in types {
scratch.pwrite(0, t);
buffer.extend_from_slice(&t);
}
though that is technically an extra copy into the buffer
that isn't necessary
Actually i think you can even just do:
let mut buffer = Vec::with_capacity(ntypes * sizeof_type);
buffer.resize(ntypes*sizeof_type, 0);
let mut offset = 0;
for t in types {
buffer.gwrite(t, &mut offset)?;
}
and this should not write past capacity and len of buffer.
(admittedly, I don't fully understand your usecase quite yet)
It almost sounds like you're writing a list of union types into some bag of bytes?
No worries, I understand that this is a pretty complicated usecase :smile:
Essentially, that Tables
type contains a lot of Vec
s, which in turn contain types that don't have a statically known byte size. Specifically, there is an index that may be 2 bytes or 4 bytes long:
#[derive(Debug, Copy, Clone, Eq, PartialEq)]
pub struct Simple<T>(pub usize, PhantomData<T>);
impl<'a, T: 'a + HasKind> TryIntoCtx<Sizes<'a>> for Simple<T> {
type Error = scroll::Error;
fn try_into_ctx(self, into: &mut [u8], sizes: Sizes<'a>) -> Result<usize, Self::Error> {
let offset = &mut 0;
if *sizes.tables.get(&T::kind()).unwrap_or(&0) < (1 << 16) {
into.gwrite_with(self.0 as u16, offset, scroll::LE)?;
} else {
into.gwrite_with(self.0 as u32, offset, scroll::LE)?;
}
Ok(*offset)
}
}
I also have lots of types (in here if you're curious) that can have significant variation in bytesize depending on their specific values.
Unless I'm misunderstanding something, it would be impractical to find the exact bytesize to pre-allocate a Vec
to use as a buffer, so what I'm asking for is something that could dynamically allocate along with gwrite
so that I wouldn't be wasting space or guessing a buffer size.
Yea; this is tricky for sure; if i'm understanding (and i was going to ask), the list of values, in this case either u16s or u32s, get their actual binary encoded size from some other table basically (it's size context, as it were).
(I was going to ask if the size wasn't in the byte stream, e.g., just before the type, how you are able to read out values encoded this way).
So I think the tension is here:
if *sizes.tables.get(&T::kind()).unwrap_or(&0) < (1 << 16)
this tells us at TryIntoCtx
"time" the size we need, and then the write varies.
However, the tension is that TryIntoCtx
receives a &mut [u8]
which can't vary in size, or be dynamically reallocated if there isn't enough space. The assumption is that the trait is given enough space to write into generally (otherwise it's an out of bounds error).
The way I see it, the only way to do this dynamically (and there may be some other obstruction making this hard/impossible that i don't see) is to hoist the if *sizes.tables.get(&T::kind()).unwrap_or(&0) < (1 << 16)
out of the write (or keep it there too), and use it to dynamically size the container just before calling gwrite on this "dynamically sized type".
Alternatively, and I don't know how difficult it will be, but you may have some success implementing a "two pass" system where you first scan all values for their sizes, sum them, and then allocate; then size the buffer accordingly, then write into it as usual with gwrite(t, &mut offset)
. The perf hit (and I don't know the sizes of these things) may not be that bad, since what would have previously been > 1 malloc calls for dynamic resizing is now exactly 1.
Sorry if I'm not super helpful here, it's hard to say!
Lastly, and I'm not saying you should give up or that it can't be made to work, keep in mind scroll was primarily made for simpler usecases, mainly binary pods with length + types style records you'd find, so it can become uncomfortable when doing complex parsing :)
I'm also a bit sad the reference write PR never landed; if your types are particularly large, which I've dealt with when pwriting C types occsionally, they can cause stack overflows since the type is moved (memcopied) into the stack frame which if your ulimit
isn't high enough for larger types can cause these issues (your > 256 byte type reminded me of this).
I was thinking about this earlier and thought about possible changes that, while being an API break, would make this fairly simple: changing the destination type for TryIntoCtx
from a &mut [u8]
to just impl Pwrite
, then implementing a dynamic container potentially along these lines:
pub struct GrowableContainer {
buf: Vec<u8>,
increment: usize
}
// PoC, ignoring supertraits and constraints for now
impl<Ctx, E> Pwrite<Ctx, E> for GrowableContainer {
fn pwrite_with(&mut self, value: impl TryIntoCtx<Ctx, Error = E>, offset: usize, ctx: Ctx) -> Result<usize, E> {
if buf.len() < offset {
self.buf.extend(std::iter::repeat(0u8).take(self.increment));
}
value.try_into_ctx(&mut self, ctx)
}
}
Regardless, thanks for the help! I kind of had the feeling that I was abusing scroll
a bit anyway, so I really appreciate your willingness to understand my project and help out!
At one point in the distant and foggy past there was support for custom containers, even non-byte based containers iirc; I or someone else removed because I don't think anyone used it, and it drastically simplified things. It also can cause issues with trait resolution, and iirc, forces the user to manually specify which implementation to dispatch to, which was/is a huge ergonomic hit and made the crate not very fun/useful to use imho.
It would be interesting to see your proposal more fleshed out, but be aware that if it's a large change:
Anyway, I'll think about your issue a bit more; just glancing at your suggestion, I'm wondering if you could create a custom container and implement MeasureWith on it:
fn pwrite_with<N: TryIntoCtx<Ctx, <Self as Index<RangeFrom<usize>>>::Output, Error = E>>(
&mut self,
n: N,
offset: usize,
ctx: Ctx,
) -> result::Result<usize, E> {
let len = self.measure_with(&ctx);
this is a hook just before pwrite
is called; it takes &self
, but if you implemented e.g., interior mutability for your container type, I wonder if you could perform the dynamic allocation there (the ctx is passed to you).
I do implement MeasureWith<T: AsRef<u8>>
though so you wouldn't be able to implement AsRef<u8>
for your container.
(You also need to implement Index, IndexMut, etc.). Just a random crazy thought :)
Well I nerdsniped myself; my last suggestion with custom container won't work I think, for number of reasons, but I think you're right that the hooks have to be in a pwrite-ish method itself; it may be possible to add another pwrite method or perhaps even a new trait that could do this, but i'm not sure what it would look like; my brain is fuzzy for now :)
Here's a toy implementation of something growable, let me know what you think:
use std::ops::{Index, IndexMut, RangeFrom};
use scroll::ctx::MeasureWith;
use scroll::Pwrite;
struct GrowableContainer {
buffer: Vec<u8>,
increment: usize
}
impl<T> MeasureWith<T> for GrowableContainer {
fn measure_with(&self, _: &T) -> usize {
usize::MAX
}
}
impl Index<usize> for GrowableContainer {
type Output = u8;
fn index(&self, index: usize) -> &Self::Output {
self.buffer.index(index)
}
}
impl Index<RangeFrom<usize>> for GrowableContainer {
type Output = [u8];
fn index(&self, index: RangeFrom<usize>) -> &Self::Output {
self.buffer.index(index)
}
}
impl IndexMut<RangeFrom<usize>> for GrowableContainer {
fn index_mut(&mut self, index: RangeFrom<usize>) -> &mut Self::Output {
let mut diff = index.start as isize - self.buffer.len() as isize;
while diff >= 0 {
self.buffer.extend(std::iter::repeat(0u8).take(self.increment));
diff -= self.increment as isize;
}
self.buffer.index_mut(index)
}
}
fn main() {
let mut buf = GrowableContainer { buffer: vec![], increment: 16 };
let mut offset = 0;
buf.gwrite_with(7u32, &mut offset, scroll::LE).unwrap();
buf.gwrite_with(0x1234u32, &mut offset, scroll::LE).unwrap();
println!("{:x?}", &buf.buffer[..offset]);
}
I also wonder if you could remove the supertraits on the Pwrite
definition and move the default implementations into the impl<Ctx: Copy, E: From<error::Error>, R: ?Sized + Index<usize> + IndexMut<RangeFrom<usize>> + MeasureWith<Ctx>> Pwrite<Ctx, E> for R
block, which would allow for easier custom container implementations without being a full API break (I believe).
I guess I nerd-sniped myself too, because I just wrote a full implementation :laughing:
I pushed my first pass into a fork, which includes the relaxed Pwrite
definition. If you could take a quick look and let me know if this is something you'd like to see, I'd be happy to add docs and create a PR!
It looks like the CI is passing??? So on first blush this looks quite excellent but I don’t want to be too hasty.
it is a breaking change (the trait gets an associated type) but that’s fine if do decide to go ahead with it.
but the solution appears quite elegant (before I heap too much praise I’ll have to review and test more, their could be some hidden gotcha here). But otherwise this is quite cool!
My major worry is that trait dispatch could become ambiguous but it looks ok. I remember having issues with this in the long ago past but hard to remember
First of all, I just wanted to say that I really like
scroll
! The context system and type-based parsing and everything works perfectly for my use case.However, there is one thing that I can't seem to figure out how to do - writing types into a container like a
Vec
where the full size of the type is unknown.Consider this (admittedly contrived) example:
If I pre-allocated a
let mut buf = [0u8; 4]
to cover the u32 case, then I'd be wasting space on the u16 case.What I would love to do is something like this:
where the
Vec<u8>
only allocates as-needed.Is this something that's possible, potentially with a new container trait?