jamesmunns / postcard

A no_std + serde compatible message library for Rust
Apache License 2.0
788 stars 76 forks source link

How should the size of `from_io`'s "sliding buffer" be chosen? #140

Open smoelius opened 3 months ago

smoelius commented 3 months ago

My question concerns the second component of from_io's val argument`: https://github.com/jamesmunns/postcard/blob/6338601498b23e962632087c44b19830d579227d/src/de/mod.rs#L85 From what I can tell, it is a "sliding buffer":

In the loopback test, the buffer is 2048 bytes: https://github.com/jamesmunns/postcard/blob/6338601498b23e962632087c44b19830d579227d/tests/loopback.rs#L191-L193

Is this a good choice, generally?

Thank you.

cc: @xgroleau

xgroleau commented 3 months ago

Hey @smoelius,

You just need to make sure that the buffer is big enough for all the data that is borrowed. (i.e., if you have field: &'a T such as field: &'a str in your struct). The borrowed data will be stored in that buffer, so it must fit in it. Thus it depends on the format of the data, if everything is owned or implements DeserializeOwned, the buffer can be of length 0 actually.

The test shown is actually pretty overkill to use a buffer of 2048, nothing is actually borrowed in BasicU8S. The test would pass if the buffer of lenght 0 such as let mut buff = [0; 0];.

For an example with borrowing, if instead we wanted to deserialize

#[derive(Debug, Serialize, Deserialize, Eq, PartialEq)]
struct BorrwedU8S<'a> {
    st: u16,
    ei: u8,
    sf: u64,
    tt: u32,
    borrowed: &'a u8,
}

We would need a buffer of at least a length of 1 to store the borrowed: &'a u8 field.

Hopefully that clarifies your question.

smoelius commented 3 months ago

Thank you, @xgroleau. Your response is very helpful. But should this example then work?

use postcard::{from_io, to_io};
use serde::de::DeserializeOwned;

fn main() {
    fn sanity(_: &impl DeserializeOwned) {}

    let data = String::from("hello");

    sanity(&data);

    let serialized: ::std::vec::Vec<u8> = vec![];
    let ser = to_io(&data, serialized).unwrap();
    {
        let mut buff = [0; 0]; // <-- Changed 2048 to 0.
        let x = ser.clone();
        let deserialized: String = from_io((x.as_slice(), &mut buff)).unwrap().0;
        assert_eq!(data, deserialized);
    }
}

I get:

thread 'main' panicked at src/main.rs:16:71:
called `Result::unwrap()` on an `Err` value: DeserializeUnexpectedEnd

I'm wondering if there may be a bug here: https://github.com/jamesmunns/postcard/blob/6338601498b23e962632087c44b19830d579227d/src/de/deserializer.rs#L387-L392

xgroleau commented 3 months ago

You are right, but it seems strings and probably deserialize_byte_buf and maybe others borrow the data first and then allocate. So in the end, that includes allocation actually, not just borrowed data. I misunderstood and didn't verify with the internals, sorry about that. Not sure if it is a bug more than just a misunderstanding of the actual behavior.

@jamesmunns could validate that if it is the actual desired behavior. If it is, then you need to make sure to have enough space for allocated items and we could probably document it to make sure the behavior is clear.

kyku commented 1 month ago

I stumbled upon this problem too and it seems that the given buffer must be big enough to contain the entire input. Because it is stack-allocated it means that you can't process anything bigger than about 8MiB (this is the stack size limit on x86_64 Linux).