koute / speedy

A fast binary serialization framework
Apache License 2.0
381 stars 46 forks source link

Add #[speedy(non_exhaustive)] struct attribute #20

Open merlleu opened 2 years ago

merlleu commented 2 years ago

Hello,

I think we could add a non_exhaustive attribute for structs:

Example of code describing the current issue:

use speedy::{Writable, Readable};

#[derive(Writable, Readable, Debug)]
pub struct Struct1 {
    pub field1: bool,
    pub field2: Struct2,
    pub field3: Vec<Struct3>
}

#[derive(Writable, Readable, Debug)]
pub struct Struct2 {
    pub field2_1: u8
}

#[derive(Writable, Readable, Debug)]
pub struct Struct3 {
    pub field3_1: u64
}

fn dump() {
    let s = Struct1{field1: true, field2: Struct2 { field2_1: 8 }, field3: vec![Struct3 { field3_1: 99999 }] };
    s.write_to_file("struct1.bin").unwrap();
}

fn read() {
    println!("{:?}", Struct1::read_from_file("struct1.bin").unwrap());
}

Adding a field at the end of Struct2 or Struct3 will break compatibility with previously serialized data. (Adding a field to the end of Struct1 with the default_on_eof attribute will still work because he is at the end of the buffer).

Whereas with speed(non_exhaustive) attribute:

pub use speedy::{Writable, Readable};

#[derive(Writable, Readable)]
#[speedy(non_exhaustive)]
pub struct Struct1 {
    pub field1: bool,
    pub field2: Struct2,
    pub field3: Vec<Struct3>
}

#[derive(Writable, Readable)]
#[speedy(non_exhaustive)]
pub struct Struct2 {
    pub field2_1: u8
}

#[derive(Writable, Readable)]
#[speedy(non_exhaustive)]
pub struct Struct3 {
    pub field3_1: u64
}

fn dump() {
    let s = Struct1{field1: true, field2: Struct2 { field2_1: 8 }, field3: vec![Struct3 { field3_1: 99999 }]};
    s.write_to_file("struct1.bin").unwrap();
}

fn read() {
    Struct1::read_from_file("struct1.bin").unwrap();
}

In this scenario you can append a field to any struct with the speedy(non_exhaustive) attribute and it will still be able to read those structs.

Pros:

Cons:

I don't know if this is doable directly on the speedy crate, otherwise I think I'll just create a fork. Don't hesitate to give any feedbacks.

merlleu commented 2 years ago

I've done a basic implementation on https://github.com/merlleu/speedy, still need to write tests for this.

jeromegn commented 2 years ago

This would be pretty interesting for us for forward compatibility.

merlleu commented 2 years ago

Hello @koute , Is there any chance for this pull request to be merged (after conflicts are resolved), or do you think it should be a separate crate ?

Thanks.

koute commented 2 years ago

I do want to add something like this, but I'm not necessarily convinced that this is the right approach.

You can already achieve forward compatibility today without any extra changes necessary by just using an enum and then having each version of the struct be a new variant of that enum (so essentially the tag of the enum would be the "version" of the struct), and then manually implement the From trait to just convert it to the most recent version of the struct. So I have been thinking that maybe some sort of an approach based on that would be the most appropriate.

So instead of having the first implicit field be the number of fields it'd be the version number, and then the fields would be optionally tagged with an attribute specifying the range of the version numbers for which the field is present. This kind of approach would also allow you to remove fields from your structure definition once you migrate your data. (With just writing the number of fields you can't really do that.) I'm not yet sure what to do when writing the data; I guess it'd always write only the newest version of the struct, but then that'd mean it'd have to either implicitly ignore old fields, or generate a runtime error if they're not empty, and I don't really like either of those. I guess maybe some sort of a procedural attribute that'd automatically convert a versioned struct into an enum + automatically generate the From impl to convert to the newest version would be the best here?

merlleu commented 2 years ago

Thanks for the comment, I think you have a good point and versioning structs would be better. However, I'm not a big fan of the enum approach for this (requires a lot of user code for this).

What do you think of something like this:

#[derive(Writable, Readable)]
#[speedy(version = "4")]
pub struct Struct1 {
    #[speedy(before_version = "2")] // only present in versions 1
    pub field0: u8,
    // no version attribute, so always expected
    pub field1: bool,
    #[speedy(after_version = "2", before_version = "4")] // only present in versions 2 & 3
    pub field2: Struct2,
    #[speedy(after_version = "3")] // present in version 3,4,...
    pub field3: Vec<Struct3>
}

This approach lets us add & remove fields as we want without using enums at all. (requires a bit more code than using an enum & a From impl, but it should have a bit less overhead).

Then we could just implement almost the same logic used for the enum but under the hood. (Unless you have something against this).

hecatia-elegua commented 1 year ago

At that point you could also use semantic version numbers and use whatever lib Cargo uses for parsing and handling them, so:

#[speedy(version = "<2")]
pub field0: u8,
#[speedy(version = ">1, <4")]
pub field2: Struct2,

...although in this case there are only major version changes. Thinking further, patches could also be supported, similar to other ser/de frameworks.