rust-ux / uX

Non standard integer types like `u7`, `u9`, `u10`, `u63`, `i7`, `i9`
Apache License 2.0
121 stars 37 forks source link

Serde support #17

Open kjetilkjeka opened 6 years ago

kjetilkjeka commented 6 years ago

This issue is for discussing things related to serde support.

Unresolved questions:

Serde data model mismatch

The serde data model states that it supports 14 primitive types. All of them is aligned on 8, 16, 64 or 128 bits. Is it possible to represent the types of this crates (conceptually aligned at other bit positions) with the serde memory model in a way that will be correct for all implemented protocols (including those that will be implemented in the future).

One example of problematic behavior is the u24 type, when serialized with JSON it should print a single number between 0 and 2^24. When serialized with a binary format, should it behave as a u32 (serialize into 4 bytes) or as an [u8; 3] (serialized into 3 bytes)?

Another problematic example is the following struct:

#[derive(Serialize, Deserialize)]
pub struct {
    foo: u7,
    bar: u9,
}

How should this struct be serialized for a binary format, as 2 bytes or 3 bytes?

It seems like it's the binary formats that are problematic and text based format (like json) is trivial. If there would be any way to opt out of binary format that would probably be a good idea for the time being.

Stability guarantees

This crate has stability as one of it's main goal, are there any ways we can also support a serde v2 release without breaking this crate?

kjetilkjeka commented 6 years ago

@GuillaumeGomez I've updated this comment in an attempt to more clearly express my concern. I think the second one is quite solvable but I'm not really sure how to tackle the first one. Any ideas?

GuillaumeGomez commented 6 years ago

It all depends on one thing: does it need to be represented as a u24 in memory? If yes, then go for [u8; 3] hidden inside a struct, otherwise go for a u32.

kjetilkjeka commented 6 years ago

It all depends on one thing: does it need to be represented as a u24 in memory? If yes, then go for [u8; 3] hidden inside a struct, otherwise go for a u32.

I guess that depends on the serde format, and as a data type we must be compatible to all formats.

However, we might be saved by the assumption that formats relying on things like 24bit types is not implementable with serde. And therefore be able to implement it as an u32 regardless. If we explain this in the serde feature documentation I think we are OK.

upsuper commented 2 years ago

I have a feeling that it might be fine to just support Serde by treating all the types as the next pow of 2 primitive type (i.e. u7 -> u8, u24 -> u32) in serialization, with such caveat being clearly documented.

Unless developers of Serde state otherwise, I tend to believe that it is unlikely for Serde to add support for arbitrary bit-length integer in a foreseeable future (there was also serde-rs/serde#1312), and thus there is probably not going to be a perfect solution for this crate for binary formats.

In addition, if someone wants to serialize struct Foo(u7, u9) as a single word in binary format, they should be able to do so reasonably easily through

#[derive(Clone, Serialize, Deserialize)]
#[serde(from = "u16")]
#[serde(into = "u16")]
struct Foo(u7, u9);

impl From<u16> for Foo {
    fn from(n: u16) -> Foo {
        Foo(
            u7::new((n >> 9) as u8),
            u9::new(n & 0b1_1111_1111),
        )
    }
}

impl From<Foo> for u16 {
    fn from(n: Foo) -> u16 {
        (u16::from(n.0) << 9) | u16::from(n.1)
    }
}

And it is possible to have a proc macro to generate this boilerplate. With such a proc macro, this can even be simplified to something like

#[derive(Clone, Serialize, Deserialize, PackAsU16)]
#[serde(from = "u16")]
#[serde(into = "u16")]
struct Foo(u7, u9);

On the other hand, if someone wants to serialize them to text format, but this crate doesn't come with Serde support builtin, they would need to do something like

#[derive(Serialize, Deserialize)]
#[serde(remote = "u2")]
struct U2Serde(#[serde(getter = "u2_to_u8")] u8);
impl Into<u2> for U2Serde {
    fn into(self) -> u2 { u2::new(self.0) }
}
fn u2_to_u8(u2: &u2) -> u8 { u8::from(*u2) }

and annotate every field where u2 is used with #[serde(with = "U2Serde")] if they still want to use the types from this crate (rather than just create their own types).

This is very cumbersome because there would need to be boilerplate for not only each type used, but also each field using the types.

Thus, I would suggest that it is a better tradeoff to add Serde support with the caveat documented for binary formats.

kjetilkjeka commented 2 years ago

I think I see this a bit more clearly now than what I did years ago when I opened this issue. I agree with you here @upsuper

Is #49 in any way problematic for serde? I assume not as it includes conversions to/from the types in the serde data model?

upsuper commented 2 years ago

I don't think changing the internal representation would affect serialization as long as there is still way to convert between primitives and the types.