serde-rs / serde

Serialization framework for Rust
https://serde.rs/
Apache License 2.0

Support big arrays #631

Closed · dtolnay closed this 6 years ago

dtolnay commented 7 years ago

Servo does this to support big arrays in one of their proc macros:

https://github.com/servo/heapsize/blob/44e86d6d48a09c9cbc30a122bc8725b188d017b2/derive/lib.rs#L36-L41

Let's do the same but only if the size of the array exceeds our biggest builtin impl.

Thanks @nox.

dtolnay commented 7 years ago

One relatively easy workaround for serialization is coercing to a slice:

struct S {
    #[serde(serialize_with = "<[_]>::serialize")]
    arr: [u8; 256],
}

Deserialization is still annoying I think.
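
For anyone who wants to see the serialization half end to end, here is a minimal compiling sketch of the coercion trick (serde_json is used only to print the result; the struct and field names are illustrative):

#[macro_use]
extern crate serde_derive;

extern crate serde;
extern crate serde_json;

use serde::Serialize;

#[derive(Serialize)]
struct S {
    // serialize_with routes the field through the slice impl: &[u8; 256]
    // coerces to &[u8], whose Serialize impl has no length limit.
    #[serde(serialize_with = "<[_]>::serialize")]
    arr: [u8; 256],
}

fn main() {
    let s = S { arr: [0; 256] };
    println!("{}", serde_json::to_string(&s).unwrap());
}

Note that this serializes the field as a sequence rather than a tuple, so some binary formats will add a length prefix that the built-in fixed-size impls would omit.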

sdleffler commented 7 years ago

Hey folks, this feature is important to me: I'd like to be able to serialize a 512-bit hash (so, 64 bytes), and because the serde impls only go up to [u8; 32], I cannot serialize a [u8; 64].

As workarounds I'm considering [[u8; 32]; 2], GenericArray, or just lazily using a Box<[u8]>. The workaround shown above has also piqued my interest. @dtolnay, did you ever find a deserialization workaround?

Would it be okay to add impls up to 64? Or is there some reason that hasn't been done?
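
For reference, two of the fallbacks mentioned above already work with the plain derive; a quick sketch (type and field names are mine, not from the thread):

#[macro_use]
extern crate serde_derive;

extern crate serde;

#[derive(Serialize, Deserialize)]
struct NestedHash {
    // 2 x 32 = 64 bytes; both dimensions stay within serde's built-in 0..=32 impls.
    halves: [[u8; 32]; 2],
}

#[derive(Serialize, Deserialize)]
struct BoxedHash {
    // No length limit, but the 64-byte size is no longer enforced by the type,
    // and deserialization goes through a heap allocation.
    bytes: Box<[u8]>,
}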

clarfonthey commented 7 years ago

In the meantime, perhaps we should add impls for the sizes that arrayvec provides?

impl<T> Array for [T; 40]
impl<T> Array for [T; 48]
impl<T> Array for [T; 50]
impl<T> Array for [T; 56]
impl<T> Array for [T; 64]
impl<T> Array for [T; 72]
impl<T> Array for [T; 96]
impl<T> Array for [T; 100]
impl<T> Array for [T; 128]
impl<T> Array for [T; 160]
impl<T> Array for [T; 192]
impl<T> Array for [T; 200]
impl<T> Array for [T; 224]
impl<T> Array for [T; 256]
impl<T> Array for [T; 384]
impl<T> Array for [T; 512]
impl<T> Array for [T; 768]
impl<T> Array for [T; 1024]
impl<T> Array for [T; 2048]
impl<T> Array for [T; 4096]
impl<T> Array for [T; 8192]
impl<T> Array for [T; 16384]
impl<T> Array for [T; 32768]
impl<T> Array for [T; 65536]
dtolnay commented 7 years ago

@clarcharr I would prefer to stick with what the standard library does, which is 0 to 32 (inclusive).

dtolnay commented 7 years ago

Here is a workaround for deserializing.

#[macro_use]
extern crate serde_derive;

extern crate serde;
extern crate serde_json;

use std::fmt;
use std::marker::PhantomData;
use serde::ser::{Serialize, Serializer, SerializeTuple};
use serde::de::{Deserialize, Deserializer, Visitor, SeqAccess, Error};

trait BigArray<'de>: Sized {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where S: Serializer;
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
        where D: Deserializer<'de>;
}

macro_rules! big_array {
    ($($len:expr,)+) => {
        $(
            impl<'de, T> BigArray<'de> for [T; $len]
                where T: Default + Copy + Serialize + Deserialize<'de>
            {
                fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
                    where S: Serializer
                {
                    let mut seq = serializer.serialize_tuple(self.len())?;
                    for elem in &self[..] {
                        seq.serialize_element(elem)?;
                    }
                    seq.end()
                }

                fn deserialize<D>(deserializer: D) -> Result<[T; $len], D::Error>
                    where D: Deserializer<'de>
                {
                    struct ArrayVisitor<T> {
                        element: PhantomData<T>,
                    }

                    impl<'de, T> Visitor<'de> for ArrayVisitor<T>
                        where T: Default + Copy + Deserialize<'de>
                    {
                        type Value = [T; $len];

                        fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
                            formatter.write_str(concat!("an array of length ", $len))
                        }

                        fn visit_seq<A>(self, mut seq: A) -> Result<[T; $len], A::Error>
                            where A: SeqAccess<'de>
                        {
                            let mut arr = [T::default(); $len];
                            for i in 0..$len {
                                arr[i] = seq.next_element()?
                                    .ok_or_else(|| Error::invalid_length(i, &self))?;
                            }
                            Ok(arr)
                        }
                    }

                    let visitor = ArrayVisitor { element: PhantomData };
                    deserializer.deserialize_tuple($len, visitor)
                }
            }
        )+
    }
}

big_array! {
    40, 48, 50, 56, 64, 72, 96, 100, 128, 160, 192, 200, 224, 256, 384, 512,
    768, 1024, 2048, 4096, 8192, 16384, 32768, 65536,
}

#[derive(Serialize, Deserialize)]
struct S {
    #[serde(with = "BigArray")]
    arr: [u8; 64],
}

fn main() {
    let s = S { arr: [1; 64] };
    let j = serde_json::to_string(&s).unwrap();
    println!("{}", j);
    serde_json::from_str::<S>(&j).unwrap();
}
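
One detail worth noting if you adapt this: it goes through serialize_tuple/deserialize_tuple, the same serde data-model type used by the built-in 0..=32 array impls, so the wire format matches the small arrays (for example, no length prefix in bincode).
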
Binero commented 7 years ago

As long as you're not working with primes (the length just has to factor into dimensions of at most 32):

#[derive(Serialize, Deserialize, Debug)]
struct MyStruct {
    data: [[u8; 32]; 16],
}

impl MyStruct {
    fn data(&self) -> &[u8; 512] {
        use std::mem::transmute;
        unsafe { transmute(&self.data) }
    }
}

This is a pretty neat workaround for when you never expect a human to read the serialised form (e.g. bincode), since it serialises as a nested array. Added bonus: it also works for Debug, PartialEq, etc.
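
A quick round trip of that layout, building on the MyStruct definition above and assuming the bincode 1.x API (the assertions are just for illustration):

extern crate bincode;

fn main() {
    let original = MyStruct { data: [[7u8; 32]; 16] };
    let bytes = bincode::serialize(&original).unwrap();
    let decoded: MyStruct = bincode::deserialize(&bytes).unwrap();
    // The nested layout round-trips, and the flat 512-byte view is
    // available through the transmuting accessor.
    assert_eq!(decoded.data, original.data);
    assert_eq!(decoded.data()[511], 7);
}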

Boscop commented 6 years ago

FWIW, I use this:

use serde::{Serialize, Serializer};

pub fn serialize_array<S, T>(array: &[T], serializer: S) -> Result<S::Ok, S::Error>
where S: Serializer, T: Serialize {
    array.serialize(serializer)
}

#[macro_export]
macro_rules! serde_array { ($m:ident, $n:expr) => {
    pub mod $m {
        use std::{ptr, mem};
        use serde::{Deserialize, Deserializer, de};
        pub use $crate::serialize_array as serialize;
        use super::*;

        pub fn deserialize<'de, D, T>(deserializer: D) -> Result<[T; $n], D::Error>
        where D: Deserializer<'de>, T: Deserialize<'de> + 'de {
            let slice: Vec<T> = Deserialize::deserialize(deserializer)?;
            if slice.len() != $n {
                return Err(de::Error::custom("input slice has wrong length"));
            }
            unsafe {
                let mut result: [T; $n] = mem::uninitialized();
                for (src, dst) in slice.into_iter().zip(&mut result[..]) {
                    ptr::write(dst, src);
                }
                Ok(result)
            }
        }
    }
}}

serde_array!(a64, 64);
serde_array!(a120, 120);
serde_array!(a128, 128);
serde_array!(a384, 384);

And then

#[derive(Serialize, Deserialize)]
struct Foo {
    #[serde(with = "a128")]
    bar: [f32; 128],
}
dtolnay commented 6 years ago

I do not plan to implement the workaround from heapsize_derive. I would prefer to see something like https://github.com/serde-rs/serde/issues/631#issuecomment-322677033 provided in a crate.

est31 commented 5 years ago

@dtolnay do I have your permission to publish this in a crate? You will be credited as co-author.

dtolnay commented 5 years ago

Yes go for it! Thanks.

est31 commented 5 years ago

Thanks! Published: https://github.com/est31/serde-big-array | https://crates.io/crates/serde-big-array

est31 commented 5 years ago

@dtolnay what do you think, does moving it into the serde-rs org make sense?

trrichard commented 5 years ago

+1 on moving this into serde-rs.

The ability to serialize/deserialize arrays longer than 32 elements should be a core feature. I'd use it for sure.

@dtolnay I do think we should consider changing the derive macro to support it instead. I'd rather have it work out of the box if possible.

dtolnay commented 5 years ago

I posted a request for implementation of a slightly different approach: https://github.com/dtolnay/request-for-implementation/issues/17.

est31 commented 5 years ago

To all the people in this thread hoping that const generics will resolve this: when I tried porting serde to const generics, I ran into the problem that Serialize and Deserialize are implemented for arrays of size 0 for every element type, without requiring Serialize or Deserialize on the element type itself (see commit 6388019ad4840a1b5c515ffc353e6a4f2df3adc3, which introduced this). That is a major hurdle, because serde is no longer in the business of making breaking changes. So a fix in serde proper will have to wait either for a language improvement that allows something like impl<T, const N: usize> Serialize for [T; N] where N > 0, or for specialization.
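
To make the overlap concrete, here is a standalone illustration with a stand-in trait (not serde's actual code); the second impl is exactly what coherence rejects:

trait MySerialize {}

// Analogue of serde's existing impl: zero-length arrays, no bound on the element type.
impl<T> MySerialize for [T; 0] {}

// The blanket const-generics impl people are hoping for:
impl<T: MySerialize, const N: usize> MySerialize for [T; N] {}
// error[E0119]: conflicting implementations of trait `MySerialize` for type `[_; 0]`
// Removing the unbounded [T; 0] impl would resolve the overlap, but that is a breaking change.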

This prolongs the lifetime of the serde-big-array crate until such a fix lands in the stable language, which could be well into the next decade. I am also currently researching whether serde-big-array can at least avoid requiring you to specify the array sizes: https://github.com/est31/serde-big-array/issues/3.