serde-rs / bytes

Wrapper types to enable optimized handling of &[u8] and Vec<u8>
Apache License 2.0
317 stars 37 forks source link

Functionality of #[serde(with = "serde_bytes")] for deserialization of &[u8] #30

Closed Xiretza closed 2 years ago

Xiretza commented 2 years ago

Hi,

I'm having a hard time figuring out how exactly #[serde(with = "serde_bytes")] on a &[u8] provides any efficiency gains for deserialization. From what I can see, this causes serde_bytes::deserialize() to be called, which forwards to serde_bytes::Deserialize for &[u8], which just calls serde::Deserialize::deserialize(). How is this any different from not using serde_bytes at all? Am I missing something, or is this crate simply useless for optimizing deserialization of &[u8]?

erickt commented 2 years ago

Some serialization formats, like cbor, have a more efficient encoding for an array of u8s compared to an array of arbitrary types. Formats like bincode go even further and don’t even encode a tag describing what the next item is, so the deserializer doesn’t know the type of the next type. Other formats, like json, do not.

to handle this, serde Deserialize impls need to hint to the Deserializer what we expect the next type to be. The Deserializer could then use this to decide to parse the next chunk in a special way for an untagged format like bincode, or fall back to the default. serde_bytes is just a convenient way to provide this hint.

as to why serde doesn’t do this automatically, we can’t because rust doesn’t support specialization yet, so we need to default to the generic array deserialization, and provide serde_bytes to override the default.

Xiretza commented 2 years ago

I get that that's the idea, but I can't find the point in the serde_bytes code where this actually happens for deserializing a &[u8]. It seems to just forward straight to the default serde implementation.

Xiretza commented 2 years ago

I have a feeling this might just be a simple bug, I've opened #31.

erickt commented 2 years ago

Actually, my previous comment was wrong, and I don't think we need #31. I think what might be happening here is some confusion over reusing the same name (especially since it confused me). First, there actually is an impl in serde explicitly for &'a [u8]. I thought we implemented that for a generic &[T] but I was wrong. Second, the impl in serde_bytes::de for &'a [u8] is implementing serde_bytes::Deserialize, not serde::Deserialize. This is just wiring up the generic impls so it can be used with serde_bytes::deserialize, which takes a serde_bytes::Deserialize, not a serde::Deserialize.

dtolnay commented 2 years ago

#[serde(with = "serde_bytes")] on &[u8] isn't for deserialization, it is for serialization. On other types it optimizes both serialization and deserialization. The deserialize-related code in this crate related to &[u8] is just to make that attribute work without getting a compile error if you also derive Deserialize.

Xiretza commented 2 years ago

Ah, I see! I thought the visit_borrowed_{bytes,str}() methods on BytesVisitor were the optimization, but if deserialization of Bytes is actually the same as that of &[u8] in plain serde, I'll just document that.

Xiretza commented 2 years ago

Does that also mean that the entire BytesVisitor is redundant and that impl serde::Deserialize for Bytes could just forward to serde::Deserialize for &[u8]?