serde-rs / serde

Serialization framework for Rust
https://serde.rs/
Apache License 2.0
9.08k stars 768 forks source link

Internal buffering disrupts format-specific deserialization features #1183

Open mitsuhiko opened 6 years ago

mitsuhiko commented 6 years ago

Was not sure if I should file this against serde_json or here. As far as I can tell the bug is more likely in serde itself. While implementing #1179 I noticed that non string keys are not supported for flattening. However the same behavior happens if buffering happens for internally tagged enums:

Example that shows the problem:

#[macro_use]
extern crate serde_derive;

extern crate serde;
extern crate serde_json;

use std::collections::HashMap;

#[derive(Deserialize, Debug)]
#[serde(tag = "type")]
enum E {
    M(HashMap<u32, ()>),
}

fn main() {
    let j = r#" {"type": "M", "1": null} "#;
    println!("{}", serde_json::from_str::<E>(j).unwrap_err());
}

This currently fails with this error:

invalid type: string "1", expected u32

However a HashMap<u32, ()> is otherwise entirely permissible by serde_json.

dtolnay commented 6 years ago

There are some other JSON-specific assumptions baked into Content too. For example the representation of Serde enums happens to be what serde_json uses but not necessarily shared by other formats.

I suspect we will need a format-specific optional API on Deserializer for buffering data in a way that is consistent for that format. The default implementation would basically be equivalent to today's Content but formats could override it to use their own representation. Even serde_json may want to switch and use something like serde_json::Value for buffering rather than Content.

We should start by identifying all the ways Content is used in our generated code and generalize those into an API that can be provided by the deserializer.

dtolnay commented 6 years ago

This is complicated by associated type defaults being unstable https://github.com/rust-lang/rust/issues/29661. I thought we might want a format-specific buffer type as a Deserializer associated type with the default being Content, but we will need to find a different way.

dtolnay commented 6 years ago

Here is a backdoor way to add an associated type.

struct Content;

trait Deserializer {
    // Suppose we want something like this but associated type defaults are unstable.
    type Buffer = Content;
}
trait Deserializer {
    // Instead write it as an associated function that forwards to a generic callback.
    fn deserialize_with_buffering<F>(callback: F) -> F::Value
    where
        F: DeserializeWithBuffer,
    {
        callback.run::<Content>()
    }
}

trait DeserializeWithBuffer {
    type Value;
    fn run<Buffer>(self) -> Self::Value;
}
kardeiz commented 6 years ago

Is there any update on this?

dtolnay commented 6 years ago

Customizing the buffer type by Deserializer could provide a powerful alternative to #1327. The key constraint being satisfied in that approach (as compared to just stashing state in some thread local) was to accurately expose state during deserialization of untagged, internally tagged, and flattened members.

If a Deserializer can replace the buffer type, a library could provide a stateful Deserializer built on TLS by wrapping any existing Deserializer and wrapping that Deserializer's buffer type to correctly track the state.

dtolnay commented 6 years ago

I implemented a proof of concept in #1354.

dtolnay commented 5 years ago

Let's give this another attempt as associated type defaults (rust-lang/rust#29661) approaches stabilization. I believe the workaround without associated type defaults in #1354 could have been made to work but would have been too complicated to maintain.

davepacheco commented 4 years ago

Is it possible that this issue explains the behavior described in nox/serde_urlencoded#66? In that case, the author is trying to deserialize the key-value pair bar=123 into a variant B { bar: i32 } in an untagged enum. It seems like this should work. Deserialization fails because "data did not match any variant of untagged enum Q2". When using a similar enum Q1 where the variant is B { bar: String }, it works as expected, with bar = "123".

dtolnay commented 4 years ago

Yeah that looks like a consequence of this issue. #[serde(with = "serde_with::rust::display_fromstr")] on the field (https://docs.rs/serde_with/1.4.0/serde_with/rust/display_fromstr/index.html) would work around it in that case but obviously is not ideal.

RagnarGrootKoerkamp commented 1 year ago

I have the following seemingly simple usage of flatten with an enum in the Inner struct. Is this crash also related to this issue? I see most failures with flatten are reffered to here, but haven't seen this particular case yet.

use serde::{Deserialize, Serialize};
#[derive(Serialize, Deserialize)]
enum E {
    A,
    B,
}

#[derive(Serialize, Deserialize)]
struct Inner {
    e: E,
}

#[derive(Serialize, Deserialize)]
struct Outer {
    #[serde(flatten)]
    inner: Inner,
}

fn main() {
    let yaml = "e: !A";

    // Works fine:
    let _inner: Inner = serde_yaml::from_str(&yaml).unwrap();

    // Fails at runtime with:
    // e: untagged and internally tagged enums do not support enum input
    let _outer: Outer = serde_yaml::from_str(&yaml).unwrap();
}

The docs have a sentence saying

"It is supported only within structs that have named fields, and the field to which it is applied must be a struct or map type."

Maybe that can/should be updated to mention that enums anywhere in the flattened struct (also more deeply nested) are not allowed either (at least when deserializing YAML).

fogti commented 1 year ago

@RagnarGrootKoerkamp that sounds like it is unrelated to flattening, does it work when you just try to deserialize that string as Inner (because it that fails with the same error, then it is unrelated to flattening, otherwise it's a bug in the flatten impl afaik)

RagnarGrootKoerkamp commented 1 year ago

@zseri Yes, deserializing to Inner works fine. Updated the example above.

Should I create a separate issue then?

espindola commented 1 year ago

This might be related: https://github.com/dtolnay/serde-yaml/issues/388

Yaml can treat a value of 42 as a string. This works without flatten, but fails with it.

Stepping on the code in gdb, the interesting transition seems to be:

#0  serde::__private::de::content::{impl#15}::deserialize_string<serde_yaml::error::Error, serde::de::impls::StringVisitor> (self=..., visitor=...)
    at /home/espindola/.cargo/registry/src/index.crates.io-6f17d22bba15001f/serde-1.0.188/src/private/de.rs:1260                  
#1  0x000055555556516d in serde::de::impls::{impl#8}::deserialize<serde::__private::de::content::ContentDeserializer<serde_yaml::error::Error>> (deserializer=...)
    at /home/espindola/.cargo/registry/src/index.crates.io-6f17d22bba15001f/serde-1.0.188/src/de/impls.rs:582
#2  0x00005555555681f0 in serde::de::{impl#5}::deserialize<alloc::string::String, serde::__private::de::content::ContentDeserializer<serde_yaml::error::Error>> (
    deserializer=...) at /home/espindola/.cargo/registry/src/index.crates.io-6f17d22bba15001f/serde-1.0.188/src/de/mod.rs:794
#3  0x0000555555571b68 in serde::__private::de::{impl#10}::next_value_seed<serde_yaml::error::Error, core::marker::PhantomData<alloc::string::String>> (
    self=0x7fffffffd340, seed=...) at /home/espindola/.cargo/registry/src/index.crates.io-6f17d22bba15001f/serde-1.0.188/src/private/de.rs:2814
#4  0x00005555555716c6 in serde::de::MapAccess::next_value<serde::__private::de::FlatStructAccess<serde_yaml::error::Error>, alloc::string::String> (
    self=0x7fffffffd340) at /home/espindola/.cargo/registry/src/index.crates.io-6f17d22bba15001f/serde-1.0.188/src/de/mod.rs:1865
#5  0x0000555555563867 in serde_yaml::_::{impl#0}::deserialize::{impl#2}::visit_map<serde::__private::de::FlatStructAccess<serde_yaml::error::Error>> (__map=...)
    at src/main.rs:3  
#6  0x0000555555571e31 in serde::__private::de::{impl#8}::deserialize_struct<serde_yaml::error::Error, serde_yaml::_::{impl#0}::deserialize::__Visitor> (self=..., 
    fields=&[&str](size=1) = {...}, visitor=...) at /home/espindola/.cargo/registry/src/index.crates.io-6f17d22bba15001f/serde-1.0.188/src/private/de.rs:2674
#7  0x0000555555563300 in serde_yaml::_::{impl#0}::deserialize<serde::__private::de::FlatMapDeserializer<serde_yaml::error::Error>> (__deserializer=...)
    at src/main.rs:3
#8  0x0000555555564d80 in serde_yaml::_::{impl#0}::deserialize::{impl#2}::visit_map<&mut serde_yaml::de::MapAccess> (__map=0x7fffffffd6d0) at src/main.rs:8

We go from using deserialize_string from serde_yaml, to using one from serde, which doesn't have the required logic.

juntyr commented 1 year ago

Let's give this another attempt as associated type defaults (rust-lang/rust#29661) approaches stabilization. I believe the workaround without associated type defaults in #1354 could have been made to work but would have been too complicated to maintain.

It seems like rust-lang/rust#29661 still has a way to go, so in case #![feature(return_position_impl_trait_in_trait)] is stabilized earlier, perhaps something like the following could also work:

pub trait Deserializer<'de>: Sized {
    /* ... */

    fn deserialize_buffered(self) ->
        Result<impl Clone + Deserializer<'de>, Self::Error>
    {
        crate::__private::de::Content::deserialize(self)
    }
}

Are there other options that could perhaps be used to provide this feature earlier, since several serde-supporting formats have to battle buffering-based issues until then?

juntyr commented 1 year ago

@dtolnay Since return impl trait in traits will now be stabilised soon, do you think that using it to solve this issue would be an avenue worth pursuing?

YuhanunCitgez commented 9 months ago

Any update on this matter? Anything we can do to assist?

docwilco commented 3 months ago

I'll repeat the last comment:

Any update on this matter? Anything we can do to assist?