tafia / quick-xml

Rust high performance xml reader and writer
MIT License
1.16k stars 231 forks source link

Option<bool> doesn't (de)serialize properly with serde #497

Open dralley opened 1 year ago

dralley commented 1 year ago

Using the following code:

use serde;
use quick_xml;

#[derive(serde::Serialize, serde::Deserialize)]
pub struct Player {
    spawn_forced: Option<bool>
}

fn main() {
    let data = Player {
       spawn_forced: None,
    };

    let mut deserialize_buffer = String::new();
    quick_xml::se::to_writer(&mut deserialize_buffer, &data).unwrap();
    println!("{}", &deserialize_buffer);

    quick_xml::de::from_reader::<&[u8], Player>(&deserialize_buffer.as_bytes())
                    .unwrap();
}

The code crashes during the deserialize step

[dalley@thinkpad playground]$ cargo run
   Compiling playground v0.1.0 (/home/dalley/Devel/practice/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.28s
     Running `target/debug/playground`
<Player><spawn_forced/></Player>
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidBoolean("")', src/main.rs:98:22
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

If instead spawn_forced is set to Some(true) or Some(false), it works fine. Only None, which serializes to <spawn_forced/>, panics upon deserialization.

[dalley@thinkpad playground]$ cargo run
   Compiling playground v0.1.0 (/home/dalley/Devel/practice/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.44s
     Running `target/debug/playground`
<Player><spawn_forced>true</spawn_forced></Player>
[dalley@thinkpad playground]$ cargo run
   Compiling playground v0.1.0 (/home/dalley/Devel/practice/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.28s
     Running `target/debug/playground`
<Player><spawn_forced>false</spawn_forced></Player>
dralley commented 1 year ago

Test case minimized from https://github.com/djkoloski/rust_serialization_benchmark/pull/34

dralley commented 1 year ago

This is using the current master branch ^^

Mingun commented 1 year ago

This bug is not specific to bool -- any Option<T> fields that deserializing from empty element, are affecting. Let's look at such XMLs:

<root/>                               <!-- (1) None -->
<root><element/></root>               <!-- (2) ? -->
<root><element>$DATA</element></root> <!-- (3) Some(...) -->

While result for (1) and (3) are pretty obvious, that is hard to say for (2). The JSON equivalent of that structs is

{}
{ "element": null }
{ "element": $DATA }

How we should map the case (2)? serde_json maps it to None always. That means, that, for example, Option<()> and any Option<UnitStruct> are always deserialized as None, while they are serialized into "value": null.

quick-xml currently does the opposite: it always deserializes to Some(...). It is a bit tricky to handle this similar to JSON, because our Deserializer::deserialize_option does not advance event pointer, so the type under Option has access to an Event::Start. We only lookahead to one event, but Deserializer::deserialize_option requires lookahead for 2 events: we need to look at Event::Text in order to see that it is empty and we need to return None. The Event::Text generated, because <element/> is represented as <element></element> for a deserializer and we want no difference in handling between that two representations.

Fixing that likely will have a consequence, that Option<&str> and Option<String> will never be deserialized to Some("").

dralley commented 1 year ago

How we should map the case (2)? serde_json maps it to None always. That means, that, for example, Option<()> and any Option are always deserialized as None, while they are serialized into "value": null.

Would it be possible to use a special attribute value, e.g. <element none="true" />?

Mingun commented 1 year ago

I would try to avoid that and use #[serde(deserialize_with = "path")] instead, if that would be possible, but handling xsi:nil="true" in that sense could be useful for interop (#464).

Mingun commented 1 year ago

Actually, that case is much more complex then I thought initially. It is unclear, how to process cases like

<element attribute="..."/>

At time of deciding if we should return None or not, we don't known how the type under Option would be deserialized. If it is a

struct Struct {
  #[serde(rename = "@attribute")]
  attribute: (),
}

then this XML snippet are definitely not None.

Because of the nature of XML we like to keep ability to deserialize things like

<element attribute="...">
  content
</element>

into a String "content", i.e. ignore excess attributes. This is unique situation for XML, because in JSON that XML would look like

{
  "@attribute": "...",
  "$text": "content"
}

and could not be deserialized into a String. Unfortunately, the situation when additional attributes added to the element are quite usual, so I think we should support that case. That means, however, that deserializing

<element attribute="..."/>

into an Option<String> we should return the same result as for

<element/>

which suggested to be None.

RoJac88 commented 1 year ago

Is there any update on this, or maybe some kind of workaraound?

Currently the only way I cound get the serialization of an optional value to work is when the target type is Option<String>, but even then the result is not correct, since empty strings serialize to Some("") and not None.

LB767 commented 1 year ago

The simplest workaround I found if you control both the serializing and de-serializing parts is to use the following serde attributes: #[serde(skip_serializing_if = "Option::is_none", default)]

This problem is still quite bothersome though...

jonaspleyer commented 1 month ago

The simplest workaround I found if you control both the serializing and de-serializing parts is to use the following serde attributes: #[serde(skip_serializing_if = "Option::is_none", default)]

This problem is still quite bothersome though...

To me this is not a good solution because I rely on multiple storage solutions (eg. Json, Ron, Sled) with the same (de)-serialization. I am mostly concerned with round-trips Rust -> xml -> Rust and obtaining identical results. Editing the serialization options could have non-desired effects in my other storage solutions. Thus I will avoid this approach.

Mingun commented 1 month ago

You have problems in round-trips in JSON too, and I think in any other format, they just different. If take JSON as a base point, at least #[serde(skip_serializing_if = "Option::is_none")] is necessary.

jonaspleyer commented 1 month ago

Yes I am aware that perfect round-trips are problematic in json as well. However, this MWE does work:

use serde::{Deserialize, Serialize};

fn main() {
    #[derive(Clone, Debug, Eq, PartialEq, Deserialize, Serialize)]
    struct Identifier {
        parent: Option<u64>,
    }
    let ident = Identifier { parent: None, };
    let json_string = serde_json::to_string(&ident).unwrap();
    let ident2: Identifier = serde_json::from_str(&json_string).unwrap();
    assert_eq!(ident, ident2);
}

which is my main concern at this point in time.