media-io / yaserde

Yet Another Serializer/Deserializer
MIT License
180 stars 62 forks source link

can't deserialize nested field with same name + other fields #76

Open scottlamb opened 4 years ago

scottlamb commented 4 years ago

This onvif-rs crate was unable to parse the message below, with an error "unknown event EndDocument". Note that crate is using yaserde 137af01.

#[test]
fn security_capabilities_extension() {
    let ser = r#"
        <?xml version="1.0" encoding="UTF-8"?>
        <tt:Extension xmlns:tt="http://www.onvif.org/ver10/schema">
          <tt:TLS1.0>false</tt:TLS1.0>
          <tt:Extension>
            <tt:Dot1X>false</tt:Dot1X>
            <tt:SupportedEAPMethod>0</tt:SupportedEAPMethod>
            <tt:RemoteUserHandling>false</tt:RemoteUserHandling>
          </tt:Extension>
        </tt:Extension>
    "#;
    let des: tt::SecurityCapabilitiesExtension = yaserde::de::from_str(&ser).unwrap();
}

I got it down to a failing yaserde test. It's quite similar to the existing de_same_field_name_sub_sub test. Looks like an onvif-rs owner/contributor @DmitrySamoylov added this as part of #51 talking about a related problem. But it seems to still be broken when there are these other fields present.

#[test]
fn de_same_field_name_but_some_other_fields_or_something() {
  #[derive(Default, PartialEq, Debug, YaDeserialize)]
  pub struct FooOuter {
      pub other: bool,
      pub foo: Option<FooInner>,
  }

  #[derive(Default, PartialEq, Debug, YaDeserialize)]
  pub struct FooInner {
      pub blah: bool,
  }

  convert_and_validate!(
    r#"
    <foo>
      <other>false</other>
      <foo>
        <blah>false</blah>
      </foo>
    </foo>
    "#,
    FooOuter,
    FooOuter {
      other: false,
      foo: Some(FooInner {
        blah: false,
      }),
    }
  );
}

With yaserde at that revision (137af01), I see the same error as in my "real" message.

With yaserde at current master (2726de5), it still doesn't work, but the symptom is different: it goes into a busy-loop during this test.

I haven't diagnosed further than this.

MarcAntoine-Arnaud commented 4 years ago

Hi @scottlamb,

Thank you for the issue. I have tested a little bit and that's true it not works.

That works:

#[test]
fn de_same_field_name_but_some_other_fields_or_something() {
  #[derive(Default, PartialEq, Debug, YaDeserialize, YaSerialize)]
  #[yaserde(rename="foo")]
  pub struct FooOuter {
      pub other: bool,
      #[yaserde(rename="faa")]
      pub foo: Option<FooInner>,
  }

  #[derive(Default, PartialEq, Debug, YaDeserialize, YaSerialize)]
  pub struct FooInner {
      pub blah: bool,
  }

  let content = r#"
    <foo>
      <other>false</other>
      <faa>
        <blah>false</blah>
      </faa>
    </foo>
  "#;

  let model = FooOuter {
    other: false,
    foo: Some(FooInner {
      blah: false,
    }),
  };

  serialize_and_validate!(model, content);
  deserialize_and_validate!(content, model, FooOuter);
}

Changing faa with foo will not be able to deserialise.

scottlamb commented 4 years ago

Thanks for confirming!

In yaserde_derive/src/de/expand_struct.rs, it seems suspicious to me that it's matching on named_element. Couldn't it recognize its own element by position? Ie, its StartElement should come first, and the name is only for validation / error-checking the caller. The EndElement should come at the same depth as the StartElement, and again the name is only for validation. I don't think its element name in any other position should be treated specially.

scottlamb commented 4 years ago

I've been looking into this issue a bit.

First, I wrote something incorrect here:

In yaserde_derive/src/de/expand_struct.rs, it seems suspicious to me that it's matching on named_element.

I was referring to the line below. It's not using named_element; it's shadowing it. That's confusing but not actually problematic.

https://github.com/media-io/yaserde/blob/9a2aec0abee8c6969bedd54ac98e5f6a7ca55414/yaserde_derive/src/de/expand_struct.rs#L345

But this one seems wrong:

https://github.com/media-io/yaserde/blob/9a2aec0abee8c6969bedd54ac98e5f6a7ca55414/yaserde_derive/src/de/expand_struct.rs#L366

What I'd like to do is to (as I mentioned before) not treat the same element name specially. Either it's at the opening depth (and we can call Deserializer::expect_end_element) or it's not (and we shouldn't have any special treatment).

More generally, I'd like to set a more precise contract around YaDeserialize::deserialize about the state of the Deserializer before and afterward. I'd like to say something like this:

but this is a change in a few ways that I think requires a major version bump. Your thoughts? In particular:

In any case, I'll send a PR to improve debuggability a bit. Initialize logging in each of the tests as described here (with a new env_logger dev dependency). Add/tweak some log calls to make it more obvious what's going on. These helped me quite a bit in understanding how the code works.

scottlamb commented 4 years ago

I don't see any reasonable way to keep flattened structs working when called like root_flatten_struct and like flatten_attribute today. It either needs to support only one mode ... or it needs to know what mode it's called in

My preference would be to not support the root_flatten_struct style, but if it is necessary: it just occurred to me that flattened structs could go into "root mode" iff the first element they see is the a StartDocument rather than a StartElement. It's confusing to me anyway that Deserializer discards the StartDocument but includes the EndDocument. This too would be a contract change though. Would StartDocument be included just when sent to a flattened struct (so the deserializer needs to be set up in a different way for that) or also for structs/enums (and so they need logic to swallow it)?

Anyway, I'll send you some hopefully-uncontroversial cleanup/debugging stuff and await your opinion on contract changes.

I'll likely have more contract questions for you later, but this seems like a good first round to address this immediate problem.

DmitrySamoylov commented 4 years ago

@MarcAntoine-Arnaud is there any reason why using pull parser? Is it for performance reasons? I'm just thinking that code could be much cleaner if using xml-tree :thinking: . There are places where you need tree structure but there is not one and you have to preserve some state while iterating over XmlEvents.

MarcAntoine-Arnaud commented 4 years ago

Hello @scottlamb and @DmitrySamoylov !

Very interesting discussion in this thread. I'm quite open to big changes in this library, so feel free to suggest the ideal mode ! It was my first library using proc_macro 2 years ago, and I have already think about restart from scratch with a more stable design.

I have used xml-rs because basically the serde-xml was based on it, and helped me to start the project.

Do you have time next week for a meeting to discuss on all needed points ?

Best Marc-Antoine

scottlamb commented 4 years ago

Hmm, meetings are a little hard for me. I'm working on this for my Moonfire NVR project. It's my personal project, and my employer's copyright release agreement specifies I can't use their resources / time for it. I generally interpret that as not during the standard workday unless I take a vacation day. I'm a little too behind at work for that right now, so a meeting would have to be on the weekend or early morning / late evening Pacific time.

scottlamb commented 4 years ago

Would you still like to meet (and I apologize for my inconvenient time requirements) or should we converse over email/issues instead?

scottlamb commented 4 years ago

I just started a much more ambitious proposal in #84; I'm eager to hear your thoughts on it.

MarcAntoine-Arnaud commented 3 years ago

Is it possible to check if the #112 fixes it ?

DerNoob24 commented 3 years ago

I use the version 0.7.1, which includes this commit, and the problem is still not fixed when using such a example

<?xml version="1.0" encoding="utf-8"?>
<railml xmlns="https://www.railml.org/schemas/2018"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="2.4">
    <ocp name="ocp_name">
        <xsi:name value="name_value" long="name_long" />
    </ocp>
</railml>