Open scottlamb opened 4 years ago
Hi @scottlamb,
Thank you for the issue. I have tested a little bit and that's true it not works.
That works:
#[test]
fn de_same_field_name_but_some_other_fields_or_something() {
#[derive(Default, PartialEq, Debug, YaDeserialize, YaSerialize)]
#[yaserde(rename="foo")]
pub struct FooOuter {
pub other: bool,
#[yaserde(rename="faa")]
pub foo: Option<FooInner>,
}
#[derive(Default, PartialEq, Debug, YaDeserialize, YaSerialize)]
pub struct FooInner {
pub blah: bool,
}
let content = r#"
<foo>
<other>false</other>
<faa>
<blah>false</blah>
</faa>
</foo>
"#;
let model = FooOuter {
other: false,
foo: Some(FooInner {
blah: false,
}),
};
serialize_and_validate!(model, content);
deserialize_and_validate!(content, model, FooOuter);
}
Changing faa
with foo
will not be able to deserialise.
Thanks for confirming!
In yaserde_derive/src/de/expand_struct.rs
, it seems suspicious to me that it's matching on named_element
. Couldn't it recognize its own element by position? Ie, its StartElement
should come first, and the name is only for validation / error-checking the caller. The EndElement
should come at the same depth as the StartElement
, and again the name is only for validation. I don't think its element name in any other position should be treated specially.
I've been looking into this issue a bit.
First, I wrote something incorrect here:
In
yaserde_derive/src/de/expand_struct.rs
, it seems suspicious to me that it's matching onnamed_element
.
I was referring to the line below. It's not using named_element
; it's shadowing it. That's confusing but not actually problematic.
But this one seems wrong:
What I'd like to do is to (as I mentioned before) not treat the same element name specially. Either it's at the opening depth (and we can call Deserializer::expect_end_element
) or it's not (and we shouldn't have any special treatment).
More generally, I'd like to set a more precise contract around YaDeserialize::deserialize
about the state of the Deserializer
before and afterward. I'd like to say something like this:
StartElement
refers to this data structure (so the callee can examine its attributes). The callee shouldn't make any assumptions about its OwnedName
. In practice, I think XML Schema decouples the notion of element name and type so this would be consistent with that.EndElement
, have been consumed.Deserializer
.but this is a change in a few ways that I think requires a major version bump. Your thoughts? In particular:
EndElement
s today. Changing the contract in this way lets me call Deserializer::expect_end_element
from within them, which I think will catch problems sooner and more easily. But unless the versions of yaserde
and yaserde_derived
are expected to exactly match, this is an incompatible change.de_custom
in yaserde/tests/deserializer.rs
, it looks like it's expected for folks to implement YaDeserialize
themselves, and the expectations are...very loose. Not only does it not consume the EndElement
, but it doesn't even consume the stuff within it. The calling struct/enum could always consume elements until the depth matches its entry point. But that's extra code bloat (and onvif-rs's schema stuff is huge now). And I think it can conceal bugs. So I'd like to tighten this instead.root_flatten_struct
and like flatten_attribute
today. It either needs to support only one mode (and if attributes are supported, it'd need to be the one where the parent's start and end are included) or it needs to know what mode it's called in (so a new trait method or something). Is the root one an important use case, or could we just drop it?In any case, I'll send a PR to improve debuggability a bit. Initialize logging in each of the tests as described here (with a new env_logger
dev dependency). Add/tweak some log calls to make it more obvious what's going on. These helped me quite a bit in understanding how the code works.
I don't see any reasonable way to keep flattened structs working when called like
root_flatten_struct
and likeflatten_attribute
today. It either needs to support only one mode ... or it needs to know what mode it's called in
My preference would be to not support the root_flatten_struct
style, but if it is necessary: it just occurred to me that flattened structs could go into "root mode" iff the first element they see is the a StartDocument
rather than a StartElement
. It's confusing to me anyway that Deserializer
discards the StartDocument
but includes the EndDocument
. This too would be a contract change though. Would StartDocument
be included just when sent to a flattened struct (so the deserializer needs to be set up in a different way for that) or also for structs/enums (and so they need logic to swallow it)?
Anyway, I'll send you some hopefully-uncontroversial cleanup/debugging stuff and await your opinion on contract changes.
I'll likely have more contract questions for you later, but this seems like a good first round to address this immediate problem.
@MarcAntoine-Arnaud is there any reason why using pull parser? Is it for performance reasons? I'm just thinking that code could be much cleaner if using xml-tree :thinking: . There are places where you need tree structure but there is not one and you have to preserve some state while iterating over XmlEvents
.
Hello @scottlamb and @DmitrySamoylov !
Very interesting discussion in this thread. I'm quite open to big changes in this library, so feel free to suggest the ideal mode ! It was my first library using proc_macro 2 years ago, and I have already think about restart from scratch with a more stable design.
I have used xml-rs because basically the serde-xml was based on it, and helped me to start the project.
Do you have time next week for a meeting to discuss on all needed points ?
Best Marc-Antoine
Hmm, meetings are a little hard for me. I'm working on this for my Moonfire NVR project. It's my personal project, and my employer's copyright release agreement specifies I can't use their resources / time for it. I generally interpret that as not during the standard workday unless I take a vacation day. I'm a little too behind at work for that right now, so a meeting would have to be on the weekend or early morning / late evening Pacific time.
Would you still like to meet (and I apologize for my inconvenient time requirements) or should we converse over email/issues instead?
I just started a much more ambitious proposal in #84; I'm eager to hear your thoughts on it.
Is it possible to check if the #112 fixes it ?
I use the version 0.7.1, which includes this commit, and the problem is still not fixed when using such a example
<?xml version="1.0" encoding="utf-8"?>
<railml xmlns="https://www.railml.org/schemas/2018"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="2.4">
<ocp name="ocp_name">
<xsi:name value="name_value" long="name_long" />
</ocp>
</railml>
This onvif-rs crate was unable to parse the message below, with an error "unknown event EndDocument". Note that crate is using yaserde 137af01.
I got it down to a failing yaserde test. It's quite similar to the existing
de_same_field_name_sub_sub
test. Looks like an onvif-rs owner/contributor @DmitrySamoylov added this as part of #51 talking about a related problem. But it seems to still be broken when there are these other fields present.With yaserde at that revision (137af01), I see the same error as in my "real" message.
With yaserde at current master (2726de5), it still doesn't work, but the symptom is different: it goes into a busy-loop during this test.
I haven't diagnosed further than this.