tafia / quick-xml

Rust high performance xml reader and writer

Question: How can I achieve this enum? #793

Closed xkikeg closed 3 months ago

xkikeg commented 3 months ago

I'm trying to parse ISO Camt053 XML. It used to contain an XML element like this:

<RelatedParties>
  <Dbtr>
    <Nm>ピカチュウ</Nm>
  </Dbtr>
  <Cdtr>
    <Nm>サトシ</Nm>
  </Cdtr>
  <!-- others -->
</RelatedParties>

This could be represented as:

#[derive(Debug, Default, Serialize, Deserialize, PartialEq, Eq)]
pub struct RelatedParties {
    #[serde(rename = "Dbtr")]
    pub debtor: Option<Party>,
    #[serde(rename = "Cdtr")]
    pub creditor: Option<Party>,
    // ...
}

#[derive(Debug, Default, Serialize, Deserialize, PartialEq, Eq)]
pub struct Party {
    #[serde(rename = "Nm")]
    pub name: String,
}

Now the format has changed, and I need to support both versions. The input may look as above, or like this:

<RelatedParties>
  <Dbtr>
    <Pty>
      <Nm>ピカチュウ</Nm>
    </Pty>
  </Dbtr>
  <Cdtr>
    <Pty>
      <Nm>サトシ</Nm>
    </Pty>
  </Cdtr>
  <!-- others -->
</RelatedParties>

So it may have an extra <Pty> element. I tried the following, but it doesn't work.

#[derive(Debug, Default, Serialize, Deserialize, PartialEq, Eq)]
pub struct RelatedParties {
  // same
}

#[derive(Debug, Serialize, Deserialize, PartialEq, Eq)]
#[serde(untagged)]
pub enum Party {
    #[serde(rename = "Pty")]
    Nested(PartyDetails),
    #[serde(rename = "$value")]
    Inline(PartyDetails),
}

#[derive(Debug, Default, Serialize, Deserialize, PartialEq, Eq)]
pub struct PartyDetails {
    #[serde(rename = "Nm")]
    pub name: String,
}

Unfortunately this doesn't work for either input. How can I fix the issue?

Mingun commented 3 months ago

There are several problems in your approach:

  1. The ability to rename variants of #[serde(untagged)] enums is a flaw in serde, because variant names mean nothing there. This code should fail to compile rather than confuse the user:
    #[derive(Debug, Serialize, Deserialize, PartialEq, Eq)]
    #[serde(untagged)]
    pub enum Party {
        #[serde(rename = "Pty")]
        Nested(PartyDetails),
        #[serde(rename = "$value")]
        Inline(PartyDetails),
    }

    I filed https://github.com/serde-rs/serde/issues/2787 about that.

  2. Because variant names do nothing in untagged enums, using the same type in different variants makes no sense. An untagged enum simply tries one variant and, if deserialization fails, tries the next one. If deserialization of PartyDetails fails on the first attempt, it will also fail on the second.

So in an ideal world the following types should work:

#[derive(Debug, Default, Serialize, Deserialize, PartialEq, Eq)]
pub struct RelatedParties {
    #[serde(rename = "Dbtr")]
    pub debtor: Option<Party>,
    #[serde(rename = "Cdtr")]
    pub creditor: Option<Party>,
    // ...
}

#[derive(Debug, Serialize, Deserialize, PartialEq, Eq)]
#[serde(untagged)]
pub enum Party {
    V2(PartyV2),
    V1(PartyV1),
}

#[derive(Debug, Default, Serialize, Deserialize, PartialEq, Eq)]
pub struct PartyV1 {
    #[serde(rename = "Nm")]
    pub name: String,
}

#[derive(Debug, Default, Serialize, Deserialize, PartialEq, Eq)]
pub struct PartyV2 {
    #[serde(rename = "Pty")]
    pub party: PartyV1,
}
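
To make the intent of these types concrete, here is a rough, std-only model of the "try each variant in order" behaviour. It is not serde's implementation: the `Element` type and the `parse_v1`, `parse_v2` and `parse_party` functions are hypothetical stand-ins invented for this sketch.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct PartyV1 {
    pub name: String,
}

#[derive(Debug, PartialEq, Eq)]
pub struct PartyV2 {
    pub party: PartyV1,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Party {
    V2(PartyV2),
    V1(PartyV1),
}

/// Hypothetical stand-in for a parsed XML element: a name, its text, and its child elements.
pub struct Element {
    pub name: String,
    pub text: String,
    pub children: Vec<Element>,
}

impl Element {
    pub fn leaf(name: &str, text: &str) -> Self {
        Element { name: name.to_string(), text: text.to_string(), children: Vec::new() }
    }

    pub fn parent(name: &str, children: Vec<Element>) -> Self {
        Element { name: name.to_string(), text: String::new(), children }
    }

    fn child(&self, name: &str) -> Option<&Element> {
        self.children.iter().find(|c| c.name == name)
    }
}

/// Old layout: <Nm> sits directly inside the party element.
fn parse_v1(e: &Element) -> Result<PartyV1, String> {
    let nm = e.child("Nm").ok_or_else(|| format!("missing <Nm> in <{}>", e.name))?;
    Ok(PartyV1 { name: nm.text.clone() })
}

/// New layout: <Nm> is wrapped in an extra <Pty> element.
fn parse_v2(e: &Element) -> Result<PartyV2, String> {
    let pty = e.child("Pty").ok_or_else(|| format!("missing <Pty> in <{}>", e.name))?;
    Ok(PartyV2 { party: parse_v1(pty)? })
}

/// Try the variants in declaration order; the first one that parses wins.
pub fn parse_party(e: &Element) -> Result<Party, String> {
    parse_v2(e)
        .map(Party::V2)
        .or_else(|_| parse_v1(e).map(Party::V1))
        .map_err(|_| "data did not match any variant".to_string())
}
```

Because `PartyV1` and `PartyV2` expect different children, at most one attempt succeeds for a given input. If both variants wrapped the same type, the second attempt would fail for the same reason as the first.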

Unfortunately, any buffering breaks the XML deserializer in many cases, because it has to rely on dirty tricks to deserialize things correctly (serde lacks a suitable API), and the #[serde(untagged)] attribute places a buffering step between the deserializer and the real types. To avoid that, you could implement deserialization of Party manually or try serde-untagged, which was developed to avoid buffering.

xkikeg commented 3 months ago

Thanks! That helped me a lot