media-io / yaserde

Yet Another Serializer/Deserializer
MIT License

Example, Generic XML Vector Type #87

Open mlevkov opened 4 years ago

mlevkov commented 4 years ago

This issue will hopefully serve as an example for anyone writing a custom serializer and deserializer for this crate.

The XML standard allows an attribute to hold a value that represents an array (i.e. a vector) of a specific type. The values are space-separated and must all be of one type. I'm not sure about mixed-type scenarios, but my case involved a single type.

The YaSerde library has the YaSerialize and YaDeserialize traits, which you can implement for your own types to handle cases the library does not already cover. In this case, attempting to serialize or deserialize an attribute such as attributeX="1 2 3 6 5 0" into or from Vec::<u32> returns an error, because the library has no support for a space-separated vector of u32. The same goes for a list of values such as attributeY="a b c d e f g".
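To make the data shape concrete, here is a minimal std-only sketch of the conversion a custom implementation has to perform: a space-separated attribute value turned into a `Vec<u32>` and back. The helper names `parse_u32_list` and `format_u32_list` are hypothetical and are not part of YaSerde; the YaSerde trait plumbing is left out on purpose.

```rust
/// Hypothetical helper (not part of YaSerde): parse a whitespace-separated
/// list such as "1 2 3 6 5 0" into a Vec<u32>. The first token that is not
/// a valid u32 is reported as an error message.
pub fn parse_u32_list(value: &str) -> Result<Vec<u32>, String> {
    value
        .split_whitespace()
        .map(|token| {
            token
                .parse::<u32>()
                .map_err(|e| format!("invalid u32 '{}': {}", token, e))
        })
        .collect()
}

/// Hypothetical helper: the inverse operation, joining the items with
/// single spaces so the result can be written back as an attribute value.
pub fn format_u32_list(items: &[u32]) -> String {
    items
        .iter()
        .map(|item| item.to_string())
        .collect::<Vec<String>>()
        .join(" ")
}
```

Calling `parse_u32_list` on the `attributeX` value from above gives the six numbers as a vector, and `format_u32_list` turns that vector back into the same string.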

The way to resolve this is to define a separate struct for each specific value type. For example, if your list contains only u32 values, you would create the following struct:

#[derive(Default, Clone, Eq, PartialEq, Debug)]
pub struct UintVector {
    items: Vec<u32>,
}

Then you'd create a custom de/serialization implementation that looks like the following:

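// `xml` below is the xml-rs crate, which YaSerde builds on.
use std::io::{Read, Write};
use yaserde::{YaDeserialize, YaSerialize};

impl YaDeserialize for UintVector {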
    fn deserialize<R>(reader: &mut yaserde::de::Deserializer<R>) -> Result<Self, String>
    where
        R: Read,
    {
        loop {
            match reader.next_event()? {
                xml::reader::XmlEvent::StartElement { .. } => {}
                xml::reader::XmlEvent::Characters(ref text_content) => {
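                    // Split on any whitespace; report a bad token as an error instead of panicking.
                    let items = text_content
                        .split_whitespace()
                        .map(|item| item.parse::<u32>().map_err(|e| e.to_string()))
                        .collect::<Result<Vec<u32>, String>>()?;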
                    return Ok(UintVector { items });
                }
                _ => {
                    break;
                }
            }
        }
        Err("Unable to parse attribute".to_string())
    }
}

impl YaSerialize for UintVector {
    fn serialize<W>(&self, writer: &mut yaserde::ser::Serializer<W>) -> Result<(), String>
    where
        W: Write,
    {
        let content = self
            .items
            .iter()
            .map(|item| item.to_string())
            .collect::<Vec<String>>()
            .join(" ");
        let data_event = xml::writer::XmlEvent::characters(&content);
        writer.write(data_event).map_err(|e| e.to_string())?;
        Ok(())
    }

    fn serialize_attributes(
        &self,
        source_attributes: Vec<xml::attribute::OwnedAttribute>,
        source_namespace: xml::namespace::Namespace,
    ) -> Result<
        (
            Vec<xml::attribute::OwnedAttribute>,
            xml::namespace::Namespace,
        ),
        String,
    > {
        Ok((source_attributes, source_namespace))
    }
}
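One detail worth checking in isolation is how the attribute text is tokenized. The std-only sketch below compares `split(' ')` with `split_whitespace()`; the function names are hypothetical and only exist for this comparison. With `split(' ')`, two consecutive spaces produce an empty token, and an empty string does not parse as `u32`, so calling `parse().unwrap()` on it would panic rather than return an error.

```rust
/// Tokenize the way `split(' ')` does: every single space is a separator,
/// so consecutive spaces produce empty tokens.
pub fn tokens_by_space(value: &str) -> Vec<&str> {
    value.split(' ').collect()
}

/// Tokenize with `split_whitespace()`: runs of spaces, tabs and newlines
/// count as one separator, and leading/trailing whitespace is ignored.
pub fn tokens_by_whitespace(value: &str) -> Vec<&str> {
    value.split_whitespace().collect()
}
```

For hand-edited XML, where stray double spaces or line breaks inside an attribute value are plausible, the whitespace-based variant is the more forgiving choice.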

However, if you have a list of alphanumeric values that can be defined as Vec::<String>, such as attributeY="a b c d e f g", then you'd have to create the following struct and respective implementation:

#[derive(Default, Clone, PartialEq, Debug)]
pub struct StringList {
    items: Vec<String>,
}

impl YaDeserialize for StringList {
    fn deserialize<R: Read>(reader: &mut yaserde::de::Deserializer<R>) -> Result<Self, String> {
        loop {
            match reader.next_event()? {
                xml::reader::XmlEvent::StartElement { .. } => {}
                xml::reader::XmlEvent::Characters(ref text_content) => {
                    // Each whitespace-separated token is already a string; no parsing is needed.
                    let items: Vec<String> = text_content
                        .split_whitespace()
                        .map(|item| item.to_owned())
                        .collect();
                    return Ok(StringList { items });
                }
                _ => {
                    break;
                }
            }
        }
        Err("Unable to parse attribute".to_string())
    }
}

impl YaSerialize for StringList {
    fn serialize<W: Write>(&self, writer: &mut yaserde::ser::Serializer<W>) -> Result<(), String> {
        let content = self
            .items
            .iter()
            .map(|item| item.to_string())
            .collect::<Vec<String>>()
            .join(" ");
        let data_event = xml::writer::XmlEvent::characters(&content);
        writer.write(data_event).map_err(|e| e.to_string())?;
        Ok(())
    }

    fn serialize_attributes(
        &self,
        source_attributes: Vec<xml::attribute::OwnedAttribute>,
        source_namespace: xml::namespace::Namespace,
    ) -> Result<
        (
            Vec<xml::attribute::OwnedAttribute>,
            xml::namespace::Namespace,
        ),
        String,
    > {
        Ok((source_attributes, source_namespace))
    }
}

At this point, with two nearly identical implementations, the code duplication becomes hard to ignore. Every additional type you de/serialize would add more of it, and you start to wonder whether this is a good approach. Since all I'm really doing here is implementing the same (or very similar) code for each specific type, can't I make the type generic while keeping the logic in place? Generic code comes to the rescue. The two types above can be replaced with the following, where I define a struct XMLVector that is generic over <T>:

#[derive(Default, Clone, Eq, PartialEq, Debug)]
pub struct XMLVector<T> {
    items: Vec<T>,
}

use std::fmt::Debug;
use std::str::FromStr;

impl<T> YaDeserialize for XMLVector<T>
where
    T: FromStr + Debug,
    <T as FromStr>::Err: Debug,
{
    fn deserialize<R>(reader: &mut yaserde::de::Deserializer<R>) -> Result<Self, String>
    where
        R: Read,
    {
        loop {
            match reader.next_event()? {
                xml::reader::XmlEvent::StartElement { .. } => {}
                xml::reader::XmlEvent::Characters(ref text_content) => {
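                    // Split on any whitespace; report a bad token as an error instead of panicking.
                    let items = text_content
                        .split_whitespace()
                        .map(|item| item.parse::<T>().map_err(|e| format!("{:?}", e)))
                        .collect::<Result<Vec<T>, String>>()?;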
                    return Ok(XMLVector { items });
                }
                _ => {
                    break;
                }
            }
        }
        Err("Unable to parse attribute".to_string())
    }
}

impl<T: ToString> YaSerialize for XMLVector<T> {
    fn serialize<W>(&self, writer: &mut yaserde::ser::Serializer<W>) -> Result<(), String>
    where
        W: Write,
    {
        let content = self
            .items
            .iter()
            .map(|item| item.to_string())
            .collect::<Vec<String>>()
            .join(" ");
        let data_event = xml::writer::XmlEvent::characters(&content);
        writer.write(data_event).map_err(|e| e.to_string())?;
        Ok(())
    }

    fn serialize_attributes(
        &self,
        source_attributes: Vec<xml::attribute::OwnedAttribute>,
        source_namespace: xml::namespace::Namespace,
    ) -> Result<
        (
            Vec<xml::attribute::OwnedAttribute>,
            xml::namespace::Namespace,
        ),
        String,
    > {
        Ok((source_attributes, source_namespace))
    }
}
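The generic trick can also be tried out without YaSerde. The following is a minimal std-only sketch, assuming the same trait bounds as the implementation above (`FromStr` with a `Debug` error for parsing, `ToString` for formatting). The names `parse_list` and `format_list` are hypothetical helpers introduced for illustration, not library functions.

```rust
use std::fmt::Debug;
use std::str::FromStr;

/// Hypothetical helper: parse a whitespace-separated list into a Vec<T>
/// for any T that implements FromStr. A token that fails to parse is
/// returned as an error message built from the Debug form of T's error.
pub fn parse_list<T>(value: &str) -> Result<Vec<T>, String>
where
    T: FromStr,
    <T as FromStr>::Err: Debug,
{
    value
        .split_whitespace()
        .map(|token| token.parse::<T>().map_err(|e| format!("{:?}", e)))
        .collect()
}

/// Hypothetical helper: join any ToString items with single spaces.
pub fn format_list<T: ToString>(items: &[T]) -> String {
    items
        .iter()
        .map(|item| item.to_string())
        .collect::<Vec<String>>()
        .join(" ")
}
```

The same two functions cover both earlier cases: `parse_list::<u32>` handles the numeric list and `parse_list::<String>` handles the alphanumeric one, which is the reason a single generic struct can stand in for both `UintVector` and `StringList`.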

I can now replace my custom implementations for each type: UintVector becomes XMLVector::<u32> and StringList becomes XMLVector::<String>. The code duplication goes away, and this generic code can be applied to any item type that satisfies the trait bounds above (FromStr and Debug for deserialization, ToString for serialization). However, one caveat is that rustfmt treats Vec::<String> and Vec<String> as the same and rewrites the former into the latter. At the time of this note, the YaSerde library supports the Vec::<String> notation, not the Vec<String> notation. To avoid autoformat issues, annotate the struct field with #[rustfmt::skip] so that the type stays in the form the library handles. The struct field might look like the following:

#[derive(Default, Debug, Clone, PartialEq, YaSerialize, YaDeserialize)]
pub struct SubRepresentationAttributes {
    #[rustfmt::skip]
    #[yaserde(attribute, rename = "contentComponent")]
    pub content_component: XMLVector::<String>, 
}

I hope this note helps other folks with creating custom de/serializers. If I wrote something inaccurately or you have a better suggestion, all comments and suggestions are welcome. I also want to credit @alechenthorne for helping with this effort. @MarcAntoine-Arnaud and team, you are welcome to comment.

brainstorm commented 3 years ago

This would be great to have included as proper documentation, otherwise it might get lost as an issue :_S Could you consider writing it up as something similar to this instead of a GH issue? (WIP): https://github.com/media-io/yaserde/issues/87

mlevkov commented 3 years ago

@brainstorm This should probably live in the wiki. I did not know where to put it, and I did not want this information to be lost for other people, so I placed it here for now. @MarcAntoine-Arnaud Any suggestions?

brainstorm commented 3 years ago

@mlevkov I've recently moved the examples repo (which should be archived/deleted, right @MarcAntoine-Arnaud?) into its own folder in the main yaserde repo (see PR #106).

I think it would be great if you moved this writeup and its code snippets into this new examples folder. The chances of it getting lost are lower, and the examples will be part of CI, so your code would definitely be used, seen, and maintained better across versions, I reckon.

I hope I've convinced you to issue a pull request at this point? :)

mlevkov commented 1 year ago

@brainstorm I've not touched this crate for a while now. I will take a look at what has changed, as I'm starting to use the crate again. Hopefully, I will be able to make a commit that takes your suggestion into account and puts the example in the location you've indicated. Thank you for the note.