scottlamb / static-xml

serde-like serialization and deserialization of static Rust types in XML
Apache License 2.0
13 stars 2 forks source link

reduce `Visitor::finalize` overhead #1

Open scottlamb opened 2 years ago

scottlamb commented 2 years ago

Much of the bloat of the generated code is in the *Visitor::finalize methods, which transfer stuff from the builder (typically with types like Option<T>) to the final product (with types like T).

[slamb@slamb-workstation ~/git/crates/onvif-rs]$ cargo bloat --release --example camera --filter schema
...
File .text     Size  Crate Name
0.0%  0.1%   3.8KiB schema schema::onvif::_::PtzconfigurationVisitor::finalize
0.0%  0.1%   3.5KiB schema schema::onvif::_::ProfileVisitor::finalize
0.0%  0.1%   2.3KiB schema schema::onvif::_::MetadataConfigurationVisitor::finalize
0.0%  0.1%   2.2KiB schema schema::onvif::_::VideoSourceConfigurationVisitor::finalize
0.0%  0.0%   1.8KiB schema schema::onvif::_::AudioEncoderConfigurationVisitor::finalize
0.0%  0.0%   1.7KiB schema <schema::onvif::_::CapabilitiesVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.7KiB schema schema::onvif::config_description::_::MessagesTypeVisitor::finalize
0.0%  0.0%   1.5KiB schema schema::onvif::_::VideoAnalyticsConfigurationVisitor::finalize
0.0%  0.0%   1.4KiB schema schema::devicemgmt::_::GetDeviceInformationResponseVisitor::finalize
0.0%  0.0%   1.4KiB schema schema::onvif::_::ConfigVisitor::finalize
0.0%  0.0%   1.3KiB schema schema::onvif::_::ConfigDescriptionVisitor::finalize
0.0%  0.0%   1.2KiB schema schema::onvif::_::AudioOutputConfigurationVisitor::finalize
0.0%  0.0%   1.2KiB schema schema::onvif::_::VideoEncoderConfigurationVisitor::finalize
0.0%  0.0%   1.1KiB schema <T as static_xml::ser::SerializeField>::write
0.0%  0.0%   1.1KiB schema <schema::onvif::_::VideoEncoderConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.1KiB schema schema::onvif::_::DeviceCapabilitiesVisitor::finalize
0.0%  0.0%   1.1KiB schema schema::onvif::_::<impl static_xml::ser::Serialize for schema::onvif::Transport>::write_children
0.0%  0.0%   1.0KiB schema <schema::onvif::_::ProfileVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.0KiB schema <schema::onvif::_::MetadataConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.0KiB schema schema::onvif::_::<impl static_xml::de::Deserialize for schema::onvif::ProfileExtension>::deserialize
2.2%  7.3% 285.0KiB        And 993 smaller methods. Use -n N to show more.
2.4%  8.2% 317.4KiB        filtered data size, the file size is 12.7MiB

Look at ways to reduce it.

My first attempt was to bypass this entirely, more or less as yaserde does. #static_xml(direct) achieves this. But there are three downsides:

Maybe there's some big thing I can avoid in the generated code now. Or maybe there's a middle ground in which everything still needs Default (the first cavet holds) but the Visitor type is something like this:

struct BlahVisitor<'a> {
    out: &'a mut Blah,
    field_a_present: bool,
    field_b_present: bool,
}

and just check those bools, or via a bitmask. Maybe this could even work with the table-driven approach.

scottlamb commented 2 years ago

Here's an idea.

Step 1: test out the field_a_present approach as in the comment above. See if it's significantly smaller. I'm hoping so—the bloaty part is rearranging/individually copying the fields.

Step 2: assuming so, use unsafe and MaybeUninit to get rid of the Default requirement, maybe via something like this:

impl Deserialize for Foo {impl Deserialize for Foo {
    fn deserialize(element: ElementReader<'_>) -> Result<Self, VisitorError> {
        let mut out = MaybeUnit::uninit();
        let mut builder = BlahVisitor { out: &mut out, uninitialized: 3 };
        element.read_to(&mut builder)?;
        ::static_xml::de::check_uninitialized(self.uninitialized, &ELEMENTS[..])?;
        Ok(unsafe { out.assume_init() })
    }
}
struct BlahVisitor<'a> {
    out: &'a mut MaybeUninit<Blah>,
    uninitialized: u64, // bitmask
}
impl<'a> ElementVisitor for BlahVisitor<'a> {
    fn element<'a>(&mut self, child: ElementReader<'a>) -> Result<Option<ElementReader<'a>>, VisitorError> {
        match ::static_xml::de::find(&child.expanded_name(), ELEMENTS) {
            Some(0usize) if (self.uninitialized & 1) != 0 => {
                unsafe {
                    std::ptr::write(
                        self.out.as_mut_ptr().offset(offset_of!(Foo, field_a)) as *mut TypeA,
                        DeserializeField::init(child)?,
                    )
                };
                self.uninitialized &= ~1;
                return Ok(None);
            }
            Some(0usize) if (self.uninitialized & 1) == 0 => {
                let field = unsafe { self.out.as_mut_ptr().offset(offset_of!(Foo, field_a)) as *mut TypeA as &mut TypeA };
                DeserializeField::append(field, child)?
            }
            // .. likewise for field_b
            _ => Ok(Some(child)),
        }
    }
}

The idea is that if all the individual fields are initialized, then the final struct should be also. I don't think padding needs to be initialized. Any optional members should be filled in with a default at the end rather than failing via check_uninitialized.

This might be just moving the bloat to the fn element, but I'm hoping #5 can address that.

scottlamb commented 2 years ago

Looking good. Implemented the struct BlahVisitor<'a> { out: &'a mut Blah, *_present: bool } way, more or less. (I cheated a little bit: structs don't implement DeserializeFlatten, and it has that enum MyChoice { Foo(Vec<Foo>) } bug I mentioned above. The goal was to get numbers, not generate something worth checking in.)

File .text     Size  Crate Name
0.0%  0.1%   2.0KiB schema <schema::soap_envelope::_::FaultVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.8KiB schema <schema::onvif::_::VideoAnalyticsConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.8KiB schema <schema::onvif::_::CapabilitiesVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.4KiB schema <schema::onvif::_::IocapabilitiesExtensionVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.3KiB schema schema::onvif::_::<impl static_xml::de::Deserialize for schema::onvif::AudioEncoderConfiguration>::deserialize
0.0%  0.0%   1.3KiB schema <schema::onvif::_::VideoSourceConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.3KiB schema <schema::onvif::_::VideoSourceConfigurationExtension2Visitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.3KiB schema <schema::onvif::_::DateTimeVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.3KiB schema <schema::onvif::_::MulticastConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.2KiB schema <schema::onvif::_::RotateVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.2KiB schema <schema::onvif::_::LensDescriptionVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.2KiB schema schema::soap_envelope::Fault::is_unauthorized
0.0%  0.0%   1.2KiB schema <schema::onvif::_::IpaddressVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.2KiB schema schema::onvif::_::<impl static_xml::de::Deserialize for schema::onvif::MetadataConfiguration>::deserialize
0.0%  0.0%   1.2KiB schema <schema::onvif::_::VideoEncoderConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.1KiB schema schema::ptz::_::<impl static_xml::de::Deserialize for schema::ptz::GetStatusResponse>::deserialize
0.0%  0.0%   1.1KiB schema schema::devicemgmt::_::<impl static_xml::de::Deserialize for schema::devicemgmt::GetDeviceInformationResponse>::deserialize
0.0%  0.0%   1.1KiB schema <schema::onvif::_::H264ConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.1KiB schema <T as static_xml::ser::SerializeField>::write
0.0%  0.0%   1.1KiB schema <schema::onvif::_::Mpeg4ConfigurationVisitor as static_xml::de::ElementVisitor>::element
2.2%  7.2% 279.1KiB        And 877 smaller methods. Use -n N to show more.
2.4%  7.9% 305.3KiB        filtered data size, the file size is 12.7MiB

The finalize methods are gone, even with -n 0. I think they're getting inlined into deserialize (which makes sense: they only have one caller now, because I didn't implement the flatten versions).

$ cargo bloat --release --example camera --filter schema -n 0 | fgrep deserialize
...
0.0%  0.0%   1.3KiB schema schema::onvif::_::<impl static_xml::de::Deserialize for schema::onvif::AudioEncoderConfiguration>::deserialize
0.0%  0.0%   1.2KiB schema schema::onvif::_::<impl static_xml::de::Deserialize for schema::onvif::MetadataConfiguration>::deserialize
0.0%  0.0%   1.1KiB schema schema::ptz::_::<impl static_xml::de::Deserialize for schema::ptz::GetStatusResponse>::deserialize
0.0%  0.0%   1.1KiB schema schema::devicemgmt::_::<impl static_xml::de::Deserialize for schema::devicemgmt::GetDeviceInformationResponse>::deserialize
0.0%  0.0%   1.1KiB schema schema::common::_::<impl static_xml::de::Deserialize for schema::common::Ptzstatus>::deserialize
0.0%  0.0%   1.1KiB schema schema::onvif::_::<impl static_xml::de::Deserialize for schema::onvif::AudioOutputConfiguration>::deserialize
0.0%  0.0%   1.1KiB schema schema::onvif::_::<impl static_xml::de::Deserialize for schema::onvif::SecurityCapabilities>::deserialize
0.0%  0.0%   1.0KiB schema schema::onvif::_::<impl static_xml::de::Deserialize for schema::onvif::MulticastConfiguration>::deserialize
0.0%  0.0%    1006B schema schema::onvif::_::<impl static_xml::de::Deserialize for schema::onvif::Ptzconfiguration>::deserialize
0.0%  0.0%    1003B schema schema::onvif::_::<impl static_xml::de::Deserialize for schema::onvif::VideoAnalyticsConfiguration>::deserialize
...

so, looks like:

Also, the interface is actually improving. I no longer need as many traits. Where I now have this (and likewise for attributes and flattens):

https://github.com/scottlamb/static-xml/blob/ca51ec148462d60acd3595c71545863374306537/static-xml/src/de/mod.rs#L661-L689

I can instead have something like this:

pub trait DeserializeElementField {
    /// Called on the first occurrence of this field's element within the parent.
    fn init(element: ElementReader<'_>) -> Result<Self, VisitorError>;

    /// Called on subsequent occurrences of this field's element within the parent.
    ///
    /// `self` was previously returned by `init` and has been through zero or more prior `update` calls.
    fn update(&mut self, element: ElementReader<'_>) -> Result<(), VisitorError>;

    /// Called iff this field's element was not found within the parent and no default is set.
    fn missing(expected: &ExpandedNameRef<'_>) -> Result<Self, VisitorError>;
}

Fewer traits, IMHO more clear.

I'm going to proceed with the MaybeUninit approach to not require Default. I'll be able to get rid of the direct switch; users won't have to make unpleasant trade-offs.

scottlamb commented 2 years ago

I've got a messy, proof-of-concept implementation of the MaybeUninit approach in 0d591c7. New bloat output:

File .text     Size  Crate Name
0.0%  0.1%   3.5KiB schema <schema::onvif::_::CapabilitiesVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.1%   3.5KiB schema <schema::onvif::_::PtzconfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.1%   2.9KiB schema <schema::onvif::_::ProfileVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.1%   2.9KiB schema <schema::onvif::_::VideoEncoderConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.1%   2.9KiB schema <schema::onvif::_::MetadataConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.1%   2.6KiB schema <schema::onvif::_::AudioEncoderConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.1%   2.1KiB schema <schema::onvif::_::CapabilitiesExtensionVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.1%   2.0KiB schema <schema::soap_envelope::_::FaultVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.1%   2.0KiB schema <schema::onvif::_::SystemCapabilitiesVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.1%   2.0KiB schema <schema::onvif::_::VideoSourceConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.1%   2.0KiB schema <schema::onvif::_::SecurityCapabilitiesVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.9KiB schema <schema::onvif::_::SupportedAnalyticsModulesVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.9KiB schema <schema::onvif::_::VideoAnalyticsConfigurationVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.8KiB schema <schema::onvif::_::SystemDateTimeVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.7KiB schema <schema::onvif::config_description::_::MessagesTypeVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.7KiB schema <schema::onvif::_::DeviceCapabilitiesVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.6KiB schema <schema::onvif::_::IocapabilitiesExtensionVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.6KiB schema <schema::onvif::_::IpaddressVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.6KiB schema <schema::onvif::_::ItemListDescriptionVisitor as static_xml::de::ElementVisitor>::element
0.0%  0.0%   1.5KiB schema <schema::onvif::_::MulticastConfigurationVisitor as static_xml::de::ElementVisitor>::element
1.9%  6.4% 246.1KiB        And 792 smaller methods. Use -n N to show more.
2.2%  7.5% 289.9KiB        filtered data size, the file size is 12.7MiB

Note the schema crate's overall .text size is a little smaller, the grand total file size is the same (other segments grew?), and a lot of the bloat has shifted from finalize to element. The expanded code for one field in element now is:

                Some(0usize) if self.choice_present => {
                    ::static_xml::de::DeserializeElementField::update(
                        unsafe { &mut *unsafe { &raw mut (*self.out).choice } },
                        child,
                    )?;
                    Ok(None)
                }
                Some(0usize) => unsafe {
                    ::std::ptr::write(
                        unsafe { &raw mut (*self.out).choice },
                        ::static_xml::de::DeserializeElementField::init(child)?,
                    );
                    self.choice_present = true;
                    Ok(None)
                },

To improve this I'm going to look into the table-based approach.