tefra / xsdata

Naive XML & JSON Bindings for python
https://xsdata.readthedocs.io
MIT License

Lxmltreeserializer str versus list #1020

Closed: skinkie closed this issue 3 months ago

skinkie commented 3 months ago

I am trying out #975.

import glob

from lxml import etree

from xsdata.formats.dataclass.context import XmlContext
from xsdata.formats.dataclass.parsers import XmlParser
from xsdata.formats.dataclass.parsers.config import ParserConfig
from xsdata.formats.dataclass.parsers.handlers import LxmlEventHandler
from xsdata.formats.dataclass.serializers import LxmlTreeSerializer

from netex import ServiceFrame

SERVICE_FRAME_PATH = ".//{http://www.netex.org.uk/netex}ServiceFrame"


def conversion(input_filename: str, output_filename: str):
    context = XmlContext()
    config = ParserConfig(fail_on_unknown_properties=False)
    parser = XmlParser(context=context, config=config, handler=LxmlEventHandler)
    tree = etree.parse(input_filename)

    # Parse only the ServiceFrame subtree into the generated dataclass
    element = tree.find(SERVICE_FRAME_PATH)
    service_frame: ServiceFrame = parser.parse(element, ServiceFrame)

    # Render the dataclass back into an lxml element and swap it into the original tree
    lxml_serializer = LxmlTreeSerializer()
    element.getparent().replace(element, lxml_serializer.render(service_frame))

    tree.write(output_filename, pretty_print=True, strip_text=True)


if __name__ == '__main__':
    for input_filename in glob.glob("/tmp/NeTEx_WSF_WSF_20240415_20240415.xml.gz"):
        print(input_filename)
        output_filename = input_filename.replace('/tmp/', 'netex-output-epip/')
        conversion(input_filename, output_filename)

Writing fails in lxml_serializer.render, which raises:

Traceback (most recent call last):
  File "/home/skinkie/Sources/reference/gtfs-netex-test/test-lxml.py", line 36, in <module>
    conversion(input_filename, output_filename)
  File "/home/skinkie/Sources/reference/gtfs-netex-test/test-lxml.py", line 28, in conversion
    element.getparent().replace(element, lxml_serializer.render(service_frame))
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/skinkie/Sources/reference/venv/lib/python3.11/site-packages/xsdata/formats/dataclass/serializers/tree/lxml.py", line 21, in render
    self.build(obj, builder)
  File "/home/skinkie/Sources/reference/venv/lib/python3.11/site-packages/xsdata/formats/dataclass/serializers/tree/mixins.py", line 55, in build
    builder.end(*element)
  File "src/lxml/saxparser.pxi", line 848, in lxml.etree.TreeBuilder.end
  File "src/lxml/saxparser.pxi", line 780, in lxml.etree.TreeBuilder._handleSaxEnd
  File "src/lxml/saxparser.pxi", line 749, in lxml.etree.TreeBuilder._flush
TypeError: sequence item 0: expected str instance, list found

If it can parse the document, it must be able to write it too, right?

With XmlTreeSerializer, serialisation works.

The file can be downloaded from: https://data.ndovloket.nl/netex/wsf/NeTEx_WSF_WSF_20240415_20240415.xml.gz

I have also found other bugs when rendering via LxmlTreeSerializer: some elements are only partially rendered. Their element name is written, but, for example, their attributes are not.

tefra commented 3 months ago

Thanks for reporting @skinkie

The tree serializer wasn't encoding values before feeding them to the tree builders, so list types such as xs:NMTOKENS were failing.
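A minimal sketch of that kind of fix, assuming the standard library's xml.etree.ElementTree.TreeBuilder in place of lxml's (both expose start/data/end/close). encode_text and build are hypothetical helpers for illustration, not xsdata's actual code: list values such as an xs:NMTOKENS field are converted to their space-separated lexical form, and booleans to true/false, before anything reaches data().

```python
import xml.etree.ElementTree as ET
from typing import Any


def encode_text(value: Any) -> str:
    """Hypothetical helper: convert a Python value to its XML lexical form."""
    if isinstance(value, bool):
        # xs:boolean uses lowercase literals
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):
        # List types such as xs:NMTOKENS are whitespace-separated tokens
        return " ".join(encode_text(item) for item in value)
    return str(value)


def build(tag: str, value: Any) -> ET.Element:
    """Hypothetical helper: build a single text-only element via a TreeBuilder."""
    builder = ET.TreeBuilder()
    builder.start(tag, {})
    # Feed the builder an encoded string, never the raw list
    builder.data(encode_text(value))
    builder.end(tag)
    return builder.close()


print(ET.tostring(build("Tokens", ["alpha", "beta", "gamma"]), encoding="unicode"))
# <Tokens>alpha beta gamma</Tokens>
print(ET.tostring(build("IsWaitPoint", True), encoding="unicode"))
# <IsWaitPoint>true</IsWaitPoint>
```

Passing the raw list to data() instead is what lxml's builder rejected in the traceback above ("sequence item 0: expected str instance, list found").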

skinkie commented 3 months ago

I'll test if the rest works now too.

skinkie commented 3 months ago

@tefra I think it is worse now: all attributes are gone.

skinkie commented 3 months ago

What I get:

            <ServiceJourneyPattern>
              <RouteRef></RouteRef>
              <DirectionRef></DirectionRef>
              <DestinationDisplayRef></DestinationDisplayRef>
              <pointsInSequence>
                <StopPointInJourneyPattern>
                  <ScheduledStopPointRef></ScheduledStopPointRef>
                  <OnwardTimingLinkRef></OnwardTimingLinkRef>
                  <IsWaitPoint>true</IsWaitPoint>
                </StopPointInJourneyPattern>
                <StopPointInJourneyPattern>
                  <ScheduledStopPointRef></ScheduledStopPointRef>
                </StopPointInJourneyPattern>
              </pointsInSequence>
            </ServiceJourneyPattern>

What I expect:

            <ServiceJourneyPattern id="WSF:ServiceJourneyPattern:B-V" version="1">
              <RouteRef version="1" ref="WSF:Route:B-V"/>
              <DirectionRef version="1" ref="OPENOV:Direction:outbound"/>
              <DestinationDisplayRef version="1" ref="WSF:DestinationDisplay:V"/>
              <pointsInSequence>
                <StopPointInJourneyPattern id="WSF:StopPointInJourneyPattern:B-V-B" version="1" order="1">
                  <ScheduledStopPointRef version="1" ref="WSF:ScheduledStopPoint:B"/>
                  <OnwardTimingLinkRef version="1" ref="WSF:TimingLink:B-V"/>
                  <IsWaitPoint>true</IsWaitPoint>
                </StopPointInJourneyPattern>
                <StopPointInJourneyPattern id="WSF:StopPointInJourneyPattern:B-V-V" version="1" order="2">
                  <ScheduledStopPointRef version="1" ref="WSF:ScheduledStopPoint:V"/>
                </StopPointInJourneyPattern>
              </pointsInSequence>
            </ServiceJourneyPattern>
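A regression like this can be caught by diffing attributes between the expected and the actual output. The sketch below uses only the standard library; attribute_diffs is a hypothetical helper, and it assumes both documents have the same element structure, because it pairs elements in document order and zip stops at the shorter tree.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Diff = Tuple[str, Dict[str, str], Dict[str, str]]


def attribute_diffs(expected_xml: str, actual_xml: str) -> List[Diff]:
    """Hypothetical helper: list elements whose attributes differ between two documents."""
    expected = ET.fromstring(expected_xml)
    actual = ET.fromstring(actual_xml)
    diffs: List[Diff] = []
    # iter() walks each tree depth-first in document order; zip pairs the elements up
    for exp, act in zip(expected.iter(), actual.iter()):
        if exp.tag != act.tag:
            raise ValueError(f"structure differs: {exp.tag} vs {act.tag}")
        if exp.attrib != act.attrib:
            diffs.append((exp.tag, dict(exp.attrib), dict(act.attrib)))
    return diffs


# Trimmed-down versions of the expected and actual output from this thread
expected_xml = (
    '<ServiceJourneyPattern id="WSF:ServiceJourneyPattern:B-V" version="1">'
    '<RouteRef version="1" ref="WSF:Route:B-V"/>'
    "</ServiceJourneyPattern>"
)
actual_xml = "<ServiceJourneyPattern><RouteRef></RouteRef></ServiceJourneyPattern>"

for tag, want, got in attribute_diffs(expected_xml, actual_xml):
    print(f"{tag}: expected {want}, got {got}")
```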

tefra commented 3 months ago

Yep, I forgot to encode the attributes as well, @skinkie. For the next one, please open a new issue with a simple example that I can quickly reproduce.
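The attribute side of the fix can be sketched the same way, again with the standard library's TreeBuilder standing in for lxml's, and with hypothetical encode_value and encode_attrs helpers rather than xsdata's real code. Every attribute value is converted to a string before start() is called, and unset (None) values are dropped. The example reproduces one of the expected RouteRef elements from this thread.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, Optional


def encode_value(value: Any) -> str:
    """Hypothetical helper: XML lexical form of a Python value."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):
        return " ".join(encode_value(item) for item in value)
    return str(value)


def encode_attrs(attrs: Dict[str, Optional[Any]]) -> Dict[str, str]:
    """Hypothetical helper: encode attribute values to strings, skipping None."""
    return {name: encode_value(value) for name, value in attrs.items() if value is not None}


def build_ref(tag: str, ref: str, version: Optional[int]) -> ET.Element:
    """Hypothetical helper: build an empty reference element such as RouteRef."""
    builder = ET.TreeBuilder()
    # Attributes are encoded before they reach the builder (version is an int here)
    builder.start(tag, encode_attrs({"version": version, "ref": ref}))
    builder.end(tag)
    return builder.close()


route_ref = build_ref("RouteRef", "WSF:Route:B-V", 1)
print(ET.tostring(route_ref, encoding="unicode"))
# <RouteRef version="1" ref="WSF:Route:B-V" />
```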