Question about preferred way of XML parsing

afuetterer commented 3 months ago

Hi, I have a quick question about XML parsing. I want to define a single parser and reuse it for every XML string I parse.

The docs say:

>>> from xsdata.formats.dataclass.context import XmlContext
>>> from xsdata.formats.dataclass.parsers import XmlParser
>>> from xsdata.formats.dataclass.parsers.config import ParserConfig

>>> config = ParserConfig()
>>> context = XmlContext()
>>> parser = XmlParser(context=context, config=config)
>>> parser = XmlParser()

See: https://xsdata.readthedocs.io/en/latest/data_binding/xml_parsing/

There is no comment or explanation about these steps. Also, the parser is defined twice in this snippet.

Do I need to initialize config and context objects for optimal performance?

Looking forward to any tips and hints.

tefra commented 3 months ago

Hi @afuetterer

Take a look here https://xsdata.readthedocs.io/en/latest/data_binding/basics/

Context: All binding metadata is generated and cached in an XmlContext instance. It's recommended to either reuse the same parser/serializer instance or reuse the context instance.

afuetterer commented 3 months ago

Thank you.

Are there even more benefits to reusing both the context AND the parser, or does that make no difference?

tefra commented 3 months ago

No, the only real benefit comes from reusing the context, and how much it helps comes down to the number of your models.

For example, the netex suite has about 6k models. On my dev machine it takes 0.65 seconds to build the binding metadata for all the models, and the cache size is almost 29 MB.
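To illustrate why the saving scales with the number of models, here is a toy sketch using only the standard library. `ToyContext` and `ToyParser` are hypothetical stand-ins and not xsdata's implementation. The sketch assumes that a parser created without a context builds its own fresh one, and it simply counts how often the per-class metadata gets built.

```python
from dataclasses import dataclass, fields


class ToyContext:
    """Hypothetical stand-in for a binding context (not xsdata code)."""

    def __init__(self):
        self.cache = {}
        self.builds = 0  # how many times metadata was actually built

    def build(self, clazz):
        # Build the metadata only on the first request for a class.
        if clazz not in self.cache:
            self.builds += 1
            self.cache[clazz] = tuple(f.name for f in fields(clazz))
        return self.cache[clazz]


class ToyParser:
    """Hypothetical parser that turns a dict into a dataclass instance."""

    def __init__(self, context=None):
        # Assumption: without a context, the parser creates a fresh one.
        self.context = ToyContext() if context is None else context

    def parse(self, values, clazz):
        names = self.context.build(clazz)
        return clazz(**{name: values[name] for name in names})


@dataclass
class Book:
    title: str
    year: int


record = {"title": "A", "year": 2024}

# Shared context: metadata is built once, however many parsers are created.
shared = ToyContext()
for _ in range(3):
    ToyParser(context=shared).parse(record, Book)
shared_builds = shared.builds

# Fresh context per parser: metadata is rebuilt every time.
fresh_builds = 0
for _ in range(3):
    toy_parser = ToyParser()
    toy_parser.parse(record, Book)
    fresh_builds += toy_parser.context.builds
```

With a handful of small models the rebuild is cheap, but the same pattern repeated over thousands of models is where the build time and cache size quoted above start to matter.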

afuetterer commented 3 months ago

Alright, thank you.

tefra / xsdata

Question about preferred way of XML parsing #1017