Closed DareDevilDenis closed 2 months ago
These numbers look awful 😞, but yeah, xsdata, like most pure-Python binding libraries, will always be slower.
In your case specifically it's way worse because of all the union fields. The parser will attempt to parse the xml node with every given dataclass. Then it will take the "successful" attempts and return the one with the highest score. This is a very crude process and unfortunately it's very slow.
field_value: List[
    Union[
        TrtApiVersion,
        TfpgaApiVersion,
        Tuxmid,
        TcellSetId,
        TulCarrier,
        TcellEntityId,
        Tueid,
        TglobalTti,
        TglobalTtiToDecode,
        ExternalTtti,
        Tnumerology,
        TrntiType,
        Trnti,
        TsamplingFreq,
        TmeasurementState,
        TharqProcess,
        TsymbolsFreqHop1,
        TsymbolsFreqHop2,
        TmeanEvm,
        TmeanEvmPerLayer,
        TevmPerSymbol,
        ExternalTevm,
        TnumPuschOfdmDmrsSymbols,
        TdmrsOfdmSymbolIndex,
        ExternalTdmrsSto,
        TnumLayers,
        ExternalTdmrsStoPerLayer,
        ExternalTdmrsPower,
        ExternalTdmrsCorrelation,
        TnumAntennas,
        ExternalTpowerParameters,
        ExternalTdcLeakageMeasurement,
        TpowerSummary,
        TcrcFeedback,
        TappliedCfo,
        ExternalTdeltaCfo,
        TulTimingOffset,
        TphaseMeas,
        ExternalTphaseMeasurements,
    ]
] = field(
    default_factory=list,
    metadata={
        "name": "Field",
        "type": "Element",
        "min_occurs": 33,
        "max_occurs": 39,
    },
)
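To make the cost concrete, here is a rough sketch of the "try every candidate, keep the best score" idea. It is not xsdata's actual code: the two candidate classes, the `try_bind` helper and the scoring rule (number of populated fields) are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


# Hypothetical candidates standing in for the real union members.
@dataclass
class IntField:
    value: int = 0


@dataclass
class FloatField:
    value: float = 0.0
    unit: str = ""


def try_bind(cls, node):
    """Build cls from the node's attributes; raise ValueError if it does not fit."""
    known = {f.name: f.type for f in fields(cls)}
    unknown = set(node.attrib) - set(known)
    if unknown:
        raise ValueError(f"unexpected attributes: {unknown}")
    # Each conversion may raise ValueError, e.g. int("1.5").
    kwargs = {name: known[name](raw) for name, raw in node.attrib.items()}
    # Assumed scoring rule: one point per populated field.
    return cls(**kwargs), len(kwargs)


def bind_union(node, candidates):
    """Attempt every candidate and return the highest-scoring successful one."""
    attempts = []
    for cls in candidates:
        try:
            attempts.append(try_bind(cls, node))
        except ValueError:
            continue  # failed attempt, the work is thrown away
    if not attempts:
        raise ValueError("no candidate matched")
    # max() keeps the first candidate on a tie.
    return max(attempts, key=lambda attempt: attempt[1])[0]


node = ET.fromstring('<Field value="1.5" unit="dB"/>')
result = bind_union(node, [IntField, FloatField])
```

With 39 union members and 33 to 39 `Field` elements per parent, a loop like this runs well over a thousand bind attempts for a single parent element, and most of them are discarded.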
Leave it open, I want to take a look with the given sample to see if there is anything we can do to improve the performance...
Thanks @tefra. Please let me know if I can help with further testing.
> In your case specifically it's way worse because of all the union fields. The parser will attempt to parse the xml node with every given dataclass. Then it will take the "successful" attempts and return the one with highest score. This is a very crude process and unfortunately it's very slow.
Are you saying there is nothing like namespace + tag index?
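For reference, a namespace + tag index could look roughly like the sketch below. The `urn:example` namespace, the `Header` binding and the `build_tag_index` helper are assumptions for illustration, not part of xsdata. Note that in the field shown above every union member is bound to the same element name (`"name": "Field"`), so an index keyed on the qualified name alone would still return all of the candidates.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_tag_index(bindings):
    """Map a qualified name in Clark notation ({namespace}tag) to candidate class names."""
    index = defaultdict(list)
    for qname, class_name in bindings:
        index[qname].append(class_name)
    return index


# Hypothetical bindings: three of the union members share the element name "Field".
bindings = [
    ("{urn:example}Header", "Header"),
    ("{urn:example}Field", "Tueid"),
    ("{urn:example}Field", "Trnti"),
    ("{urn:example}Field", "TmeanEvm"),
]
index = build_tag_index(bindings)

node = ET.fromstring('<Field xmlns="urn:example" name="ueid"/>')
# ElementTree already exposes the tag in Clark notation, so the lookup is one dict access.
candidates = index[node.tag]
```

The lookup itself is cheap, but here `candidates` still holds every `Field` class, so something other than the tag has to narrow the choice.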
Hi @DareDevilDenis, I added an optimization that selects the correct element earlier, based on fixed attributes. I am not gonna say the performance is now great, but according to my local tests this brings the ~200 ratio down to ~34.
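A minimal sketch of the idea behind preselecting on fixed attributes follows. It is an illustration only, not the code that went into xsdata: the `UeId` and `Rnti` classes, the `name` attribute and the `FIXED_ATTRS` table are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


# Hypothetical candidates; assume each declares a fixed value for the "name" attribute.
@dataclass
class UeId:
    value: int = 0


@dataclass
class Rnti:
    value: int = 0


FIXED_ATTRS = {
    UeId: {"name": "ueid"},
    Rnti: {"name": "rnti"},
}


def preselect(node, candidates):
    """Keep only candidates whose fixed attributes all match the node.

    Falls back to the full candidate list when nothing matches, so the
    slower try-everything path can still run.
    """
    matches = [
        cls
        for cls in candidates
        if all(node.attrib.get(key) == fixed for key, fixed in FIXED_ATTRS[cls].items())
    ]
    return matches or list(candidates)


node = ET.fromstring('<Field name="rnti" value="17"/>')
selected = preselect(node, [UeId, Rnti])
```

When the fixed attribute identifies the type, only one class is left to bind, which avoids most of the discarded attempts.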
Using:
I'd like to ask about the performance of xsdata XML parsing. In my benchmarking I found it to be approximately 200 times slower than parsing to the in-built xml.etree.ElementTree.Element. I was expecting xsdata to be a little slower but this difference seems to be extreme. I tried both XmlEventHandler and LxmlEventHandler and got similar results.
Is this difference expected? If it's expected then I apologise for raising this as an issue.
Test script:
My results:
I've attached this script, "input.xml" and "my_dataclass.py": xsdata_xml_parse_performance.zip