Open skinkie opened 5 months ago
We need to fully support the Iterable type annotation for infinite generators in the data models and the serializers.
The PR is a good first attempt, @skinkie, but it needs some more work.
Writing a 3.4GB file from generators takes ~12GB of memory with LxmlEventWriter. XmlEventWriter takes practically no memory while writing to disk, and it writes in a streaming fashion. I think this must be investigated, especially if LxmlEventWriter is the default. I rewrote my whole project to split things up because I was under the impression that I couldn't fit it all in memory.
It's mentioned in a few places in the docs:
https://xsdata.readthedocs.io/en/latest/data_binding/xml_serializing/#alternative-writers
https://xsdata.readthedocs.io/en/latest/api/formats/dataclass/serializers/writers/lxml/#xsdata.formats.dataclass.serializers.writers.lxml.LxmlEventWriter
https://xsdata.readthedocs.io/en/latest/api/formats/dataclass/serializers/writers/native/#xsdata.formats.dataclass.serializers.writers.native.XmlEventWriter
For normal use cases, the lxml writer is always faster; a 3.4GB XML file is not very common 😄
@tefra the docs mention that there are alternatives, but not how the characteristics of the two differ.
Ideally I would like to write out a tree where the data is added just in time. The proposal in #1030 shows increasing memory usage, which suggests that the tree is still being built completely in memory. I wanted to add some evidence. Please ignore the timing.
Using the generator method:
Materializing into a list first:
Ideally, memory consumption wouldn't increase at all, and the data would simply be written out as it is provided. But I guess the graphs give a clear view of where we can make some improvements when writing out huge documents.
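
To make the "written out as it is provided" idea concrete, here is a minimal sketch using only the standard library. It is not xsdata code and does not reproduce either of its writers: `records`, `CountingSink`, `write_stream` and `peak_bytes` are hypothetical names made up for illustration. It streams elements from a generator through `xml.sax.saxutils.XMLGenerator` and uses `tracemalloc` to compare peak memory against materializing the same data into a list first.

```python
import io
import tracemalloc
from typing import Callable, Iterable, Iterator, Tuple
from xml.sax.saxutils import XMLGenerator


def records(count: int) -> Iterator[dict]:
    """Hypothetical data source: yields one small record at a time."""
    for i in range(count):
        yield {"id": str(i), "name": f"item-{i}"}


class CountingSink(io.TextIOBase):
    """Stand-in for a file on disk: counts characters and discards them."""

    def __init__(self) -> None:
        super().__init__()
        self.size = 0

    def write(self, text: str) -> int:
        self.size += len(text)
        return len(text)


def write_stream(out, items: Iterable[dict]) -> None:
    """Emit each item as soon as it is produced; no tree is built."""
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("items", {})
    for item in items:
        gen.startElement("item", {"id": item["id"]})
        gen.characters(item["name"])
        gen.endElement("item")
    gen.endElement("items")
    gen.endDocument()


def peak_bytes(make_items: Callable[[int], Iterable[dict]], count: int) -> Tuple[int, int]:
    """Return (peak traced memory in bytes, characters written)."""
    sink = CountingSink()
    tracemalloc.start()
    write_stream(sink, make_items(count))
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak, sink.size


if __name__ == "__main__":
    n = 100_000
    streamed, _ = peak_bytes(records, n)
    listed, _ = peak_bytes(lambda c: list(records(c)), n)
    print(f"generator: {streamed} bytes peak, list: {listed} bytes peak")
```

With the generator, peak memory should stay roughly flat as `n` grows, while the list variant grows with the number of records. Whether a serializer can get the same behaviour depends on it never collecting the iterable before handing events to the writer.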