leomarquine / php-etl

Extract, Transform and Load data using PHP.
MIT License

Replace XMLReader with SimpleXML #24

Closed sekjal closed 4 years ago

sekjal commented 5 years ago

SimpleXML has been available by default in PHP since 5.1.3 and allows XPath queries. Switching the XML Extractor to use this module has a number of advantages:

Performance seems to be comparable in my limited testing.

If this is a valuable enhancement to the existing XML extractor, I can submit a pull request. I could package this as a new extractor type ("simplexml") if that would be better for backward compatibility.
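To illustrate the idea, here is a minimal sketch of XPath-driven extraction with SimpleXML. It is not php-etl's extractor: the function name `extractWithXPath`, the sample document, and the shape of the column map are all hypothetical. One XPath expression selects the nodes to loop over, and one expression per column is evaluated relative to each node.

```php
<?php
// Hypothetical sample data; the element names are illustrative only.
$xml = <<<XML
<users>
  <user id="1"><name>Ann</name><contact><email>ann@example.com</email></contact></user>
  <user id="2"><name>Bob</name><contact><email>bob@example.com</email></contact></user>
</users>
XML;

/**
 * Hypothetical helper (not part of php-etl): returns one row per node
 * matched by $loop, with each column resolved by its own XPath expression.
 */
function extractWithXPath(string $xml, string $loop, array $columns): array
{
    // SimpleXML parses the whole document into memory here.
    $root = new SimpleXMLElement($xml);
    $rows = [];

    foreach ($root->xpath($loop) ?: [] as $node) {
        $row = [];
        foreach ($columns as $name => $path) {
            // Column paths are evaluated relative to the current node.
            $match = $node->xpath($path);
            $row[$name] = $match ? (string) $match[0] : null;
        }
        $rows[] = $row;
    }

    return $rows;
}

$rows = extractWithXPath($xml, '/users/user', [
    'id'    => '@id',
    'name'  => 'name',
    'email' => 'contact/email',
]);
```

Attributes and nested elements are addressed the same way, which is the main convenience of this approach.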

leomarquine commented 4 years ago

The problem with SimpleXML is that we need to load the entire XML document into memory, and we can't take advantage of generators like we do in the current implementation.

But, if necessary, we can have both extractors.
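For context, a generator-based approach with XMLReader can be sketched roughly as follows. This is an illustration of the technique, not the code of the current extractor: `streamRows` and the sample file are hypothetical, and the sketch assumes flat records whose child elements become columns. It also relies on the DOM extension for `XMLReader::expand()`.

```php
<?php
/**
 * Hypothetical helper (not php-etl's implementation): yields one row per
 * <$element> node while reading the file as a stream.
 */
function streamRows(string $file, string $element): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($file)) {
        throw new RuntimeException("Cannot open $file");
    }

    try {
        while ($reader->read()) {
            if ($reader->nodeType !== XMLReader::ELEMENT || $reader->name !== $element) {
                continue;
            }
            // expand() builds a DOM subtree for this one element only.
            $node = $reader->expand();
            $row = [];
            foreach ($node->childNodes as $child) {
                if ($child->nodeType === XML_ELEMENT_NODE) {
                    $row[$child->nodeName] = $child->textContent;
                }
            }
            yield $row;
        }
    } finally {
        $reader->close();
    }
}

// Usage with a small temporary file (sample data is illustrative only).
$file = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents(
    $file,
    '<users><user><id>1</id><name>Ann</name></user><user><id>2</id><name>Bob</name></user></users>'
);

$rows = iterator_to_array(streamRows($file, 'user'), false);
unlink($file);
```

Because rows are yielded one at a time, only the current record needs to be held in memory, regardless of the size of the file.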

sekjal commented 4 years ago

I'm seeing that problem now... even with a modestly sized XML document (~500MB), PHP starts to choke. I really like the ability to do full XPath on the loop and the columns, but there is definitely a cost. I'm going to mark this as closed, since replacing the current XML Extractor is not appropriate.
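One possible middle ground, offered only as a sketch and not something proposed in this thread: stream through the file with XMLReader and hand each record to SimpleXML, so that XPath is available for the columns while memory use stays bounded by the size of a single record. The function name `streamWithXPath` and the sample data are hypothetical. The trade-off is that the loop itself is selected by element name rather than by XPath, and column expressions can only see the current record's subtree.

```php
<?php
/**
 * Hypothetical helper: streams <$element> records with XMLReader and
 * resolves each column with an XPath expression relative to the record.
 */
function streamWithXPath(string $file, string $element, array $columns): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($file)) {
        throw new RuntimeException("Cannot open $file");
    }

    try {
        while ($reader->read()) {
            if ($reader->nodeType !== XMLReader::ELEMENT || $reader->name !== $element) {
                continue;
            }
            // Only this record's markup is parsed by SimpleXML.
            $record = new SimpleXMLElement($reader->readOuterXml());
            $row = [];
            foreach ($columns as $name => $path) {
                $match = $record->xpath($path);
                $row[$name] = $match ? (string) $match[0] : null;
            }
            yield $row;
        }
    } finally {
        $reader->close();
    }
}

// Usage with a small temporary file (sample data is illustrative only).
$file = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents(
    $file,
    '<users><user id="1"><contact><email>ann@example.com</email></contact></user>'
    . '<user id="2"><contact><email>bob@example.com</email></contact></user></users>'
);

$rows = iterator_to_array(
    streamWithXPath($file, 'user', ['id' => '@id', 'email' => 'contact/email']),
    false
);
unlink($file);
```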