kislyuk / yq

Command-line YAML, XML, TOML processor - jq wrapper for YAML/XML/TOML documents
https://kislyuk.github.io/yq/
Apache License 2.0
2.53k stars, 81 forks

XML streaming mode #166

Closed by li-yq 1 year ago

li-yq commented 1 year ago

Hi there,

I'm trying to extract entries from a huge XML document that is too large to load into memory. Is it possible to process XML files in a streaming mode?

It seems xmltodict.parse offers an item_depth parameter. Maybe we could add a command-line option so that only the items at the given depth are passed to jq, one at a time?
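To make the item-depth idea concrete, here is a minimal sketch using only the Python standard library (xml.etree.ElementTree.iterparse) as a stand-in. It is not how xmltodict or xq implement it. The helper name iter_items, the sample document, and the simplified JSON shape are assumptions for illustration only. The depth convention (root element = depth 1, its children = depth 2) is meant to mirror item_depth as I understand it.

```python
import io
import json
import xml.etree.ElementTree as ET


def iter_items(source, item_depth):
    """Yield each element found at `item_depth` (root element = depth 1).

    Hypothetical helper: yielded elements are detached from their parent
    afterwards, so the full tree is never held in memory at once.
    """
    stack = []  # currently open ancestor elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        stack.pop()  # elem is now closed; stack holds its ancestors
        if len(stack) + 1 == item_depth:
            yield elem
            if stack:
                # Detach the finished item so it can be garbage collected.
                stack[-1].remove(elem)


# Made-up sample document; a real input would be a large file object.
DOC = b"<root><entry><id>1</id></entry><entry><id>2</id></entry></root>"

# Emit one JSON document per depth-2 item, like a stream of inputs for jq.
lines = [
    json.dumps({item.tag: {child.tag: child.text for child in item}})
    for item in iter_items(io.BytesIO(DOC), 2)
]
for line in lines:
    print(line)
```

Each yielded item has to be consumed before the generator advances, because it is detached from the tree right after. The dict conversion here is deliberately simplistic (it ignores attributes and nested children) and does not reproduce xmltodict's output format.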

Or maybe it's possible to 1) implement the XML-to-JSON conversion in a streaming mode and 2) use jq's --stream mode?

Another option is to add support for XML files with multiple root elements (like a stream of multiple JSON documents in jq), so that the outermost <root> and </root> tags can be removed with external tools like sed or awk. This is similar to the first solution.

kislyuk commented 1 year ago

Thanks for the suggestion, item_depth is a really nice feature that I agree we should integrate with xq. I'm working on it.

kislyuk commented 1 year ago

OK, I released v3.2.2, where you can now stream XML document contents with xq . --xml-item-depth=1, xq . --xml-item-depth=2, etc.