mikefarah / yq

yq is a portable command-line YAML, JSON, XML, CSV, TOML and properties processor
https://mikefarah.gitbook.io/yq/
MIT License

XML mangles order of heterogeneous elements #1983

Open mrnoname1000 opened 6 months ago

mrnoname1000 commented 6 months ago

Describe the bug

yq tries to convert XML documents to mappings wherever possible, but this heuristic appears to be broken: non-consecutive elements with the same tag name are grouped together, which changes the element order of the original document.

Version of yq: 4.42.1
Operating system: mac
Installed via: homebrew

Input XML

input.xml

<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>AKLastIDMSEnvironment</key>
    <integer>0</integer>
    <key>AKLastLocale</key>
    <string>en_US</string>
    <key>AppleAntiAliasingThreshold</key>
    <integer>4</integer>
</dict>
</plist>

Command

yq -px -ox < input.xml > output.xml

Actual behavior

output.xml

<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
  <dict>
    <key>AKLastIDMSEnvironment</key>
    <key>AKLastLocale</key>
    <key>AppleAntiAliasingThreshold</key>
    <integer>0</integer>
    <integer>4</integer>
    <string>en_US</string>
  </dict>
</plist>
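The reordering in output.xml is what one would expect if sibling elements are collected into a mapping keyed by tag name. The following Python sketch reproduces that effect on the dict element from input.xml. It is an illustration only, not yq's actual decoder, and the function names group_children_by_tag and flatten_grouped are made up for this example.

```python
import xml.etree.ElementTree as ET

DICT_XML = """<dict>
    <key>AKLastIDMSEnvironment</key>
    <integer>0</integer>
    <key>AKLastLocale</key>
    <string>en_US</string>
    <key>AppleAntiAliasingThreshold</key>
    <integer>4</integer>
</dict>"""


def group_children_by_tag(element):
    """Collect child text values into a mapping keyed by tag name.

    Each tag is stored once, at the position of its first occurrence,
    so the interleaving of siblings is lost.
    """
    grouped = {}
    for child in element:
        grouped.setdefault(child.tag, []).append(child.text)
    return grouped


def flatten_grouped(grouped):
    """Write the mapping back out as (tag, text) pairs, one tag group at a time."""
    return [(tag, text) for tag, texts in grouped.items() for text in texts]


grouped = group_children_by_tag(ET.fromstring(DICT_XML))
for tag, text in flatten_grouped(grouped):
    # Tags come out as key, key, key, integer, integer, string
    print(tag, text)
```

Once the children are stored per tag, nothing in the mapping records that each key was originally followed by its value.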

Expected behavior

<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
  <dict>
    <key>AKLastIDMSEnvironment</key>
    <integer>0</integer>
    <key>AKLastLocale</key>
    <string>en_US</string>
    <key>AppleAntiAliasingThreshold</key>
    <integer>4</integer>
  </dict>
</plist>

Additional context

In my opinion, trying to represent XML as key/value pairs is an anti-pattern. Any XML element can contain any XML element or node in any order and any number of times. Since YAML can't handle duplicate keys, using arrays to represent all sequences would be more consistent and correct, if a little clunky:

- +p_xml: version="1.0" encoding="UTF-8"
- plist:
  - +@version: "1.0"
  - dict:
    - key: AKLastIDMSEnvironment
    - integer: "0"
    - key: AKLastLocale
    - string: en_US
    - key: AppleAntiAliasingThreshold
    - integer: "4"
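For illustration, a structure like the one above can be produced by walking the element tree and emitting one single-key mapping per attribute and per child, in document order. This is a minimal Python sketch, not a proposal for yq's implementation: the name element_to_pairs is hypothetical, the +@ attribute prefix is taken from the YAML above, and the XML declaration (+p_xml) and mixed text content are ignored for brevity.

```python
import json
import xml.etree.ElementTree as ET

PLIST_XML = """<plist version="1.0">
<dict>
    <key>AKLastIDMSEnvironment</key>
    <integer>0</integer>
    <key>AKLastLocale</key>
    <string>en_US</string>
    <key>AppleAntiAliasingThreshold</key>
    <integer>4</integer>
</dict>
</plist>"""


def element_to_pairs(element):
    """Return {tag: value}, where value is either the element's text (leaf)
    or a list of single-key mappings in document order (attributes first)."""
    attributes = [{"+@" + name: value} for name, value in element.attrib.items()]
    children = [element_to_pairs(child) for child in element]
    if not attributes and not children:
        return {element.tag: (element.text or "").strip()}
    return {element.tag: attributes + children}


document = element_to_pairs(ET.fromstring(PLIST_XML))
print(json.dumps(document, indent=2))
```

Because every child becomes its own list entry, each key stays directly in front of its integer or string value.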

mikefarah commented 6 months ago

Hmm yeah, I think you're right: the only way to handle scenarios like yours would be to have everything as an array. This would be more correct but less usable, and I don't think most people are looking for that structure when they want to convert XML to YAML.

It would also mean that you couldn't really convert ordinary YAML/JSON to XML without complex work to rearrange the data into that sequence-of-key/value-pairs format.
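To make that cost concrete, here is a rough Python sketch of the rearrangement an ordinary mapping would need before it could be encoded from the sequence format. The helper mapping_to_pairs is hypothetical and is not part of yq. It expands list values into repeated single-key entries, and it cannot recover any interleaving that the mapping never recorded.

```python
def mapping_to_pairs(value):
    """Rewrite nested mappings as lists of single-key mappings.

    A list value under a key becomes one entry per item, all with that key.
    Scalars are returned unchanged.
    """
    if not isinstance(value, dict):
        return value
    pairs = []
    for key, inner in value.items():
        items = inner if isinstance(inner, list) else [inner]
        for item in items:
            pairs.append({key: mapping_to_pairs(item)})
    return pairs


# An "ordinary" document, as someone would naturally write it in YAML/JSON.
ordinary = {
    "dict": {
        "key": ["AKLastIDMSEnvironment", "AKLastLocale"],
        "integer": "0",
        "string": "en_US",
    }
}

pairs = mapping_to_pairs(ordinary)
print(pairs)
```

The two key entries end up adjacent in the result, so a plist-style alternation of key and value would still have to be written by hand.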

A decoding flag could be added to parse/encode XML to/from that format, which would allow the existing behavior to continue (which I think, despite not being as accurate, is what most people would intuitively expect).

I'd be interested in knowing how often this case comes up.

mrnoname1000 commented 6 months ago

I think the existing heuristic needs improvement, but an alternate dialect would be more convenient than a --xml- flag. Since lists are harder to use, -pX instead of -px would be a good shorthand, but just renaming -pxml to -pXML doesn't really signify the intent.

Conversion to/from this format could be handled by a pair of built-in functions. There's also the unfortunate case of nested (unnamed) arrays, which XML has no concept of.
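As a sketch of the encoding half of such a pair, the Python function below turns the sequence format back into XML and rejects nested unnamed arrays, since there is no element name to give them. The name pairs_to_element and the handling of the +@ attribute prefix are assumptions for this example, not existing yq functions.

```python
import xml.etree.ElementTree as ET


def pairs_to_element(pair):
    """Build an XML element from a single-key mapping {tag: value}.

    value is either a scalar (element text) or a list of single-key
    mappings; entries whose key starts with "+@" become attributes.
    """
    if not isinstance(pair, dict) or len(pair) != 1:
        # Covers nested unnamed arrays, which have no tag name in XML.
        raise ValueError("expected a single-key mapping, got %r" % (pair,))
    (tag, value), = pair.items()
    element = ET.Element(tag)
    if not isinstance(value, list):
        element.text = str(value)
        return element
    for item in value:
        if isinstance(item, dict) and len(item) == 1:
            (name, inner), = item.items()
            if name.startswith("+@"):
                element.set(name[2:], str(inner))
                continue
        element.append(pairs_to_element(item))
    return element


document = {
    "plist": [
        {"+@version": "1.0"},
        {
            "dict": [
                {"key": "AKLastIDMSEnvironment"},
                {"integer": "0"},
                {"key": "AKLastLocale"},
                {"string": "en_US"},
                {"key": "AppleAntiAliasingThreshold"},
                {"integer": "4"},
            ]
        },
    ]
}

xml_text = ET.tostring(pairs_to_element(document), encoding="unicode")
print(xml_text)
```

Raising an error is only one option for nested arrays. A real implementation might instead flatten them or wrap them in a placeholder element.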