maik / xml-simple

Easy API for working with XML documents
MIT License
88 stars 27 forks source link

NormaliseSpace not always sufficient to prevent whitespace making equivalent docs appear different #34

Open electrofelix opened 1 year ago

electrofelix commented 1 year ago

When using this library within vagrant-libvirt in order to help compare XML domain definition sent to libvirt vs what was actually applied by libvirt I discovered that docs that contain different whitespace will sometimes appear not to be equivalent even when using the option NormaliseSpace set to 2, due to the resulting structure may contain a content element while the other doc that did not have the extraneous whitespace will not contain this field at all.

e.g.

<domain xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0' type=''>
  <name/>
  <title/>
  <description/>
  <uuid/>
  <memory/>
  <vcpu>1</vcpu>
  <cpu check="none" mode="host-model">

  </cpu>
  <os>
    <type>hvm</type>
    <kernel/>
    <initrd/>
    <cmdline/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <devices>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target port='0'/>
    </console>
    <input bus='ps2' type='mouse'/>
    <graphics autoport='yes' keymap='en-us' listen='127.0.0.1' port='-1' type='vnc'/>
    <video>
      <model heads='1' type='cirrus' vram='16384'/>
    </video>
  </devices>
</domain>

vs:

<domain xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0' type=''>
  <name/>
  <title/>
  <description/>
  <uuid/>
  <memory/>
  <vcpu>1</vcpu>
  <cpu check="none" mode="host-model"/>
  <os>
    <type>hvm</type>
    <kernel/>
    <initrd/>
    <cmdline/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <devices>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target port='0'/>
    </console>
    <input bus='ps2' type='mouse'/>
    <graphics autoport='yes' keymap='en-us' listen='127.0.0.1' port='-1' type='vnc'/>
    <video>
      <model heads='1' type='cirrus' vram='16384'/>
    </video>
  </devices>
</domain>
require 'xmlsimple'

doc1 = <<EOF
<domain xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0' type=''>
  <name/>
  <title/>
  <description/>
  <uuid/>
  <memory/>
  <vcpu>1</vcpu>
  <cpu check="none" mode="host-model">

  </cpu>
  <os>
    <type>hvm</type>
    <kernel/>
    <initrd/>
    <cmdline/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <devices>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target port='0'/>
    </console>
    <input bus='ps2' type='mouse'/>
    <graphics autoport='yes' keymap='en-us' listen='127.0.0.1' port='-1' type='vnc'/>
    <video>
      <model heads='1' type='cirrus' vram='16384'/>
    </video>
  </devices>
</domain>
EOF

doc2 = <<EOF
<domain xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0' type=''>
  <name/>
  <title/>
  <description/>
  <uuid/>
  <memory/>
  <vcpu>1</vcpu>
  <cpu check="none" mode="host-model"/>
  <os>
    <type>hvm</type>
    <kernel/>
    <initrd/>
    <cmdline/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <devices>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target port='0'/>
    </console>
    <input bus='ps2' type='mouse'/>
    <graphics autoport='yes' keymap='en-us' listen='127.0.0.1' port='-1' type='vnc'/>
    <video>
      <model heads='1' type='cirrus' vram='16384'/>
    </video>
  </devices>
</domain>
EOF

xml1 = XmlSimple.xml_in(doc1, {'NormalizeSpace' => 2})
xml2 = XmlSimple.xml_in(doc2, {'NormalizeSpace' => 2})
xml1 == xml2

will result in false.

I've added some code to vagrant-libvirt to walk the returned structure and remove the content if it's empty since if the whitespace is not relevant there should be no need for the attribute to be defined. https://github.com/vagrant-libvirt/vagrant-libvirt/blob/439830b6d42758e2607c84d518f61be4235a8213/lib/vagrant-libvirt/util/xml.rb#L25-L43

If I then output the xml1 and input it again, it will have dropped the content field, so I'm wondering if maybe it should be not set at all in the case NormaliseSpace is 2 and after stripping it is empty?

maik commented 1 year ago

Thank you very much for reporting this!

I could reproduce the issue and it the library's behavior is inconsistent in this case. Still, I would prefer not to fix it to prevent any incompatibilities with existing code bases.

I could imagine to add a new option to suppress the empty content attribute in such cases.

electrofelix commented 1 year ago

Does this library need to align with perl XML::Simple? I was wondering if NormaliseSpace could become a binary flag set of options so that if the option to drop empty content attributes was using 4 then it could be 2 | 4 to combine them, but that depends on whether it makes sense to do that for the option.

maik commented 1 year ago

No, it does not have to be compatible with Perl XML::Simple. Actually, it never was, because it uses different defaults for some options, for example.

I just do not want to break existing behavior for current Ruby users. That is, I do not want to change the current behavior when using NormaliseSpace and I do not want to change the API of NormaliseSpace. Of course, it would be possible to extend it, but I do not want to change how it works at the moment. So, the best way to fix this issue would probably be to add a new option.

On Wed, Aug 31, 2022 at 7:36 PM Darragh Bailey @.***> wrote:

Does this library need to align with perl XML::Simple? I was wondering if NormaliseSpace could become a binary flag set of options so that if the option to drop empty content attributes was using 4 then it could be 2 | 4 to combine them, but that depends on whether it makes sense to do that for the option.

— Reply to this email directly, view it on GitHub https://github.com/maik/xml-simple/issues/34#issuecomment-1233227661, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAFQTYLHV7OQUOIHPEWBOLV36J2ZANCNFSM6AAAAAAQASOKRE . You are receiving this because you commented.Message ID: @.***>