metanorma / xml-c14n

Ruby library to canonicalize XML (e.g. for comparison)
0 stars 1 forks source link

Update Nokogiri transform XSL #4

Closed Novtopro closed 3 weeks ago

Novtopro commented 3 weeks ago

When I was working on https://github.com/metanorma/niso-jats/issues/18, I found that I needed to sort the nodes properly on the same level to make sure all the specs could pass.

In order to do that, I'll need to update the XML transform XSL.

ronaldtse commented 3 weeks ago

sort the nodes properly on the same level

@Novtopro XML child element order can be sensitive for the parent element. It depends on whether the parent element needs to retain the order.

Both are common patterns.

Order sensitive:

<my-element>
  <data>foo</data>
  <note>Note on "foo"</note>
  <data>bar</data>
  <note>Note on "bar"</note>
</my-element>

(potentially) Order insensitive:

<my-element>
  <data>
    <text>foo</text>
    <note>Note on "foo"</note>
  </data>
  <data>
    <text>bar</text>
    <note>Note on "bar"</note>
  </data>
</my-element>

And remember there is also the content node:

<note><p>This is <b>bold</b> text and even <b><i>bold and italic</i></b></p></note>

This demonstrates that element order cannot be ignored in checks by default.

opoudjis commented 3 weeks ago

I rely on order sensitivity for my use of this gem, and I've got 20 gems that use this. (I use XML for documents, not data.) If there is going to be order insensitivity, it needs to be an option, and not the default option.

This PR does not provide that optionality, and it must before it can be accepted.

Novtopro commented 3 weeks ago

@opoudjis @ronaldtse Yeah, I agree. We should provide an option to control it.

Novtopro commented 3 weeks ago

@opoudjis @ronaldtse I've updated the PR, could you please review it again?

ronaldtse commented 3 weeks ago

@Novtopro I don't understand why we need to sort the elements at all? We should retain the order of elements.

Whether an XML element's children elements is order-sensitive is only known to whoever defined that XML schema, i.e. in the definition of the class that inherits from Lutaml::Model::Serializable?

Novtopro commented 3 weeks ago

@ronaldtse

Maybe I was misled by the existing spec. Originally we wrote the spec like

  it "round-trips with xml-c14" do
    input = fixture.read

    serialized = described_class.from_xml(input)
    output = serialized.to_xml(
      pretty: true,
      declaration: true,
      encoding: "utf-8",
    )

    expect(output).to be_analogous_with(input)
  end

and I got errors like

CleanShot 2024-11-01 at 14 11 08@2x

The generated output is the same as the input except the order of the node is different. I noticed this issue and I thought that maybe we needed to update the transform XSL to sort the nodes on the same level if needed to make the comparison of XML pass like

it "round-trips with xml-c14" do
    input = Xml::C14n.format(fixture.read)

    serialized = described_class.from_xml(input)
    output = Xml::C14n.format(serialized.to_xml)

    expect(output).to be_analogous_with(input)
end

~But now, I realize that the failure may come from the missing mixed option and we should not transform the XML at all~

Novtopro commented 3 weeks ago

Okay, the failure does come from the missing mixed option and I'll leave the XSL untouched.