Closed by rjsparks 2 years ago
(This issue should probably be migrated to bibxml-service.)
Transferred and acknowledged.
https://github.com/ietf-ribose/bibxml-service/issues/234 will address this issue.
We can canonicalize returned XML, but I’d like to learn more about the new use case being described.
For example, if there is XML diffing being done, isn’t it the responsibility of the differ to ensure all sources being diffed are canonicalized using the same method, rather than relying on API returning canonicalized XML and hoping it’s canonicalized with the same set of options?
For example, the differ in xml2rfc path testing script (created for transitional purposes) now does canonicalization as follows: https://github.com/ietf-ribose/xml2rfc-mapping-convertor/commit/6564383e0a6363f59a2e44fc7c22c641f5e70921
Thanks! Can we have a sample to see how the canonicalized content looks?
While attribute order and unordered-element order don't matter to XML parsers, they matter a lot to text differencing engines.
Text differencing engines should really not be relied upon for structured data content, but I recognise that in certain conditions, e.g. git, it is unavoidable.
Yes: you can use the latest version of the test_paths.py script, which produces a report with canonicalized diffs like bibxml-report.html.zip.
Text differencing engines should really not be relied upon for structured data content, but I recognise that in certain conditions, e.g. git, it is unavoidable.
Well, as you can see in the above report, our diffs with xml2rfc are very large, since we pretty-print XML while the xml2rfc tools output un-indented lines. Canonicalizing an XML string with lxml.etree.canonicalize() appears to leave indentation as is, so every indented line is a difference.
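To illustrate the indentation point, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree.canonicalize() as a stand-in for lxml.etree.canonicalize(). That substitution is an assumption made to avoid a third-party dependency, and the sample XML is made up. By default, whitespace-only text nodes are kept, so pretty-printed and un-indented input canonicalize to different strings. The strip_text option removes that difference.

```python
from xml.etree.ElementTree import canonicalize

# Made-up sample data: the same document, pretty-printed and un-indented.
pretty = """<reference anchor="RFC0001">
  <front>
    <title>Host Software</title>
  </front>
</reference>"""
flat = (
    '<reference anchor="RFC0001"><front>'
    "<title>Host Software</title>"
    "</front></reference>"
)

# Default options: indentation survives as whitespace text nodes.
default_pretty = canonicalize(pretty)
default_flat = canonicalize(flat)
print(default_pretty == default_flat)

# strip_text=True drops whitespace-only text and trims the rest.
stripped_pretty = canonicalize(pretty, strip_text=True)
stripped_flat = canonicalize(flat, strip_text=True)
print(stripped_pretty == stripped_flat)
```

Note that strip_text also trims leading and trailing whitespace inside real text content, so it is only appropriate where that whitespace is not significant.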
It’s just another illustration that a diffing layer shouldn’t trust the service to return a normal form. It should instead do extra mangling on its own, going beyond the c14n spec, to ensure both representations are as equivalent as possible.
It’s just another illustration that a diffing layer shouldn’t trust service to return a normal form and do extra mangling
I agree. A comparison can only work if the canonicalisation is run on both comparands.
Yes, it can ensure it runs the same canonicalization algorithm with the same parameters, and do any extra normalization beyond the XML c14n spec if required.
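A rough sketch of what such a differ could look like, assuming the standard library's canonicalize() and difflib. The names C14N_OPTIONS, normalize, and xml_diff are hypothetical and are not taken from test_paths.py. The key idea is that one set of options is applied to both comparands, followed by one extra normalization step (one tag per line) that exists only to make the text diff readable.

```python
import difflib
from xml.etree.ElementTree import canonicalize

# One shared option set, so both sides are canonicalized identically.
C14N_OPTIONS = {"strip_text": True, "with_comments": False}


def normalize(xml_text):
    """Canonicalize, then put each tag on its own line for line-based diffing."""
    canonical = canonicalize(xml_text, **C14N_OPTIONS)
    # Extra step beyond canonicalization: break between adjacent tags.
    return canonical.replace("><", ">\n<").splitlines()


def xml_diff(left, right):
    """Return unified-diff lines between two XML strings after normalization."""
    return list(
        difflib.unified_diff(
            normalize(left), normalize(right), "left", "right", lineterm=""
        )
    )


# Made-up inputs: same content, different indentation and attribute order.
pretty = '<a y="2" x="1">\n  <b>t</b>\n</a>'
flat = '<a x="1" y="2"><b>t</b></a>'
print(xml_diff(pretty, flat))

# A real content change still shows up.
for line in xml_diff(flat, '<a x="1" y="2"><b>u</b></a>'):
    print(line)
```

Because the differ owns the normalization, it does not matter whether either source already returns canonical XML, or with which options.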
The service canonicalizes XML now, and the order of elements should be consistent (more details: https://github.com/ietf-ribose/bibxml-service/issues/239#issuecomment-1193069048)
See https://mailarchive.ietf.org/arch/msg/tools-discuss/yRqq-psSVnB-qoNbICFvzZQDmIo.
While attribute order and unordered-element order don't matter to XML parsers, they matter a lot to text differencing engines.
The output should be stable (repeated fetches should return identical bits, not seemingly randomly reordered ones).
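As a hedged illustration of why the two kinds of ordering differ: canonicalization sorts attributes but keeps elements in document order, so attribute reordering disappears after canonicalization while sibling reordering does not. The sketch below uses the standard library's canonicalize(). The fingerprint helper and the sample documents are hypothetical.

```python
import hashlib
from xml.etree.ElementTree import canonicalize


def fingerprint(xml_text):
    """SHA-256 of the canonical form; equal digests mean identical canonical bytes."""
    canonical = canonicalize(xml_text, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Made-up sample documents.
base = (
    '<reference anchor="X" target="http://example.org">'
    '<format type="TXT"/><format type="PDF"/></reference>'
)
attrs_swapped = (
    '<reference target="http://example.org" anchor="X">'
    '<format type="TXT"/><format type="PDF"/></reference>'
)
elements_swapped = (
    '<reference anchor="X" target="http://example.org">'
    '<format type="PDF"/><format type="TXT"/></reference>'
)

# Attribute order is normalized away by canonicalization.
print(fingerprint(base) == fingerprint(attrs_swapped))

# Sibling element order is preserved, so it must be stable at the source.
print(fingerprint(base) == fingerprint(elements_swapped))
```

This suggests that stable element order has to come from whoever serializes the data, or from an extra sorting step that the comparing side applies to both inputs.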