sebastianbergmann / comparator

Provides the functionality to compare PHP values for equality.
BSD 3-Clause "New" or "Revised" License

XML encoding is lost on canonicalization #70

Open b1rdex opened 5 years ago

b1rdex commented 5 years ago

The DOMNode::C14N() call somehow removes the encoding from the XML. This is bad for comparing non-ASCII XML documents, because it makes the diff unreadable. Please see this example: https://3v4l.org/FvCAs
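For readers who cannot open the link, here is a minimal sketch of how the effect can be reproduced. It is not the linked 3v4l snippet; the sample document and variable names are assumptions made up for illustration. It relies on C14N() returning UTF-8 text without an XML declaration, so a document reloaded from that text has no recorded encoding.

```php
<?php
declare(strict_types=1);

// Sample document with Cyrillic text and an explicit encoding (illustrative data).
$source = new DOMDocument();
$source->loadXML('<?xml version="1.0" encoding="UTF-8"?><message>Привет</message>');

// C14N() returns canonical XML: UTF-8 bytes, no XML declaration.
$canonical = $source->C14N();

// Reloading the canonical text gives a document without encoding information.
$reloaded = new DOMDocument();
$reloaded->loadXML($canonical);

// Without an encoding, saveXML() writes non-ASCII characters as numeric references.
$lossy = $reloaded->saveXML();
echo $lossy;
```

The canonical string itself still contains the readable text; the numeric references only appear once the reloaded document is serialized again.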

sebastianbergmann commented 5 years ago

Correct me if I am wrong, but canonicalization, and therefore the call to DOMNode::C14N(), is optional. If it is "bad" for you then do not enable it.

b1rdex commented 5 years ago

I can't say it's bad, it just does a strange thing that leads to unreadable output. How am I supposed to read this comparison diff?

PHPUnit 8.1.3 by Sebastian Bergmann and contributors.

..............................F

Time: 8.54 seconds, Memory: 42.00 MB

There was 1 failure:

1) ApiDistributorsAuthTest::testAuthOfNotDistributor
Failed asserting that two DOM documents are equal.
--- Expected
+++ Actual
@@ @@
 <distributors xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <auth>
     <result>error</result>
-    <message>&#x41F;&#x43E;&#x43B;&#x44C;&#x437;&#x43E;&#x432;&#x430;&#x442;&#x435;&#x43B;&#x44C; &#x43D;&#x435; &#x44F;&#x432;&#x43B;&#x44F;&#x435;&#x442;&#x441;&#x44F; &#x43F;&#x443;&#x43D;&#x43A;&#x442;&#x43E;&#x43C; &#x432;&#x44B;&#x434;&#x430;&#x447;&#x438;</message>
+    <message>&#x41D;&#x435;&#x432;&#x435;&#x440;&#x43D;&#x44B;&#x439; &#x43B;&#x43E;&#x433;&#x438;&#x43D; &#x438;&#x43B;&#x438; &#x43F;&#x430;&#x440;&#x43E;&#x43B;&#x44C;</message>
   </auth>
 </distributors>

b1rdex commented 5 years ago

AFAIK canonical XML is always UTF-8, so the simple fix in my example should be OK for any XML.
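One way such a fix could look is sketched below. This is an assumption about the general approach, not the code from the linked example or from the comparator library: `canonicalXmlText()` is a hypothetical helper that prepends a UTF-8 XML declaration to the C14N() output before reloading it, so the serializer knows the encoding.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of sebastianbergmann/comparator):
 * canonicalize a node and return it as readable XML text.
 */
function canonicalXmlText(DOMNode $node): string
{
    $document = new DOMDocument();

    // Canonical XML is UTF-8 and carries no declaration, so state the encoding explicitly.
    $document->loadXML('<?xml version="1.0" encoding="UTF-8"?>' . $node->C14N());
    $document->formatOutput = true;

    return $document->saveXML();
}

// Illustrative input with Cyrillic text.
$source = new DOMDocument();
$source->loadXML('<?xml version="1.0" encoding="UTF-8"?><message>Привет</message>');

$text = canonicalXmlText($source);
echo $text;
```

Because the declared encoding always matches what canonicalization produces, this should not depend on the encoding of the original document.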

sebastianbergmann commented 5 years ago

Can you please send a pull request with your fix? Thanks!