sebastianbergmann / comparator

Provides the functionality to compare PHP values for equality.
BSD 3-Clause "New" or "Revised" License
6.99k stars 68 forks source link

Add xml declaration to force utf-8 #71

Open b1rdex opened 5 years ago

b1rdex commented 5 years ago

See #70

codecov[bot] commented 5 years ago

Codecov Report

Merging #71 into master will decrease coverage by 0.65%. The diff coverage is 80.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master      #71      +/-   ##
============================================
- Coverage     99.30%   98.65%   -0.66%     
- Complexity      125      130       +5     
============================================
  Files            15       15              
  Lines           288      297       +9     
============================================
+ Hits            286      293       +7     
- Misses            2        4       +2     
Impacted Files Coverage Δ Complexity Δ
src/DOMNodeComparator.php 93.10% <80.00%> (-6.90%) 15.00 <5.00> (+5.00) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 988baa0...f315e2d. Read the comment docs.

sebastianbergmann commented 5 years ago

@theseer Does this make sense to you?

theseer commented 5 years ago

First off: I'm not sure whether C14N serializes the DOM-Subtree at hand using UTF-8 because it being the internal encoding within libXML or using the original encoding as specified for the containing document.

If it is indeed ignoring the original ecoding, the patch is technically correct but should be superfluous as UTF-8 is the default. It's generally considered nice to specify it anyhow, though.

If C14N is not using UTF-8 but honors the original document encoding, $node->ownerDocument->actualEncoding should be used instead of a hardcoded UTF-8. There is, to my knowledge, no way to specify the wanted target-encoding with C14N.

b1rdex commented 5 years ago

As I remember XML Canonicalisation spec has utf-8 as requirement.

On 9 May 2019, at 05:02, Arne Blankerts notifications@github.com wrote:

First off: I'm not sure whether C14N serializes the DOM-Subtree at hand using UTF-8 because it being the internal encoding within libXML or using the original encoding as specified for the containing document.

If it is indeed ignoring the original ecoding, the patch is technically correct but should be superfluous as UTF-8 is the default. It's generally considered nice to specify it anyhow, though.

If C14N is not using UTF-8 but honors the original document encoding, $node->ownerDocument->actualEncoding should be used instead of a hardcoded UTF-8. There is, to my knowledge, no way to specify the wanted target-encoding with C14N.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/sebastianbergmann/comparator/pull/71#issuecomment-490610558, or mute the thread https://github.com/notifications/unsubscribe-auth/AACMMF6D3QZS2S3OK35ADITPUMPVBANCNFSM4HLPBY5Q.

b1rdex commented 5 years ago

OK, I changed it to $node->ownerDocument->encoding. I used encoding as documentation states that actualEncoding is deprecated and it's a readonly equivalent to encoding.

Here is my original test example with ->encoding, it's working same as utf-8: https://3v4l.org/YfMoq

@sebastianbergmann we can decouple XML canonicalisation out of comparator to write test on it. Should I do it?

b1rdex commented 5 years ago

I had to add Elvis to fix tests. Apparently, encoding isn't always present.

eclipxe13 commented 4 years ago

@theseer according to https://www.w3.org/TR/xml-c14n11/ it says: "The document is encoded in UTF-8".

@b1rdex:

b1rdex commented 4 years ago
  • may I suggest using $document = new DOMDocument('1.0', $node->ownerDocument->encoding ?: 'UTF-8');

That doesn't work. I tried

    /**
     * Tries to get the encoding from DOMNode or falls back to UTF-8
     */
    private function getNodeEncoding(DOMNode $node): string
    {
        $encoding = '';
        if ($node->ownerDocument) {
            $encoding = $node->ownerDocument->xmlEncoding ?: $node->ownerDocument->encoding;
        } elseif ($node instanceof DOMEntity) {
            $encoding = $node->encoding;
        }

        return $encoding ?: 'UTF-8';
    }

I even tried just setting it to UTF-8 directly. Nothing changes until I add a <?xml version="1.0" encoding header.

@eclipxe13 you can play with it yourself https://3v4l.org/FvCAs

I'll update the PR w/ the test shortly.

sebastianbergmann commented 4 years ago

Please avoid useless commit messages such as Update DOMNodeComparator.php.

b1rdex commented 4 years ago

Please avoid useless commit messages such as Update DOMNodeComparator.php.

@sebastianbergmann I've force pushed the squashed branch. Also, there is an option on GitHub to squash commits before merging.

eclipxe13 commented 4 years ago

@b1rdex I see your point. Nothing changes until I add a <?xml version="1.0" encoding="ENCODING" ?> header.

That is because loadXml() reset the document encoding, even when it was set on construction.

This code also works:

$document = new DOMDocument();
@$document->loadXml(node->C14N());

// at this point $document->xmlEncoding is null, it is also read-only and has the undesired encoding

$document->encoding = $this->getNodeEncoding($node);

// at this point you have the output of `saveXml()` as you expect
b1rdex commented 4 years ago

Feel free to send the update. Lets check it against the test I provided.

Отправлено с iPhone

12 июля 2020 г., в 09:47, Carlos C Soto notifications@github.com написал(а):

 @b1rdex I see your point. Nothing changes until I add a <?xml version="1.0" encoding="ENCODING" ?> header.

That is because loadXml() reset the document encoding, even when it was set on construction.

This code also works:

$document = new DOMDocument(); @$document->loadXml(node->C14N());

// at this point $document->xmlEncoding is null, it is also read-only and has the undesired encoding

$document->encoding = $this->getNodeEncoding($node);

// at this point you have the output of saveXml() as you expect — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.