ivopetkov / html5-dom-document-php

A better HTML5 parser for PHP.
MIT License
599 stars 40 forks

Links with umlauts #51

Closed: SarahTrees closed this issue 2 years ago

SarahTrees commented 2 years ago

We use umlauts in links. This looks very good in the browser and Google also shows the umlauts perfectly in the search preview.

When I read in an HTML fragment with umlauts and then output it again, the umlauts in the href are replaced by percent-encoded sequences. How can I prevent this when reading or outputting? Is there a parameter?

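require 'vendor/autoload.php'; // assumes the library was installed via Composer

$html = new IvoPetkov\HTML5DOMDocument();
$html->loadHTML('<a href="Ökologie.html" title="Wechselbeziehungen zwischen Lebewesen und ihrer Umwelt">Ökologie - Wechselbeziehungen</a>');

// Print the serialized body content
echo $html->querySelector('body')->innerHTML;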

result: <a href="%C3%96kologie.html" title="Wechselbeziehungen zwischen Lebewesen und ihrer Umwelt">Ökologie - Wechselbeziehungen</a>

The umlaut Ö in the link text is correct. The umlaut Ö in the href was replaced by %C3%96.

SarahTrees commented 2 years ago

Just discovered that someone already had this problem and wrote a workaround for it.

#47
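For readers who land here without following the reference, one possible approach is to post-process the serialized HTML and decode the percent-encoded bytes in href values. The sketch below is not necessarily the workaround from the referenced issue, and decodeHrefUmlauts is a hypothetical helper name, not part of HTML5DOMDocument. It assumes double-quoted href attributes and UTF-8 input, and it only decodes bytes at or above 0x80, so escapes such as %20 stay untouched.

```php
<?php
// Hypothetical helper (not part of HTML5DOMDocument): decodes percent-encoded
// non-ASCII UTF-8 bytes inside double-quoted href attribute values.
function decodeHrefUmlauts(string $html): string
{
    $result = preg_replace_callback(
        '/(\shref=")([^"]*)(")/i',
        function (array $m): string {
            // Match runs of escapes whose byte value is 0x80 or higher.
            $value = preg_replace_callback(
                '/(?:%[89A-Fa-f][0-9A-Fa-f])+/',
                function (array $seq): string {
                    $bytes = rawurldecode($seq[0]);
                    // Keep the original escapes if the bytes are not valid UTF-8.
                    return preg_match('//u', $bytes) === 1 ? $bytes : $seq[0];
                },
                $m[2]
            );
            // Fall back to the unmodified value if the inner replace failed.
            return $m[1] . ($value ?? $m[2]) . $m[3];
        },
        $html
    );

    // preg_replace_callback returns null on error; return the input unchanged then.
    return $result ?? $html;
}
```

Usage, continuing the example above: echo decodeHrefUmlauts($html->querySelector('body')->innerHTML); should print the link with href="Ökologie.html" again. Note that %C3%96 is simply the UTF-8 percent-encoding of Ö, so both forms point to the same resource and browsers send the encoded form when requesting the page.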