thephpleague / html-to-markdown

Convert HTML to Markdown with PHP
MIT License

autolinks and url-/html-encoded strings #248

Open klkvsk opened 7 months ago

klkvsk commented 7 months ago

Version(s) affected

5.1.1/5.2.x

Description

With 'use_autolinks' => true, autolink conversion can fail when a link's href and its text are encoded differently, for example when one is URL-encoded or HTML-entity-encoded and the other is not.

The strict comparison used by the autolinks check should be relaxed by converting both values to a common form first. Also, HTML entities should probably be decoded in the Markdown output, while URL encodings are left as is. Is that right?
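To make the proposal concrete, here is a minimal sketch of what "converting to a common form" could look like. The function names (normalizeForComparison, hrefMatchesText, toAutolink) are hypothetical and are not part of the library. The sketch assumes that decoding HTML entities and then percent-encoding on both sides is an acceptable loose comparison, which the maintainers may not agree with.

```php
<?php

// Hypothetical helper: bring href and link text to a common form
// by decoding HTML entities first, then percent-encoded sequences.
function normalizeForComparison(string $value): string
{
    $decoded = html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    return rawurldecode($decoded);
}

// Hypothetical check: should this link be rendered as an autolink?
function hrefMatchesText(string $href, string $text): bool
{
    return normalizeForComparison($href) === normalizeForComparison($text);
}

// Hypothetical output step: decode HTML entities for Markdown,
// but leave URL (percent) encoding untouched.
function toAutolink(string $href): string
{
    return '<' . html_entity_decode($href, ENT_QUOTES | ENT_HTML5, 'UTF-8') . '>';
}

$href = 'https://example.com/?a=1&b=foo%2Fbar';
$text = 'https://example.com/?a=1&amp;b=foo/bar';

var_dump(hrefMatchesText($href, $text)); // bool(true)
echo toAutolink($href), "\n";            // <https://example.com/?a=1&b=foo%2Fbar>
```

Note that treating %2F and / as equal is a deliberate loosening: the two URLs are not strictly identical, so this comparison is only meant to decide whether to emit an autolink, and the href itself is what ends up in the output.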

How to reproduce

$converter = new HtmlConverter();
$converter->setOptions([ 'use_autolinks' => true ]);
echo $converter->convert('<a href="https://example.com/?a=1&b=foo%2Fbar">https://example.com/?a=1&amp;b=foo/bar</a>');
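// Actual output (a regular link instead of an autolink, with &amp; still HTML-encoded in the text):
// [https://example.com/?a=1&amp;b=foo/bar](https://example.com/?a=1&b=foo%2Fbar)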