thephpleague / commonmark

Highly-extensible PHP Markdown parser which fully supports the CommonMark and GFM specs.
https://commonmark.thephpleague.com
BSD 3-Clause "New" or "Revised" License
2.75k stars 194 forks source link

Autolinks - URls within HTML produces invalid HTML #1045

Closed edent closed 2 months ago

edent commented 2 months ago

Version(s) affected

2.5

Description

In this HTML, the autolinker adds a nested <a> which is invalid.

HTML <a href="https://example.com">Visit https://example.com</a>

Becomes

<p>HTML <a href="https://example.com">Visit <a href="https://example.com">https://example.com</a></a></p>

I would expect the autolinker to ignore the URl inside the anchor. For example, this produces correct HTML:

[Visit https://example.com](https://example.com)

How to reproduce

As per https://commonmark.thephpleague.com/2.5/extensions/autolinks/

require_once 'vendor/autoload.php';

use League\CommonMark\Environment\Environment;
use League\CommonMark\Extension\Autolink\AutolinkExtension;
use League\CommonMark\Extension\CommonMark\CommonMarkCoreExtension;
use League\CommonMark\MarkdownConverter;

// Define your configuration, if needed
$config = [];

// Configure the Environment with all the CommonMark parsers/renderers
$environment = new Environment($config);
$environment->addExtension(new CommonMarkCoreExtension());

// Add this extension
$environment->addExtension(new AutolinkExtension());

// Instantiate the converter engine and start converting some Markdown!
$converter = new MarkdownConverter($environment);

echo $converter->convert('Correct [Visit https://example.com](https://example.com)');
echo $converter->convert('Broken <a href="https://example.com">Visit https://example.com</a>');
colinodell commented 2 months ago

The Autolink extension follows the GFM spec which unfortunately doesn't make exceptions for autolinks found inside of HTML. You can see here that the official GFM parser behaves somewhat similarly, giving this output:

<p>
  HTML <a href="https://example.com" rel="nofollow">Visit 
  </a><a href="https://example.com" rel="nofollow">https://example.com
  </a>
</p>

So because this behavior matches both the spec and the official GFM parser I'm inclined to keep this as-is for now. But if they change this upstream in their specification I'd be glad to make this change here to remain aligned with them!

Thanks for the detailed report, though!

edent commented 2 months ago

Thanks - that's pretty reasonable.