thephpleague / commonmark

Highly-extensible PHP Markdown parser which fully supports the CommonMark and GFM specs.
https://commonmark.thephpleague.com
BSD 3-Clause "New" or "Revised" License
2.73k stars 193 forks source link

Heading permalinks don't work if a document has the same title multiple times #531

Closed onigoetz closed 4 years ago

onigoetz commented 4 years ago

Version(s) affected: 1.5.3

Description

When using the Heading permalinks, if a document has the same title multiple times, it creates the same link, which won't work

How to reproduce

<?php

require "vendor/autoload.php";

use League\CommonMark\CommonMarkConverter;
use League\CommonMark\Environment;
use League\CommonMark\Extension\HeadingPermalink\HeadingPermalinkExtension;

// Obtain a pre-configured Environment with all the CommonMark parsers/renderers ready-to-go
$environment = Environment::createCommonMarkEnvironment();

// Add this extension
$environment->addExtension(new HeadingPermalinkExtension());

// Instantiate the converter engine and start converting some Markdown!
$converter = new CommonMarkConverter([], $environment);
echo $converter->convertToHtml("# Hello World!\n## Title\n## Title");

Outputs

<h1><a id="user-content-hello-world" href="#hello-world" name="hello-world" class="heading-permalink" aria-hidden="true" title="Permalink">¶</a>Hello World!</h1>
<h2><a id="user-content-title" href="#title" name="title" class="heading-permalink" aria-hidden="true" title="Permalink">¶</a>Title</h2>
<h2><a id="user-content-title" href="#title" name="title" class="heading-permalink" aria-hidden="true" title="Permalink">¶</a>Title</h2>

The second #title should be #title-1 or #title-2

I checked on babelmark; https://babelmark.github.io/?text=%23+title%0A%23%23+title and it's not clear what the take is on this.

I would love to use your implementation and throw away the ... thing ... I made for Daux : https://github.com/dauxio/daux.io/blob/master/libs/Format/HTML/ContentTypes/Markdown/TOC/Processor.php#L71

It's an all-in-one solution to add IDs to headers and create a Table of Contents. The separation you made between headings and TOC seems way cleaner and I'll adopt it once it can do multiple titles as well.

P.S. Thanks for maintaining this parser, it's a really neat implementation and easy to use :)

colinodell commented 4 years ago

I checked on babelmark; https://babelmark.github.io/?text=%23+title%0A%23%23+title and it's not clear what the take is on this.

Because this is a GFM feature, we should follow the example of the official GFM parser which does what you propose.

onigoetz commented 4 years ago

I feel the way you fixed it makes it mandatory to create a new instance of the parser for each document that will be parsed.

Otherwise the counters will already be more than zero at the start of the second document. Which can be a weird gotcha for people who don't expect it.

colinodell commented 4 years ago

Ah, good catch! Let me fix that...

markhalliwell commented 4 years ago

I think the original approach of UniqueSlugNormalizer was on the right track, but that it should have just used private static $usages instead so it would remain the same throughout the entire PHP request. In CMSes, multiple parsings could happen on a single page (i.e. main body, sidebar/blocks, comments, etc.).

That being said, it probably wouldn't hurt to maybe add some static CRUD around it too in the event that the CMS may need to store the current usage, then reset it as needed (i.e. thinking of maybe sending something like emails or generating PDFs).

markhalliwell commented 4 years ago

Although, the latter could maybe be subverted by maybe creating a new heading_permalink/unique_slugs option that can take multiple values:

'' - None, uses the normal SlugNormalizer 'document' - Unique to new documents (resets UniqueSlugNormalizer on DocumentParsedEvent) 'environment' - Unique to new environments (resets UniqueSlugNormalizer on extension construction or setConfiguration?) 'manual' - UniqueSlugNormalizer remembers everything (i.e. unique to current PHP thread), requires manual reset

colinodell commented 4 years ago

I'd be in favor of adding that kind of option, but with only two choices: 'document' (default) or 'environment'.

I don't think we should do '' because it doesn't align with GFM's behavior.

I'm not sure how I feel about 'manual'. I don't like that it requires global/static storage where two completely different sets of Environments have the ability to impact each other. I can see why this might be useful - perhaps you have oneEnvironment` rendering a blog post and another (more-restricted one) to render user comments, and you don't want any collisions in slugs. But in that case, I think I'd rather see some kind of slug prefix option.

So building off your idea, I'd propose adding two options:

We'd bring back something like the UniqueSlugNormalizer and it would handle:

We'd expose an instance of that class via some getter method on the HeadingPermalinkExtension.

What do you think?

markhalliwell commented 4 years ago

I'm fine with just document and environment being the only two values/scenarios as long as the normalizer has some sort of CRUD in place. This would allow a CMS to implement a "manual" implementation easily enough without having to recreate a new environment each time it spits out an email or something with the same content.

markhalliwell commented 4 years ago

I suppose I should close this issue and open another once since there's already code that addresses this specific issue committed.