Closed tripodsan closed 3 years ago
I just realized that the same happens in 0.5.4
when the inner text is sourrounded with spaces:
const doc = `
| links |
|-------|
|<p><a href="https://www.youtube.com/watch?v=kd2i50J9MZI&feature=youtu.be"> https://www.youtube.com/watch?v=kd2i50J9MZI&feature=youtu.be </a></p>|
`;
I believe actual and expected in your issue are the same thing?
I believe actual and expected in your issue are the same thing?
oh, I think I copy pasted the wrong snipped.... one sec :-)
Btw, GH doesn’t render it nicely either:
Take this, to disambiguate which link “wins”:
| links |
|-------|
|<p><a href="#">https://www.youtube.com/watch?v=kd2i50J9MZI&feature=youtu.be</a></p>|
Gives:
<table role="table">
<thead>
<tr>
<th>links</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<p>
<a href="#"></a>
<a rel="nofollow" href="https://www.youtube.com/watch?v=kd2i50J9MZI&#x26;feature=youtu.be">https://www.youtube.com/watch?v=kd2i50J9MZI&#x26;feature=youtu.be</a>
</p>
</td>
</tr>
</tbody>
</table>
Note that the autolink literal is still used.
Rendered here:
links |
---|
https://www.youtube.com/watch?v=kd2i50J9MZI&feature=youtu.be |
The same is true for gists btw
So, what would the expected output be for you? I know that this is a change, but gh also has the two links, and exactly matching how gh sanitizes their html isn't really something for this project either 🤔
In my example, the link href and the link text are the same. so this can be considered as an already autolinked HTML result. in this case, I think the gfm should not autolink it again.
so this can be considered as an already autolinked HTML result. in this case, I think the gfm should not autolink it again.
But this is not how GFM handles it: the autolink literal is linked. They do have two <a>
s. So I don’t see why this extension should do that?
For the actual GFM output: given that they sanitize the output, which is done by parsing the output of markdown as HTML and then cleaning it, and given that HTML parsing closes an open anchor element when another opening anchor tag is opened:
d = document.createElement('div')
d.innerHTML = '<a><a></a></a>'
console.log(d.innerHTML)
Yields:
<a></a><a></a>
I think the current behavior is correct: if micromark/remark outputs '<a><a></a></a>'
and GH outputs <a></a><a></a>
, both are seen as the same thing in browsers.
but when add the HTML from above:
<p><a href="https://www.youtube.com/watch?v=kd2i50J9MZI&feature=youtu.be">https://www.youtube.com/watch?v=kd2i50J9MZI&feature=youtu.be</a></p>
https://www.youtube.com/watch?v=kd2i50J9MZI&feature=youtu.be
you can see that github doesn't add an additional <a>
:
So ideally, the gfm-autolink extension detects that this is a link inside an <a>
with the same href, and doesn't generate a link node.
That’s a different case, and micromark/remark (most probably) handle that correctly, see: https://gist.github.com/wooorm/41e4ba5779fc89b78ece9d6a67faaa41.
Because p
(or div
) is a known “block” tag (ref: https://spec.commonmark.org/0.29/#html-blocks), the rest of the line is seen as raw HTML: emphasis and such won‘t work there either. Whereas in a table cell, or by using a non-block tag such as a
(or span
), the parser works differently, and “inline” things such as autolink literals or emphasis do work.
hmm... you're correct.
Btw, there’s definitely a bug with autolink literals, which is very weird, and I’m working on here: https://github.com/micromark/micromark-extension-gfm-autolink-literal/pull/6#issuecomment-770957354.
But for this specific case, mircomark/mdast/remark are as close we can get to how GH parses, I think
Closing this. Meanwhile, there are new releases for this project and the corresponding mdast utility to match GH!
don't know if this is really a bug, but at least a regression.
with
0.5.4
, a link inside html was parsed as a text node:but with version
0.5.5
, it is parsed into a link node:As a consumer of the mdast, it is very difficult to remove the link inside the
<a>
.... I don't know what the correct behaviour should be...