syntax-tree / mdast-util-to-markdown

mdast utility to serialize markdown
http://unifiedjs.com
MIT License

Underscores escaped in link destination urls #8

Closed (tripodsan closed this issue 4 years ago)

tripodsan commented 4 years ago

Subject of the issue

When a link has a URL that contains a _, the serializer escapes the _ in the destination.

Steps to reproduce

For example, the following tree:

├─38 paragraph[3] (77:1-77:229, 4501-4729)
│   └─0 link[1] (77:163-77:229, 4663-4729)
│       │ title: null
│       │ url: "https://www.youtube.com/channel/UCM1CbBpxRg_-FOutzAXjuLg"
│       └─0 text "Jovely" (77:164-77:170, 4664-4670)

will produce this markdown:

[Jovely](https://www.youtube.com/channel/UCM1CbBpxRg\_-FOutzAXjuLg)

Other examples:

[@\_thelustlist\_](https://www.instagram.com/\_thelustlist\_/)

Test markdown in github:

Expected behavior

The _ in the URL should not be escaped.

Actual behavior

The _ in the URL is escaped.

tripodsan commented 4 years ago

just realized that the escaping is correct. sorry for the noise.

wooorm commented 4 years ago

:+1:

Escaping can be better, but escaping markdown is rather complex, so I don’t easily see how this one would work. The escaping here does not cause any harm, as the escapes in destinations work too.

There was a similar issue recently but I couldn’t find it.
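The point that escapes in destinations "work too" can be checked with a small sketch. It assumes the CommonMark rule that a backslash before an ASCII punctuation character stands for that character. The helper name `unescapeDestination` is hypothetical and is not part of mdast-util-to-markdown or any parser; it only illustrates what a parser would read back from the serialized URLs in this issue.

```javascript
// Sketch only: CommonMark-style backslash-escape decoding for a link
// destination. Not the library's implementation.
// Matches a backslash followed by one ASCII punctuation character.
const ESCAPED_PUNCTUATION = /\\([!-\/:-@\[-`{-~])/g

// Hypothetical helper: drop the backslash, keep the escaped character.
function unescapeDestination(destination) {
  return destination.replace(ESCAPED_PUNCTUATION, '$1')
}

// The serialized destinations from this issue (backslashes doubled
// here only because they sit inside JavaScript string literals).
const youtube = 'https://www.youtube.com/channel/UCM1CbBpxRg\\_-FOutzAXjuLg'
const instagram = 'https://www.instagram.com/\\_thelustlist\\_/'

console.log(unescapeDestination(youtube))
// → https://www.youtube.com/channel/UCM1CbBpxRg_-FOutzAXjuLg
console.log(unescapeDestination(instagram))
// → https://www.instagram.com/_thelustlist_/
```

Under that assumption the escaped and unescaped destinations point at the same URL, which is why the extra backslash is harmless.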

tripodsan commented 4 years ago

> so I don’t easily see how this one would work.

Right... I think the previous remark-parser didn't escape the _ in link URLs.

wooorm commented 4 years ago

That is correct. But the previous compiler had bugs, and this one has fewer bugs ;)

tripodsan commented 4 years ago

@wooorm there is also escaping of &, which is also unexpected. What exactly is the reason to escape anything within the ( ) of the link URL (except maybe a closing parenthesis)?

wooorm commented 4 years ago

https://github.com/syntax-tree/mdast-util-to-markdown/blob/5fa790ee4cdead2a6c11b6a6f73c59eb0f9ca295/lib/unsafe.js#L40-L41

tripodsan commented 4 years ago

maybe treating the link URL as phrasing is wrong :-)

wooorm commented 4 years ago

Character references and -escapes work in link destinations, it’s intended behavior

tripodsan commented 4 years ago

> Character references and -escapes work in link destinations, it’s intended behavior

so you mean that this md:

[foo](https://example.com/test&quot;.html)

would create a link url with https://example.com/test".html when parsed?

wooorm commented 4 years ago

yes.

We’re serializing markdown here, so if someone had actually escaped that:

[foo](https://example.com/test&quot;.html)
[foo](https://example.com/test\&quot;.html)

...we need to make sure it roundtrips.
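The roundtrip argument can be made concrete with a minimal sketch. Everything in it is an assumption for illustration: `decodeDestination` and `escapeDestination` are hypothetical helpers, the table of named character references is a tiny subset, and parentheses and whitespace in destinations are ignored. It is not how mdast-util-to-markdown is implemented.

```javascript
// Sketch only: why a serializer escapes & in link destinations.
// Tiny, assumed subset of named character references.
const NAMED = new Map([['quot', '"'], ['amp', '&'], ['lt', '<'], ['gt', '>']])

// Hypothetical parser side: decode backslash escapes and character
// references, scanning left to right so that \& consumes the ampersand
// before it can start a reference.
function decodeDestination(raw) {
  return raw.replace(
    /\\([!-\/:-@\[-`{-~])|&([a-z]+);/g,
    (match, escaped, name) => {
      if (escaped !== undefined) return escaped
      return NAMED.has(name) ? NAMED.get(name) : match
    }
  )
}

// Hypothetical serializer side: put a backslash before every backslash
// and ampersand so neither can be read as markup.
function escapeDestination(url) {
  return url.replace(/[\\&]/g, '\\$&')
}

// A URL whose literal text contains the six characters "&quot;".
const url = 'https://example.com/test&quot;.html'

// Written out unescaped, the reference is decoded and the URL changes.
console.log(decodeDestination(url) === url) // → false

// Escaped first, the URL survives serialize → parse unchanged.
console.log(decodeDestination(escapeDestination(url)) === url) // → true
```

The sketch escapes more than strictly necessary, in the same spirit as the thread: a redundant backslash decodes to the same character, while a missing one can silently change the URL.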