Closed ilyaigpetrov closed 4 years ago
Hi there! Pretty interesting, but: the current behavior is compliant to the commonmark reference impeimentation: https://spec.commonmark.org/dingus/?text=%5Bexample%5D(https%3A%2F%2Fя.рф%2Fя). Most markdown parsers behave worse or the same as this project. The downside I see of implementing punycode is that it is big: micromark is ±13kb, punycode is 130kb. However. this is something that could be added to commonmark if you get with you. So, I’ll refer you to their Discourse.
What's wrong with using new URL(idnUrl).href
for encoding, not an external Punycode library?
Maybe I can write a plugin for remark that handles it properly? Can we add an option to disable percent-encoding in this syntax tree transformer?
That is not the main point: the main point is that other markdown parsers do it like we do it.
But: new URL(idnUrl).href
crashes on certain URLs and markdown never crashes.
Btw: I believe you can do this all yourself in a plugin before mdast-util-to-hast
. See Creating a plugin and Tree traversal
I've created a feature request to commonmark here: https://talk.commonmark.org/t/feature-support-of-idns-internationalized-domain-names-punycode/3672 -- Crissov says it is implied that punycode must work.
Crissov says it is implied that punycode must work.
That’s not how I read that comment. Crissov links to the same Babelmark I linked to.
No, @wooorm:
1) "Markdig and markdown-it convert the link destination as desired" (punycode is desired).
2) Then he links to babelmark as a prove of his words (markdown-it and markdig there use punycode: https://xn--41a.xn--p1ai/%D1%8F
).
3) "I’m not sure this needs to be specified explicitly" (the desired behavior, i.e. punycode, needs to be specified explicitly).
Anyway I've asked him directly on the forum --, please, subscribe to the topic so I don't have to repost it here.
I read it differently:
- "Markdig and markdown-it convert the link destination as desired" (punycode is desired).
“There are some parsers that do what you (@ilyaigpetrov) desire. (but most don’t)”
- Then he links to babelmark as a prove of his words (markdown-it and markdig there use punycode: https://xn--41a.xn--p1ai/%D1%8F).
The same I showed: some parsers do what you want, most parsers do it differently, unified/remark/micromark does what the commonmark.js reference parser does and GFM does.
Why is this so important? Whether punycode is used or not, it works (taking a live example):
[Unicode](https://räksmörgås.josefsson.org), [Punycode](https://xn--rksmrgs-5wao1o.josefsson.org).
Example w/ unified: https://runkit.com/wooorm/5f91431c66c630001a6f7d71
Subject of the feature
Support Internationalized Domain Names (IDNs) in urls of links.
Problem
If I use IDNs in my markdown, e.g.
[example](https://я.рф/я)
then I getя.рф
part percent-encoded (instead of encoding it with Punycode), e.g. I gethttps://%D1%8F.%D1%80%D1%84/%D1%8F
. Such links become just broken.Expected behavior
https://я.рф/я
must be encoded ashttps://xn--41a.xn--p1ai/%D1%8F
.Alternatives
Instead of using
mdurl/encode
I suggest usingnew URL(node.url).href
. I warn you that I'm not sure what is the purpose ofmdurl/encode
and why it was created and used here.