micromark / micromark-extension-gfm-autolink-literal

micromark extension to support GFM autolink literals
https://unifiedjs.com
MIT License

Parser crashes when there is a backspace control character in autolink domain #2

Closed: ChristianMurphy closed this issue 3 years ago.

ChristianMurphy commented 3 years ago

Subject of the issue

The parser crashes when there is a backspace control character in an autolink domain.

Your environment

Steps to reproduce

const micromark = require("micromark/lib");
const gfmSyntax = require("micromark-extension-gfm");

micromark("www.a", "utf-8", { extensions: [gfmSyntax()] });

Expected behavior

The text should be displayed as:

www.a

Actual behavior

expected non-empty token (`literalAutolinkDomain`)

wooorm commented 3 years ago

This seems to be about the actual replacement character (U+FFFD), not about the C0 control character? I get the error on the replacement character. 🤔
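Whether the crashing input holds U+0008 (backspace, a C0 control) or U+FFFD (the replacement character) cannot be seen in rendered text. A small sketch for dumping a string's code points to check; `codePoints` is a hypothetical helper, not part of micromark:

```javascript
// Hypothetical helper: list the code points of a string as "U+XXXX" so that
// invisible characters such as U+0008 or U+FFFD become visible.
function codePoints(value) {
  // Array.from iterates by code point, so astral characters are not split.
  return Array.from(value, (char) =>
    "U+" + char.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(codePoints("www.a\b").join(" ")); // U+0077 U+0077 U+0077 U+002E U+0061 U+0008
console.log(codePoints("www.a\uFFFD").join(" ")); // U+0077 U+0077 U+0077 U+002E U+0061 U+FFFD
```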

wooorm commented 3 years ago

OK, I can fix it, but uhm: www.點看.com is www.點看.com here.

Whereas the spec says:

A valid domain consists of segments of alphanumeric characters, underscores (_) and hyphens (-) separated by periods (.)....

In markdown, "alphanumeric" typically means `a-zA-Z0-9`. But what does GitHub actually allow? 🤔
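Read strictly, with "alphanumeric" meaning ASCII `a-zA-Z0-9`, the spec's domain rule could be checked roughly like this. This is a sketch, not the extension's code: `isStrictDomain` is a hypothetical name, the two extra conditions (at least one period, no underscores in the last two segments) come from the full spec sentence that the quote above trims, and rejecting empty segments is an assumption.

```javascript
// Hypothetical check of a domain against the GFM spec wording, reading
// "alphanumeric" strictly as ASCII a-zA-Z0-9.
function isStrictDomain(domain) {
  const segments = domain.split(".");
  // There must be at least one period, i.e. at least two segments.
  if (segments.length < 2) return false;
  // Each segment: one or more ASCII alphanumerics, underscores, or hyphens
  // (rejecting empty segments is an assumption of this sketch).
  if (!segments.every((segment) => /^[A-Za-z0-9_-]+$/.test(segment))) return false;
  // No underscores may be present in the last two segments.
  return !segments.slice(-2).some((segment) => segment.includes("_"));
}

console.log(isStrictDomain("example.com")); // true
console.log(isStrictDomain("點看.com")); // false: not ASCII alphanumeric
console.log(isStrictDomain("a_b.com")); // false: underscore in last two segments
console.log(isStrictDomain("x_y.example.com")); // true
```

Under this strict reading, a domain such as `點看.com` would not qualify, which is exactly the mismatch with GitHub's rendering raised above.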

wooorm commented 3 years ago

And more importantly, what should this extension do?

wooorm commented 3 years ago

micromark includes information on all Unicode whitespace and Unicode punctuation characters. What if we allow all other characters?

wooorm commented 3 years ago

(except control characters ofc)
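The permissive rule proposed here (anything that is not whitespace, punctuation, or a control character) might look like the sketch below. It is hypothetical: the Unicode classes are approximated with regex property escapes, punctuation follows CommonMark's definition (ASCII punctuation plus Unicode `P` categories), and micromark's own tables may differ in detail.

```javascript
// Hypothetical sketch of a permissive "domain character" check.
// ASCII punctuation: U+0021–002F, U+003A–0040, U+005B–0060, U+007B–007E.
const asciiPunctuation = /[!-\/:-@\[-`{-~]/;
const unicodeWhitespace = /\p{White_Space}/u;
const unicodePunctuation = /\p{P}/u;
const control = /\p{Cc}/u;

function isDomainChar(char) {
  // The spec explicitly allows underscores and hyphens, even though both
  // count as punctuation.
  if (char === "_" || char === "-") return true;
  return !(
    asciiPunctuation.test(char) ||
    unicodeWhitespace.test(char) ||
    unicodePunctuation.test(char) ||
    control.test(char)
  );
}

console.log(["a", "點", "-", ".", "!", " ", "\b"].map(isDomainChar));
// [ true, true, true, false, false, false, false ]
```

With this rule `點` is accepted while a backspace (U+0008, category `Cc`) is rejected, so an input like `www.a` followed by a control character would simply end the domain instead of producing an empty token.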

wooorm commented 3 years ago

wwwtf?

www. (space)

www.!

www."

www.#

www.$

www.%

www.&

www.'

www.(

www.)

www.*

www.+

www.,

www.-

www.

www..

www./

www.:

www.;

www.<

www.=

www.>

www.?

www.@

www.[

www.\

www.]

www.^

www._

www.`

www.{

www.|

www.}

www.~

wooorm commented 3 years ago

wwwtf 2?

www.a (space)

www.a!

www.a"

www.a#

www.a$

www.a%

www.a&

www.a'

www.a(

www.a)

www.a*

www.a+

www.a,

www.a-

www.a

www.a.

www.a/

www.a:

www.a;

www.a<

www.a=

www.a>

www.a?

www.a@

www.a[

www.a\

www.a]

www.a^

www.a_

www.a`

www.a{

www.a|

www.a}

www.a~
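Both probe lists follow one pattern: a prefix (`www.` or `www.a`) followed by a space or a single ASCII punctuation character. A sketch for generating such inputs, for example to paste into a comment or feed to a parser in a test; `probes` is a hypothetical helper, not part of micromark:

```javascript
// Hypothetical helper: build probe inputs of the form `prefix + char`, where
// char is a space or one of the 32 ASCII punctuation characters.
function probes(prefix) {
  const chars = [" "];
  for (let code = 0x21; code <= 0x7e; code++) {
    const char = String.fromCharCode(code);
    // Keep only punctuation: skip ASCII letters and digits.
    if (!/[A-Za-z0-9]/.test(char)) chars.push(char);
  }
  return chars.map((char) => prefix + char);
}

const inputs = [...probes("www."), ...probes("www.a")];
console.log(inputs.length); // 66
console.log(inputs.slice(0, 3)); // [ 'www. ', 'www.!', 'www."' ]
```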

wooorm commented 3 years ago

Well, that was complex.