kivikakk / comrak

CommonMark + GFM compatible Markdown parser and renderer
Other
1.17k stars 140 forks source link

Change autolink to allow more protocols, same as normal markdown links #379

Closed digitalmoksha closed 5 months ago

digitalmoksha commented 5 months ago

Currently when using the --unsafe option with comrak, normal markdown links allow all schemes except for javascript:, vbscript:, file:, and data:. So [smb](smb:///Volumes/shared/foo.pdf) generates <p><a href="smb:///Volumes/shared/foo.pdf">smb</a></p>

I would like autolinks to be able to allow the same schemes, using a new option --unsafe-autolinks

An example usage is in GitLab, where we filter to allow autolinking smb: links. So smb:///Volumes/shared/foo.pdf would autolink.

An --unsafe-autolinks option would allow the comrak user to filter all links based on the schemes they support.

gjtorikian commented 5 months ago

Would an additional option be necessary? Would the user of just —unsafe not expect these restrictions removed? Or does that go against the spec? Just trying to figure out whether another option is necessary.

digitalmoksha commented 5 months ago

Hmm, I was worried about exposing something that would make it "more unsafe", but if you're already using --unsafe, then you're already getting normal markdown links with those types of urls, and the <smb:///Volumes/shared/foo.pdf> syntax works as well.

Their spec does say

An extended url autolink will be recognised when one of the schemes http://, or https://, followed by a valid domain, then zero or more non-space non-< characters according to extended autolink path validation

and

An extended protocol autolink will be recognised when a protocol is recognised within any text node. Valid protocols are:

  • mailto:
  • xmpp:

So obviously it breaks that. And cmark-gfm -e autolink --unsafe with smb:///Volumes/shared/foo.pdf as input doesn't recognize it as a link.

The spec doesn't say anything about unsafe, but the help says Render raw HTML and dangerous URLs.

So I think you're right, we probably don't need the extra option.

kivikakk commented 5 months ago

Well, indeed; we might want an option to extend for which schemes autolinks will be made. The current code is uh, a bit weird — likely unnecessarily so — as a result of its direct port from C.

There's also of course a lot of battle-testing included in that weirdness, owing for the differences in safely (as in not-over-eagerly) recognising www.… links, http:// or https:// ones, as well as mailto: or xmpp: ones — what follows each set differs slightly, and trying to apply the same rules to all doesn't work. smb:// should work reasonably the same as http://-type ones, though, so that shouldn't be too much of an issue.

digitalmoksha commented 5 months ago

All the checks are good, just thinking about relaxing the scheme check.

You know, we've already got the relaxed-autolinks option, that allows detecting in brackets. And we have that option being passed into process_autolinks

What about using that to bypass the SCHEME check in url_match. Changing this

    let cond = |s: &&[u8]| size - i + rewind >= s.len() && &&contents[i - rewind..i] == s;
    if !SCHEMES.iter().any(cond) {
        return None;
    }

into

    if !relaxed_autolinks {
        let cond = |s: &&[u8]| size - i + rewind >= s.len() && &&contents[i - rewind..i] == s;
        if !SCHEMES.iter().any(cond) {
            return None;
        }
    }

My testing shows that works well. And then --unsafe will still do it's normal thing, allowing vbscript, etc through.

digitalmoksha commented 5 months ago

I went ahead and created https://github.com/kivikakk/comrak/pull/380 using the relaxed-autolinks

Of course we can change it if we need to.