micromark-extension-gfm-autolink-literal

micromark extensions to support GFM literal autolinks.

What is this?

This package contains extensions that add support for the extra autolink syntax enabled by GFM to micromark.

GitHub employs two different algorithms to autolink: one at parse time and one at transform time (similar to how @mentions are done at transform time). This difference can be observed because character references and escapes are handled differently, and also because issues, PRs, and comments omit (perhaps by accident?) the second algorithm for www., http://, and https:// links (but not for email links).

As this is a syntax extension, it focuses on the first algorithm. The second algorithm is performed by mdast-util-gfm-autolink-literal. The html part of this micromark extension does not operate on an AST and hence can’t perform the second algorithm.

The implementation of autolink literal on github.com is currently buggy. The bugs have been reported on cmark-gfm. This micromark extension matches github.com except for its bugs.

When to use this

This project is useful when you want to support autolink literals in markdown.

You can use these extensions when you are working with micromark. To support all GFM features, use micromark-extension-gfm instead.

When you need a syntax tree, combine this package with mdast-util-gfm-autolink-literal.

All these packages are used in remark-gfm, which focuses on making it easier to transform content by abstracting these internals away.

Install

This package is ESM only. In Node.js (version 16+), install with npm:

npm install micromark-extension-gfm-autolink-literal

In Deno with esm.sh:

import {gfmAutolinkLiteral, gfmAutolinkLiteralHtml} from 'https://esm.sh/micromark-extension-gfm-autolink-literal@2'

In browsers with esm.sh:

<script type="module">
  import {gfmAutolinkLiteral, gfmAutolinkLiteralHtml} from 'https://esm.sh/micromark-extension-gfm-autolink-literal@2?bundle'
</script>

Use

import {micromark} from 'micromark'
import {
  gfmAutolinkLiteral,
  gfmAutolinkLiteralHtml
} from 'micromark-extension-gfm-autolink-literal'

const output = micromark('Just a URL: www.example.com.', {
  extensions: [gfmAutolinkLiteral()],
  htmlExtensions: [gfmAutolinkLiteralHtml()]
})

console.log(output)

Yields:

<p>Just a URL: <a href="http://www.example.com">www.example.com</a>.</p>

API

This package exports the identifiers gfmAutolinkLiteral and gfmAutolinkLiteralHtml. There is no default export.

The export map supports the development condition. Run node --conditions development module.js to get instrumented dev code. Without this condition, production code is loaded.

gfmAutolinkLiteral()

Create an extension for micromark to support GitHub autolink literal syntax.

Returns

Extension for micromark that can be passed in extensions to enable GFM autolink literal syntax (Extension).

gfmAutolinkLiteralHtml()

Create an HTML extension for micromark to support GitHub autolink literal when serializing to HTML.

Returns

Extension for micromark that can be passed in htmlExtensions to support GitHub autolink literal when serializing to HTML (HtmlExtension).

Bugs

GitHub’s own algorithm to parse autolink literals contains three bugs. A smaller bug is left unfixed in this project for consistency. Two main bugs are not present in this project. The issues relating to autolink literals have been reported on cmark-gfm.

Authoring

It is recommended to use labels, either with a resource or a definition, instead of autolink literals, as those allow relative URLs and descriptive text to explain the URL in prose.

HTML

GFM autolink literals relate to the <a> element in HTML. See § 4.5.1 The a element in the HTML spec for more info. When an email autolink is used, the string mailto: is prepended when generating the href attribute of the hyperlink. When a www autolink is used, the string http:// is prepended.
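
The prefixing rule above can be sketched in a few lines. This is only an illustration, not code from this package: the function name `hrefForLiteral` and the `kind` labels are hypothetical, and the real HTML extension additionally percent-encodes the result.

```javascript
// Hypothetical helper (not part of this package): build the `href` value
// for an autolink literal, following the prefix rules described above.
function hrefForLiteral(kind, literal) {
  if (kind === 'email') return 'mailto:' + literal
  if (kind === 'www') return 'http://' + literal
  // Protocol literals (`http://`, `https://`) already carry their scheme.
  return literal
}

console.log(hrefForLiteral('www', 'www.example.com'))
console.log(hrefForLiteral('email', 'contact@example.com'))
console.log(hrefForLiteral('protocol', 'https://example.com'))
```

The first call gives the same `href` as the example in Use: the visible link text stays `www.example.com`, and only the attribute gains the `http://` prefix.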

CSS

As hyperlinks are the fundamental thing that makes the web, you will most definitely have CSS for <a> elements already. The same CSS can be used for autolink literals, too.

GitHub itself does not apply interesting CSS to autolink literals. For any link, it currently (June 2022) uses:

a {
  background-color: transparent;
  color: #58a6ff;
  text-decoration: none;
}

a:active,
a:hover {
  outline-width: 0;
}

a:hover {
  text-decoration: underline;
}

a:not([href]) {
  color: inherit;
  text-decoration: none;
}

Syntax

Autolink literals form with, roughly, the following BNF:

gfmAutolinkLiteral ::= httpAutolink / wwwAutolink / emailAutolink

; Restriction: the code before must be `wwwAutolinkBefore`.
; Restriction: the code after `.` must not be eof.
wwwAutolink ::= 3("w" / "W") "." [domain [path]]
wwwAutolinkBefore ::= eof / eol / spaceOrTab / "(" / "*" / "_" / "[" / "]" / "~"

; Restriction: the code before must be `httpAutolinkBefore`.
; Restriction: the code after the protocol must be `httpAutolinkProtocolAfter`.
httpAutolink ::= ("h" / "H") 2("t" / "T") ("p" / "P") ["s" / "S"] ":" 2"/" domain [path]
httpAutolinkBefore ::= byte - asciiAlpha
httpAutolinkProtocolAfter ::= byte - eof - eol - asciiControl - unicodeWhitespace - unicodePunctuation

; Restriction: the code before must be `emailAutolinkBefore`.
; Restriction: `asciiDigit` may not occur in the last label part of the label.
emailAutolink ::= 1*("+" / "-" / "." / "_" / asciiAlphanumeric) "@" 1*(1*labelSegment labelDotCont) 1*labelSegment
emailAutolinkBefore ::= byte - asciiAlpha - "/"

; Restriction: `_` may not occur in the last two domain parts.
domain ::= 1*(urlAmptCont / domainPunctCont / "-" / byte - eof - asciiControl - unicodeWhitespace - unicodePunctuation)
; Restriction: must not be followed by `punct`.
domainPunctCont ::= "." / "_"
; Restriction: must not be followed by `charRef`.
urlAmptCont ::= "&"

; Restriction: a counter `balance = 0` is increased for every `(`, and decreased for every `)`.
; Restriction: `)` must not be `parenAtEnd`.
path ::= 1*(urlAmptCont / pathPunctuationCont / "(" / ")" / byte - eof - eol - spaceOrTab)
; Restriction: must not be followed by `punct`.
pathPunctuationCont ::= trailingPunctuation - "<"
; Restriction: must be followed by `punct` and `balance` must be less than `0`.
parenAtEnd ::= ")"

labelSegment ::= labelDashUnderscoreCont / asciiAlpha / asciiDigit
; Restriction: if followed by `punct`, the whole email autolink is invalid.
labelDashUnderscoreCont ::= "-" / "_"
; Restriction: must not be followed by `punct`.
labelDotCont ::= "."

punct ::= *trailingPunctuation ( byte - eof - eol - spaceOrTab - "<" )
charRef ::= *asciiAlpha ";" pathEnd
trailingPunctuation ::= "!" / "\"" / "'" / ")" / "*" / "," / "." / ":" / ";" / "<" / "?" / "_" / "~"
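
To make the `trailingPunctuation` and `parenAtEnd` restrictions more concrete, here is a rough sketch that trims a candidate URL after the fact. It is an approximation under stated assumptions, not how this package works: the real extension is a state machine that decides character by character, the function name `trimTrailingSketch` is made up, and the `charRef` restriction is ignored entirely.

```javascript
// Characters from the `trailingPunctuation` production in the grammar above.
const trailingPunctuation = new Set([
  '!', '"', "'", ')', '*', ',', '.', ':', ';', '<', '?', '_', '~'
])

// Hypothetical helper: drop trailing punctuation from a candidate URL.
// A closing paren at the end is only dropped when it is unbalanced,
// that is, when there are more `)` than `(` up to and including it.
function trimTrailingSketch(candidate) {
  let end = candidate.length

  while (end > 0) {
    const char = candidate[end - 1]

    if (char === ')') {
      let balance = 0
      for (const c of candidate.slice(0, end)) {
        if (c === '(') balance++
        else if (c === ')') balance--
      }
      // Balanced (or more opens than closes): the paren belongs to the URL.
      if (balance >= 0) break
    } else if (!trailingPunctuation.has(char)) {
      break
    }

    end--
  }

  return candidate.slice(0, end)
}

console.log(trimTrailingSketch('www.example.com.'))
console.log(trimTrailingSketch('www.example.com/a(b).'))
console.log(trimTrailingSketch('www.example.com/a(b)))'))
```

The first call drops the final dot, which matches the output shown in Use, where the `.` ends up outside the link. The other two keep the balanced `(b)` and drop only what follows it.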

The grammar for GFM autolink literal is very relaxed: basically anything except for whitespace is allowed after a prefix. To use whitespace characters and otherwise impossible characters, in URLs, you can use percent encoding:

https://example.com/alpha%20bravo

Yields:

<p><a href="https://example.com/alpha%20bravo">https://example.com/alpha%20bravo</a></p>

There are several cases where incorrect encoding of URLs would, in other languages, result in a parse error. In markdown, there are no errors, and URLs are normalized. In addition, many characters are percent encoded (sanitizeUri). For example:

www.a👍b%

Yields:

<p><a href="http://www.a%F0%9F%91%8Db%25">www.a👍b%</a></p>
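
The encoding in the two examples above can be approximated with built-in JavaScript. This is a rough sketch, not the `sanitizeUri` implementation: the function name `percentEncodeSketch` is made up, and the set of characters left untouched is an assumption chosen to reproduce the outputs shown here.

```javascript
// Hypothetical helper: percent-encode characters that are not plain URL
// characters, while keeping percent-encodings that are already valid.
function percentEncodeSketch(value) {
  // Assumed set of characters that pass through unchanged.
  const plain = /^[!#$&-;=?-Z_a-z~]$/
  // Split by code point, so an emoji is one item rather than two.
  const chars = Array.from(value)
  let result = ''

  for (let index = 0; index < chars.length; index++) {
    const char = chars[index]
    const next = chars.slice(index + 1, index + 3).join('')

    if (char === '%' && /^[0-9A-Fa-f]{2}$/.test(next)) {
      // Already a valid percent-encoding such as `%20`: keep it.
      result += char
    } else if (plain.test(char)) {
      result += char
    } else if (char.length === 1 && char >= '\uD800' && char <= '\uDFFF') {
      // Lone surrogate: encode the replacement character instead,
      // because `encodeURIComponent` would throw on it.
      result += encodeURIComponent('\uFFFD')
    } else {
      result += encodeURIComponent(char)
    }
  }

  return result
}

console.log(percentEncodeSketch('http://www.a👍b%'))
console.log(percentEncodeSketch('https://example.com/alpha%20bravo'))
```

The stray `%` at the end of the first input becomes `%25`, while the valid `%20` in the second input is left alone.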

There is a big difference between how www and protocol literals work compared to how email literals work. The first two are done when parsing, and work like anything else in markdown. But email literals are handled afterwards: when everything is parsed, we look back at the events to figure out if there were email addresses. This particularly affects how they interleave with character escapes and character references.

Types

This package is fully typed with TypeScript. It exports no additional types.

Compatibility

Projects maintained by the unified collective are compatible with maintained versions of Node.js.

When we cut a new major release, we drop support for unmaintained versions of Node. This means we try to keep the current release line, micromark-extension-gfm-autolink-literal@^2, compatible with Node.js 16.

This package works with micromark version 3 and later.

Security

This package is safe. Unlike other links in CommonMark, which allow arbitrary protocols, this construct always produces safe links.

Contribute

See contributing.md in micromark/.github for ways to get started. See support.md for ways to get help.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.

License

MIT © Titus Wormer