syntax-tree / mdast-util-gfm-autolink-literal

mdast extension to parse and serialize GFM autolink literals
https://unifiedjs.com
MIT License

Links detected in raw broken markdown block compared to GitHub.com #4

Closed SamyPesse closed 3 years ago

SamyPesse commented 3 years ago

Affected packages and versions

mdast-util-gfm-autolink-literal@1.0.0

Link to runnable example

No response

Steps to reproduce

Consider this broken markdown string (extracted from our users' content):

\* \*\*Memory:\*\* \[Vengeance® Series 16GB \\(2x8GB\\) DDR4 SODIMM 2400MHz CL16 Memory Kit CMSX16GX4M2A2400C16](https://www.corsair.com/us/en/Categories/Products/Memory/Laptop-and-Notebook-Memory/Vengeance%C2%AE-Series-16GB-%282x8GB%29-DDR4-SODIMM-2400MHz-CL16-Memory-Kit/p/CMSX16GX4M2A2400C16#tab-overview)  
\* \*\*Disk:\*\* M.2 NVMe \[Samsung 970 EVO 250GB \\(MZ-V7S250B/AM\\)](https://www.samsung.com/us/computing/memory-storage/solid-state-drives/ssd-970-evo-plus-nvme-m-2-250gb-mz-v7s250b-am/) and M.2 SSD \[Transcend 64GB SATA III 6 Gb/s MTS800](https://www.transcend-info.com/Embedded/Products/No-803)

Using the following JS code:

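import {unified} from 'unified'
import remarkParse from 'remark-parse'
import remarkGfm from 'remark-gfm'

// `input` holds the broken markdown string shown above.
const processor = unified()
    .use(remarkParse)
    .use(remarkGfm)
    .use(() => {
        // Inspection plugin: print the parsed tree, then pass it on unchanged.
        return (tree) => {
            console.log(JSON.stringify(tree, null, 2));
            return tree;
        };
    })

// No compiler is attached, so parse and run the transformers directly.
processor.runSync(processor.parse(input))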

it results in the following tree:

{
    "type": "root",
    "children": [
    {
        "type": "paragraph",
        "children": [
        {
            "type": "text",
            "value": "* **Memory:** [Vengeance® Series 16GB \\(2x8GB\\) DDR4 SODIMM 2400MHz CL16 Memory Kit CMSX16GX4M2A2400C16]("
        },
        {
            "type": "link",
            "title": null,
            "url": "https://www.corsair.com/us/en/Categories/Products/Memory/Laptop-and-Notebook-Memory/Vengeance%C2%AE-Series-16GB-%282x8GB%29-DDR4-SODIMM-2400MHz-CL16-Memory-Kit/p/CMSX16GX4M2A2400C16#tab-overview",
            "children": [
            {
                "type": "text",
                "value": "https://www.corsair.com/us/en/Categories/Products/Memory/Laptop-and-Notebook-Memory/Vengeance%C2%AE-Series-16GB-%282x8GB%29-DDR4-SODIMM-2400MHz-CL16-Memory-Kit/p/CMSX16GX4M2A2400C16#tab-overview"
            }
            ]
        },
        {
            "type": "text",
            "value": ")"
        },
        {
            "type": "break",
            "position": {
            "start": {
                "line": 1,
                "column": 314,
                "offset": 313
            },
            "end": {
                "line": 2,
                "column": 1,
                "offset": 316
            }
            }
        },
        {
            "type": "text",
            "value": "* **Disk:** M.2 NVMe [Samsung 970 EVO 250GB \\(MZ-V7S250B/AM\\)]("
        },
        {
            "type": "link",
            "title": null,
            "url": "https://www.samsung.com/us/computing/memory-storage/solid-state-drives/ssd-970-evo-plus-nvme-m-2-250gb-mz-v7s250b-am/",
            "children": [
            {
                "type": "text",
                "value": "https://www.samsung.com/us/computing/memory-storage/solid-state-drives/ssd-970-evo-plus-nvme-m-2-250gb-mz-v7s250b-am/"
            }
            ]
        },
        {
            "type": "text",
            "value": ")"
        },
        {
            "type": "text",
            "value": " and M.2 SSD [Transcend 64GB SATA III 6 Gb/s MTS800](https://www.transcend-info.com/Embedded/Products/No-803)"
        }
        ],
        "position": {
        "start": {
            "line": 1,
            "column": 1,
            "offset": 0
        },
        "end": {
            "line": 2,
            "column": 310,
            "offset": 625
        }
        }
    }
    ],
    "position": {
    "start": {
        "line": 1,
        "column": 1,
        "offset": 0
    },
    "end": {
        "line": 3,
        "column": 1,
        "offset": 626
    }
    }
}
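
A quick way to confirm which links were detected in a tree like the one above is to walk it and collect every node of type "link". The following is a minimal sketch: `collectLinks` is a hypothetical helper (not part of any unified package), and the small `tree` is a hand-written stand-in for the real output.

```javascript
// Hypothetical helper: recursively gather the `url` of every `link` node.
function collectLinks(node, found = []) {
  if (node.type === 'link') found.push(node.url)
  for (const child of node.children || []) collectLinks(child, found)
  return found
}

// Hand-written stand-in with the same shape as the tree above.
const tree = {
  type: 'root',
  children: [
    {
      type: 'paragraph',
      children: [
        {type: 'text', value: '[label]('},
        {
          type: 'link',
          title: null,
          url: 'https://example.com',
          children: [{type: 'text', value: 'https://example.com'}]
        },
        {type: 'text', value: ')'}
      ]
    }
  ]
}

console.log(collectLinks(tree))
```

If the content were parsed as a pure text node, as the expected behavior below asks for, the returned array would be empty.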

Expected behavior

Instead it should be parsed the same way as GitHub does, as a pure text node.

Actual behavior

Links are detected in the content, so the resulting tree contains link nodes. This also causes a bigger issue: the round trip is not stable. For input > parsed > markdown > parsed2 > markdown2, we get markdown !== markdown2.
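
The round-trip property described above can be checked with a small harness. This is only a sketch: `checkRoundTrip`, `toyParse`, and `toySerialize` are hypothetical names, and the toy pair stands in for a real parser and serializer (for example a unified pipeline). The toy serializer escapes every `[` again on each pass, which is one way a round trip can fail to settle.

```javascript
// Hypothetical harness: `parse` and `serialize` are whatever pair is under test.
function checkRoundTrip(parse, serialize, input) {
  const markdown = serialize(parse(input))
  const markdown2 = serialize(parse(markdown))
  return {markdown, markdown2, stable: markdown === markdown2}
}

// Toy stand-ins (assumptions for illustration, not real remark behavior).
const toyParse = (value) => ({type: 'text', value})
const toySerialize = (node) => node.value.replace(/\[/g, '\\[')

const result = checkRoundTrip(toyParse, toySerialize, 'a [b')
console.log(result.stable)
```

Passing functions in keeps the check independent of any particular library version, so the same harness can wrap whichever parse and stringify steps are in use.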

Runtime

Node v14

Package manager

yarn v2

OS

macOS

Build and bundle tools

esbuild

wooorm commented 3 years ago

This seems to be the same as part of remarkjs/remark-gfm#16 (specifically the last sentence of my comment here: https://github.com/remarkjs/remark-gfm/issues/16#issuecomment-843314610).

wooorm commented 3 years ago

It’s annoying because GitHub is buggy. See here how they render differently in readmes than in comments/issues/PRs: https://gist.github.com/wooorm/076fd173c31ba6837f17591d5932476e#file-autolink-algo-2-character-reference-md

wooorm commented 3 years ago

The output of unified et al gives:

import {toHtml} from 'hast-util-to-html'
import {toHast} from 'mdast-util-to-hast'
import {fromMarkdown} from 'mdast-util-from-markdown'
import {gfmAutolinkLiteral} from 'micromark-extension-gfm-autolink-literal'
import {
  gfmAutolinkLiteralFromMarkdown,
  gfmAutolinkLiteralToMarkdown
} from './index.js'

const input = `
~~~markdown
[ https&#x3A;//example.com
[ https://example.com
https&#x3A;//example.com
https://example.com
~~~

[ https://example.com
[ https://example.com
https://example.com
https://example.com
`

const actual = toHtml(
  toHast(
    fromMarkdown(input, {
      extensions: [gfmAutolinkLiteral],
      mdastExtensions: [gfmAutolinkLiteralFromMarkdown]
    })
  )
)

console.log(actual)

This prints:

<pre><code class="language-markdown">[ https&#x26;#x3A;//example.com
[ https://example.com
https&#x26;#x3A;//example.com
https://example.com
</code></pre>
<p>[ <a href="https://example.com">https://example.com</a>
[ https://example.com
https://example.com
https://example.com</p>

This does not match either of the algorithms. However, I’d lean towards treating how GitHub renders readmes as “standard”, and comments as buggy. So I think unified et al should match readmes. This differs from the expected behavior you proposed, though: “Instead it should be parsed the same way as GitHub does, as a pure text node.”
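
The character-reference case in the linked gist can be illustrated with a simplified sketch. Both `urlPattern` and `decodeHexReferences` are hypothetical stand-ins, not the algorithm used by GitHub or by this package; they only show why the answer can depend on whether link detection runs on the raw source or on text whose character references have already been decoded.

```javascript
// Hypothetical, simplified URL detector; real autolink rules are stricter.
const urlPattern = /https?:\/\/[^\s<]+/

// Decode only hexadecimal numeric character references such as `&#x3A;`.
function decodeHexReferences(value) {
  return value.replace(/&#x([0-9a-f]+);/gi, (_, hex) =>
    String.fromCodePoint(Number.parseInt(hex, 16))
  )
}

const source = 'https&#x3A;//example.com'
const decoded = decodeHexReferences(source)

// Raw source contains no literal `https://`.
console.log(urlPattern.test(source))
// After decoding, the text looks like a URL.
console.log(urlPattern.test(decoded))
```

A detector that runs after decoding will link text that a detector running on the raw source leaves alone, which is one plausible source of the differences discussed in this thread.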

wooorm commented 3 years ago

I’ve applied a fix to better match how GitHub renders readmes.