micromark / micromark-extension-gfm-strikethrough

micromark extension to support GFM strikethrough
https://unifiedjs.com
MIT License

Not parsing “empty” strikethrough #1

Closed tripodsan closed 3 years ago

tripodsan commented 3 years ago

Subject of the issue

try to roundtrip the following mdast:

root[1]
└─0 heading[4]
    │ depth: 2
    ├─0 text "Hello, "
    ├─1 delete[1]
    │   └─0 text "world."
    ├─2 text " stray "
    └─3 delete[0]

stringify produces:

## Hello, ~~world.~~ stray ~~~~

parsing this again produces:

root[1] (1:1-2:1, 0-32)
└─0 heading[3] (1:1-1:32, 0-31)
    │ depth: 2
    ├─0 text "Hello, " (1:4-1:11, 3-10)
    ├─1 delete[1] (1:11-1:21, 10-20)
    │   └─0 text "world." (1:13-1:19, 12-18)
    └─2 text " stray ~~~~" (1:21-1:32, 20-31)

Your environment

$ npm ls --depth 0
├── mdast-builder@1.1.1
├── remark-gfm@1.0.0
├── remark-parse@9.0.0
├── remark-stringify@9.0.0
├── unified@9.2.0
└── unist-util-inspect@6.0.1

Steps to reproduce

const remark = require('remark-parse');
const stringify = require('remark-stringify');
const unified = require('unified');
const gfm = require('remark-gfm');
const inspect = require('unist-util-inspect');
const {
  root,
  text,
  heading,
  strike,
} = require('mdast-builder');

const mdast = root([
  heading(2, [
    text('Hello, '),
    strike(text('world.')),
    text(' stray '),
    strike([])
  ]),
]);

console.log('original:')
console.log(inspect(mdast));

const doc = unified()
  .use(stringify)
  .use(gfm)
  .stringify(mdast);

console.log('markdown:')
console.log(doc);

const mdast2 = unified()
  .use(remark)
  .use(gfm)
  .parse(doc);

console.log('roundtripped:')
console.log(inspect(mdast2));

Expected behavior

the empty strikethrough node should not be lost when reparsing.

Actual behavior

the empty strikethrough node is converted to text.

Workaround

since the empty strikethrough nodes have no semantic meaning, they could be removed from the mdast prior to serialization, or suppressed by gfm-to-markdown automatically.
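
A minimal sketch of the first option, removing empty strikethrough nodes before serialization. It assumes the tree is made of plain mdast objects; `removeEmptyDeletes` is a hypothetical helper name, not part of remark-gfm or mdast-util-gfm-strikethrough.

```javascript
// Hypothetical helper (not a remark API): recursively drops `delete`
// nodes that have no children. Mutates the tree in place and returns it.
function removeEmptyDeletes(node) {
  if (!Array.isArray(node.children)) return node;
  node.children = node.children
    .map((child) => removeEmptyDeletes(child)) // clean nested nodes first
    .filter(
      (child) =>
        !(child.type === 'delete' && (child.children || []).length === 0)
    );
  return node;
}

// The tree from this report, written out as plain mdast objects.
const tree = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 2,
      children: [
        {type: 'text', value: 'Hello, '},
        {type: 'delete', children: [{type: 'text', value: 'world.'}]},
        {type: 'text', value: ' stray '},
        {type: 'delete', children: []},
      ],
    },
  ],
};

removeEmptyDeletes(tree);
console.log(tree.children[0].children.length); // 3
```

Running this on the tree before `.stringify(mdast)` should keep the trailing `~~~~` out of the generated markdown, so nothing is left to be reparsed as text.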

wooorm commented 3 years ago

This is related to how to serialize mdast, which is done by mdast-util-gfm-strikethrough, because:

> or suppressed by gfm-to-markdown automatically.

How we parse matches GitHub:

Hello, world. stray ~~~~

> the empty strikethrough node should not be lost when reparsing.

How could we not lose it? As an empty strikethrough (or emphasis, strong, inline code, paragraph) can’t be made with markdown, we can only change something when roundtripping:

There are several other cases where we can’t serialize trees to make sense: e.g., it’s documented here, a similar case is this one. Or, what to do with a heading in a heading? 🤷‍♂️


Could you sort this out on your side? How are you getting an empty strikethrough node? Maybe it’s a bug there. Or: if for you empty emphasis/strong/delete can be removed, remove them from the tree yourself?

tripodsan commented 3 years ago

I see the problem. I was just stating the fact. You could generate an empty node; if someone really wants a literal ~~~~, they can escape it as \~~~~. But never mind :-)

> Could you sort this out on your side? How are you getting an empty strikethrough node? Maybe it’s a bug there. Or: if for you empty emphasis/strong/delete can be removed, remove them from the tree yourself?

yes, of course. we are transforming a google-doc to mdast... the google-doc has an empty node with a strikethrough style. but we can remove all empty format nodes.

wooorm commented 3 years ago

Oh good to hear that you’re in control of the transform to mdast. In that case, it’s more of a bug there, than how we’re handling that mdast. 👍

tripodsan commented 3 years ago

it's actually more tricky :-)

│   └─4 delete[1]
│       └─0 text ""
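
For a tree shaped like this fragment, checking only for childless `delete` nodes is not enough, because the node still has one (empty) child. A minimal sketch of one way to handle it, assuming plain mdast objects: `pruneEmptyFormatting` and the `FORMATTING` set are hypothetical names, and the sample heading is invented for illustration.

```javascript
// Assumption: these are the node types treated as "formatting" wrappers.
const FORMATTING = new Set(['delete', 'emphasis', 'strong']);

// Hypothetical helper (not a remark API): drops text nodes whose value is
// the empty string, then drops formatting nodes left without children.
// Children are processed first, so emptiness propagates upward.
function pruneEmptyFormatting(node) {
  if (!Array.isArray(node.children)) return node;
  node.children = node.children
    .map((child) => pruneEmptyFormatting(child))
    .filter((child) => {
      if (child.type === 'text') return child.value !== '';
      if (FORMATTING.has(child.type)) return (child.children || []).length > 0;
      return true;
    });
  return node;
}

// Invented example containing the delete > text "" shape shown above.
const heading = {
  type: 'heading',
  depth: 2,
  children: [
    {type: 'text', value: 'Hello'},
    {type: 'delete', children: [{type: 'text', value: ''}]},
  ],
};

pruneEmptyFormatting(heading);
console.log(JSON.stringify(heading.children));
// [{"type":"text","value":"Hello"}]
```

Note that this sketch removes every empty text node in the tree, not only those inside formatting nodes; whether that is acceptable depends on the transform producing the tree.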

wooorm commented 3 years ago

It sounds like you’re using gdocs2md first, but as that makes serialized markdown, then how are you getting an empty text node? 🤔

tripodsan commented 3 years ago

> It sounds like you’re using gdocs2md first, but as that makes serialized markdown, then how are you getting an empty text node? 🤔

no, we use our own gdocs2mdast library...

wooorm commented 3 years ago

Ah. interesting. Is that open source somewhere?

tripodsan commented 3 years ago

> Ah. interesting. Is that open source somewhere?

not (yet).