mdx-js / mdx

Markdown for the component era
https://mdxjs.com
MIT License
17.79k stars 1.14k forks source link

A `|` in a code in a table breaks the table #1285

Closed Nkzn closed 3 years ago

Nkzn commented 4 years ago

Subject of the issue

SSIA

Your environment

Steps to reproduce

Sample: https://codesandbox.io/s/brave-goldberg-qe2uv?file=/src/Content.mdx

Read the markdown below with mdx.

| title1 | title2               | title3                         |
|--------|----------------------|--------------------------------|
| foo    | `string | undefined` | this message should be visible |

Expected behaviour

The string | undefined is recognized as a code.

スクリーンショット 2020-10-01 23 41 20

(This capture is Visual Studio Code behavior)

Actual behaviour

The | in the code (`) is recognized as a table separator.

スクリーンショット 2020-10-01 23 43 37

https://codesandbox.io/s/brave-goldberg-qe2uv?file=/src/Content.mdx

Nkzn commented 4 years ago

Appendix: GitHub Behavior

title1 title2 title3
foo string | undefined this message should be visible

oh...

wooorm commented 4 years ago

GitHub doesn’t allow a pipe in code (text) in a table cell. You can escape it here on GitHub (note: escaping stuff in code normally doesn’t work, but GFM made an exception for pipes).

So, it seems we’re following GFM here. Could you check if escaping works?

Nkzn commented 4 years ago

@wooorm thanks for advising!

GFM Behavior

title1 title2 title3
foo string \| undefined this message should be visible

MDX Behavior

https://codesandbox.io/s/sad-meadow-5iwcw?file=/src/Content.mdx

スクリーンショット 2020-10-02 0 07 26

escaping works, however \ is remaining.

wooorm commented 4 years ago

Hmm. That’s annoying. Well, that’s indeed a bug!

lex111 commented 4 years ago

Something similar was fixed in GFM plugin for Remark 3, how can this fix be "backported" to MDX? Any clue? https://github.com/syntax-tree/mdast-util-gfm-table/commit/9e0e488252e5d02dc49e3475e62fde1f2cb22808

wooorm commented 4 years ago

That is not trivial, because MDX is made on the previous parser in remark, and this is for the new parser. Switching parsers is a serious breaking change.

lex111 commented 4 years ago

There is a quick solution for this, as I understand it, it can be fixed in the mdxAstToMdxHast function, something like this:

      inlineCode(h, node) {
        return Object.assign({}, node, {
          type: 'element',
          tagName: 'inlineCode',
          properties: {},
          children: [
            {
              type: 'text',
-              value: node.value
+              value: node.value.replace(/\\(?=\|)/, '')
            }
          ]
        })
      },

You can test this regular expression on Regex 101 https://regex101.com/r/M8izhu/1

How acceptable/properly is this fix?

wooorm commented 4 years ago

Btw, if I remember correctly, this was already solved in earlier remark. So if this is occurs here, it’s a bug here.

Your regex could be simplified by using \\| -> |.

It might be the fix we’re looking for, but it needs tests to confirm whether it is

lex111 commented 4 years ago

I am tested on MDX Playground and this bug occurs there:

image

Your regex could be simplified by using \| -> |.

What do you mean, could you please send link to regex101?

It might be the fix we’re looking for, but it needs tests to confirm whether it is

This is what worries me the most, because it is not clear what to test (the current use case or also other different options for using inline code in table cell eg).

wooorm commented 4 years ago

I meant:

-              value: node.value
+              value: node.value.replace(/\\\|/, '|')
            }

It’s a bit complex to explain, but this also doesn’t work, because your expression will match \\|, but it shouldn’t. See: https://github.com/micromark/micromark-extension-gfm-table/blob/a16f719d781d24555c94b57bae65327d4c9279c6/html.js#L129

This is what worries me the most, because it is not clear what to test (the current use case or also other different options for using inline code in table cell eg).

This thread has a couple of examples, so at least those. Some more examples can be found here: https://github.com/micromark/micromark-extension-gfm-table/blame/a16f719d781d24555c94b57bae65327d4c9279c6/test/input.md#L169

lex111 commented 4 years ago

Therefore, I used positive lookahead to avoid such issues (\\| should be \|, not / in case /\\\|/). It isn't wrong, is it?

wooorm commented 4 years ago

Your regex still matches the literal markdown \\|, encoded in JavaScript as '\\\\|', because it doesn’t match the first slash, and does math the second slash plus pipe. Whereas according to GitHub, the slash slash matches, and the pipe then exits the cell.

I linked a couple fixtures in my previous comment. If we’re going to try and solve this, it would be good to test several edge cases!

wooorm commented 3 years ago

Hi all! I’m going to close this as it landed on the main, the (now default) branch we’re using to gear up to the MDX@2 release.

The reason is so that it’s more clear which issues still need to be solved already and don’t require further work.

Expect another next release soonish, which might include it, or maybe it’s already released as a beta!

For more info, see https://github.com/mdx-js/mdx/issues/1041.