wooorm / markdown-rs

CommonMark compliant markdown parser in Rust with ASTs and extensions
https://docs.rs/markdown/1.0.0-alpha.21/markdown/
MIT License
934 stars 52 forks source link

MdxFlowExpression doesn't seem to preserve white space in values #150

Open bnchi opened 1 week ago

bnchi commented 1 week ago

Given this test case :

from("  {`\n    a\n  `}", &ParseOptions::mdx()).unwrap()

the generated markdown tree seem to be missing the white spaces between the first \n and a

The tree outptut from the above Rust code is :

Root { children: [MdxFlowExpression { value: "`\na\n`", position: Some(1:3-3:5 (2-15)), stops: [(0, 3), (1, 4), (2, 9), (3, 10), (4, 13)] }], position: Some(1:1-3:5 (0-15)) }

I ran the same input on the JS version and the output is :

{
  type: 'root',
  children: [
    {
      type: 'mdxFlowExpression',
      value: '`\n  a\n`',
      position: [Object]
    }
  ],
  position: {
    start: { line: 1, column: 1, offset: 0 },
    end: { line: 3, column: 5, offset: 15 }
  }
}
ChristianMurphy commented 1 week ago

Good to see you again @bnchi! 👋 IIRC white space is not considered significant in a JSX context (E.G. https://eslint.style/rules/default/jsx-child-element-spacing) In other words if could be interesting to investigate if you want, but this shouldn't break any tooling as far as I'm aware.

Did this come up from doing property testing comparing the JS and rust implementations? Or is there another edge case this uncovers?

bnchi commented 1 week ago

Hey @ChristianMurphy !

This is taken from a round trip test in the mdx implementation here : https://github.com/syntax-tree/mdast-util-mdx-expression/blob/main/test.js#L535C25-L535C47

Shouldn't I expect the round trip to not lose information like this when I go md -> mdast -> md ?

wooorm commented 1 week ago

Re Christian: This is different from JSX: this is expressions.

Expressions often have backticked strings. In them, there needs to be some persisting of whitespace.

On the other hand, people do often want to indent their expressions. Just like other things in markdown:

{
  `
    what about whitespace
    for this?
  `
}

There is a more complex case where expressions are used in JSX:

<alpha
  some-long-attributes={true}
  some-more-long-attributes={true}
  some-value={`
    what about whitespace
    for this?
  `}
/>

What you are seeing is code from markdown-rs and micromark/mdast-util* being out of sync with each other. Code such as https://github.com/micromark/micromark-extension-mdx-expression/commit/103af9acc9c0477ddccc9ffdd40cd93b0e764bf6 would need to be backported. See also the issue https://github.com/mdx-js/mdx/issues/2533 and the commit https://github.com/micromark/micromark-extension-mdx-jsx/commit/67f9baf2a8d77987e7bb1c9a80c1ffccffc00a07!

bnchi commented 1 week ago

@wooorm does this need to be back-ported for the serialization feature you think ?

For now we can just serialize the trees as they come in.

wooorm commented 1 week ago

I am not entirely sure. There is handling in different places. Parsing and serializing. Regular expressions and expressions in JSX attributes. I think it’s these 4 places. Not sure which of them are out of date with the JS version, and thus which need to be backported

bnchi commented 1 week ago

Round-trip testing will probably help us to know where the issues are as I move along since we were able to detect this one this way.

I'll see if I can make a PR to close this issue before continuing to work on the serialization process.

wooorm commented 1 week ago

Ah, I think I understand better what you mean. Yes: I do think that is needed! Appreciated, thank you for working on this!

ChristianMurphy commented 1 week ago

Round-trip testing will probably help us to know where the issues are as I move along since we were able to detect this one this way.

agreed, I added some ideas in https://github.com/wooorm/markdown-rs/pull/127#issuecomment-2395573742 and this may be further supported by https://github.com/wooorm/markdown-rs/issues/149