wooorm / markdown-rs

CommonMark compliant markdown parser in Rust with ASTs and extensions
https://docs.rs/markdown/1.0.0-alpha.21/markdown/
MIT License

Clarification about MdxJsxFlow/TextElement (discrepancy between MDX Playground?) #116

Closed: begleynk closed this issue 4 months ago

begleynk commented 4 months ago

Just wanted to clarify something regarding how block/inline HTML (JSX) elements are expected to work, since I seem to have found a discrepancy between the MDX Playground and this crate.

If you run this program:

fn main() {
    let content = "<p>Foo</p>\n";

    let mut opts = markdown::Options::default();
    opts.parse = markdown::ParseOptions::mdx();

    let mdast = markdown::to_mdast(&content, &opts.parse).unwrap();
    println!("{:#?}", mdast);
}

You get the following output:

Root {
    children: [
        Paragraph {
            children: [
                MdxJsxTextElement {
                    children: [
                        Text {
                            value: "Foo",
                            position: Some(
                                1:4-1:7 (3-6),
                            ),
                        },
                    ],
                    position: Some(
                        1:1-1:11 (0-10),
                    ),
                    name: Some(
                        "p",
                    ),
                    attributes: [],
                },
            ],
            position: Some(
                1:1-1:11 (0-10),
            ),
        },
    ],
    position: Some(
        1:1-2:1 (0-11),
    ),
}

As far as I understand, this makes sense. Any MDX element on one line is considered an inline element, so it gets wrapped in a p tag.

However, the MDX Playground (https://mdxjs.com/playground/) seems to behave differently. The input <p>Foo</p> gives the following AST (note the mdxJsxFlowElement and the absence of a wrapping paragraph):

{
  "type": "root",
  "children": [
    {
      "type": "mdxJsxFlowElement",
      "name": "p",
      "attributes": [],
      "children": [
        {
          "type": "text",
          "value": "Foo"
        }
      ],
      "data": {
        "_mdxExplicitJsx": true
      }
    }
  ]
}

I'm trying to understand why there's a difference here. Is the MDX Playground doing some kind of extra processing, or am I simply misunderstanding the spec, or perhaps missing a config parameter somewhere?

Thanks again for your work on this crate! ❤️

wooorm commented 4 months ago

Heya!

"Any MDX element on one line is considered an inline element, so it gets wrapped in a p tag."

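The rule is: text and tag on the same line == text (inline). Just a tag on a line is flow (block). In <p>Foo</p>, the text Foo shares a line with the tags, so to_mdast yields a text element wrapped in a paragraph.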


The difference is just that the two tools do different things. The playground shows the entire MDX -> JS process, while to_mdast here only does MDX -> mdast. The extra steps the playground runs are implemented in Rust in mdxjs-rs.

begleynk commented 4 months ago

Ok got it! Thanks for such a quick response!

So there is some extra post-processing happening beyond the initial mdast generation.

I'll take a look at mdxjs-rs. Any pointers on what exactly it's doing? Something like detecting native HTML elements and treating them differently?

begleynk commented 4 months ago

All good - I'm pretty sure I found the relevant code: https://github.com/wooorm/mdxjs-rs/blob/b1971be2dbdd2886ca4d5e9d97d9a3477cb29904/src/mdast_util_to_hast.rs#L925

Thanks for pointing me in the right direction! I'll close this issue.
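
For anyone landing here later, the kind of post-processing discussed above can be sketched roughly as follows. This is a minimal illustration, not the code from mdxjs-rs: the Node enum, is_jsx_only, and unravel are hypothetical names made up for this example, and the sketch assumes the step amounts to "a paragraph that contains only JSX elements (plus whitespace) is replaced by those elements, promoted to flow". It only looks at paragraphs directly under the root and ignores MDX expressions.

```rust
// Hypothetical, simplified node types for illustration only.
// These are NOT the types from the `markdown` or `mdxjs-rs` crates.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Root(Vec<Node>),
    Paragraph(Vec<Node>),
    Text(String),
    JsxText { name: String, children: Vec<Node> },
    JsxFlow { name: String, children: Vec<Node> },
}

/// Returns true if the children contain at least one JSX text element
/// and nothing else except whitespace-only text.
fn is_jsx_only(children: &[Node]) -> bool {
    let mut saw_jsx = false;
    for child in children {
        match child {
            Node::JsxText { .. } => saw_jsx = true,
            Node::Text(value) if value.trim().is_empty() => {}
            _ => return false,
        }
    }
    saw_jsx
}

/// Replaces each JSX-only paragraph under the root with its JSX children,
/// promoted from text (inline) to flow (block). Other nodes are kept as-is.
fn unravel(node: Node) -> Node {
    match node {
        Node::Root(children) => {
            let mut out = Vec::new();
            for child in children {
                match child {
                    Node::Paragraph(inner) if is_jsx_only(&inner) => {
                        for n in inner {
                            // Whitespace-only text is dropped here (assumption).
                            if let Node::JsxText { name, children } = n {
                                out.push(Node::JsxFlow { name, children });
                            }
                        }
                    }
                    other => out.push(other),
                }
            }
            Node::Root(out)
        }
        other => other,
    }
}

fn main() {
    // Mirrors the shape of the to_mdast output above: Root > Paragraph > JSX text element.
    let tree = Node::Root(vec![Node::Paragraph(vec![Node::JsxText {
        name: "p".to_string(),
        children: vec![Node::Text("Foo".to_string())],
    }])]);
    println!("{:#?}", unravel(tree));
}
```

Under these assumptions, the Root > Paragraph > JSX text element tree from the first comment becomes Root > JSX flow element, which matches the shape the playground reports. A paragraph that mixes real text with JSX is left untouched.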