micromark / micromark-extension-mdx-jsx

micromark extension to support MDX or MDX.js JSX
https://unifiedjs.com
MIT License

Parsing strangeness around flow-level JSX elements #12

Closed iczero closed 10 months ago

iczero commented 10 months ago

Problem

The current flow-level JSX tokenizer is only used for single elements and cannot handle text. This causes some strangeness when tokenizing: elements meant to be flow-level are instead parsed as inline (text) elements and wrapped in a paragraph.

This paragraph wrapping behavior causes https://github.com/mdx-js/mdx/issues/1526, which is currently fixed by an AST transform.

Additionally, some users may wish to restore CommonMark behavior for flow-level markdown within tags (https://github.com/mdx-js/mdx/issues/1798). Instead of starting a new paragraph immediately after the tag, the PR requires a blank line to resume flow-level markdown. An opt-in configuration option is used to enable this behavior.

Solution

Please see PR #11. It introduces a new parsing scheme behind a configuration option which allows opting in to new behavior.

Alternatives

Do nothing. The current behavior has existed for a few years and is fine.

ChristianMurphy commented 10 months ago

Welcome @iczero! 👋

> This paragraph wrapping behavior causes https://github.com/mdx-js/mdx/issues/1526, which is currently fixed by an AST transform.

Happy to discuss how to make the implementation more efficient.

> Additionally, some users may wish to restore CommonMark behavior for flow-level markdown within tags (https://github.com/mdx-js/mdx/issues/1798). Instead of starting a new paragraph immediately after the tag, the PR requires a blank line to resume flow-level markdown.

To be clear on the goal of MDX: it is Markdown + JSX; there is no HTML in MDX. I am against changing the behavior to make it more HTML-like in ways that diverge from JSX. Paragraph handling was closer to a mix of HTML and JSX in version 1, and it caused endless confusion because there are many edge cases (see https://github.com/mdx-js/mdx/issues?q=is%3Aissue+paragraph+is%3Aclosed). I am aware there are arguments both ways, but the consistency of the current API is better than the litany of edge cases that mixed HTML/JSX causes.

> An opt-in configuration option is used to enable this behavior.

I am against making syntax configurable in core. MDX should offer a single and consistent language, which includes a single and consistent way of handling tags and paragraphs. Any variations and customizations can exist in plugins. https://unifiedjs.com/learn/guide/ https://github.com/micromark/micromark#creating-a-micromark-extension

iczero commented 10 months ago

@ChristianMurphy,

I have closed the PR for now because it uses a terrible approach that doesn't work.

I believe it will be more productive to discuss further when I have a prototype that actually works. I will try to make it an extension.

Thank you for your time.

wooorm commented 10 months ago

> Additionally, some users may wish to restore CommonMark behavior for flow-level markdown within tags (https://github.com/mdx-js/mdx/issues/1798). Instead of starting a new paragraph immediately after the tag, the PR requires a blank line to resume flow-level markdown. An opt-in configuration option is used to enable this behavior.

They should not. That is not MDX. It is not a good idea. Your custom versions would not work with other tools that use MDX. Such as site engines. Or it would not have proper syntax highlighting in folks’ editors or on GitHub.

As Christian mentions, that was used in MDX 1 and it was the most requested feature to change. The behavior you want to model is also an often-discussed problem in markdown itself: many folks don’t understand the rules and post questions on our markdown-related projects.

> This paragraph wrapping behavior causes https://github.com/mdx-js/mdx/issues/1526, which is currently fixed by an AST transform.

It is not about p. It’s about which markdown is allowed in other things: <>> block quote?</>, <>* list?</>, <># heading?</>, <> indented code?</>, <>```js\nfenced?.code()</>, <>*emphasis*?</>, etc.

The solution, whether it’s with how XML-like things are treated in markdown or the complete JSX grammar of MDX, is to teach users.

In MDX, it’s as follows: Put text and tags on the same line if you want text things, aka emphasis, strong, links, spans of code, that kinda stuff. Put them on separate lines if you want to include “blocks”, aka block quotes, lists, headings, code blocks, etc.

https://mdxjs.com/docs/what-is-mdx/#interleaving
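
As a minimal sketch of the same-line versus separate-lines rule described above, the following MDX fragment may help. The element name and the sample text are illustrative assumptions, and the described outcomes follow the interleaving section of the MDX docs linked above rather than anything verified against a specific version here.

```mdx
{/* Same line: only inline markdown applies. The "#" stays literal text, */}
{/* the asterisks produce emphasis, and no paragraph is created. */}

<div># not a heading, but *this* is emphasis</div>

{/* Separate lines: block markdown applies inside the element, */}
{/* so this should yield a heading and a block quote. */}

<div>
  # This is a heading

  > This is a block quote.
</div>
```

The distinction depends only on line placement, not on the HTML semantics of the element, so the same rule would apply to any tag or component name.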

github-actions[bot] commented 10 months ago

Hi! This was closed. Team: If this was fixed, please add phase/solved. Otherwise, please add one of the no/* labels.

github-actions[bot] commented 10 months ago

Hi! Thanks for reaching out! Because we treat issues as our backlog, we close issues that are questions since they don’t represent a task to be completed.

See our support docs for how and where to ask questions.

Thanks, — bb