mdx-js / mdx

Markdown for the component era
https://mdxjs.com
MIT License

Alternative Syntax #732

Closed ChristianMurphy closed 5 years ago

ChristianMurphy commented 5 years ago

Here's another potential idea @johno: What if MDX had everything inside {{}} be JS, kind of like a reverse of JSX? It'd be a similar concept to Mustache. Here's an example:

# This is a {{<b>}}*header*{{</b>}}

This is a paragraph.

{{<Button>}}
  This is a *Markdown* button
{{</Button>}}

- This is
- a list

{{
  // This is just plain JS
  const size = 123;
  import MyCanvas from './MyCanvas';
}}

{{<MyCanvas width={size} height={size} />}}

Or here's an idea using just one brace, which I originally thought would clash with plain text that uses {, but a literal brace could be escaped as \{

# This is a {<b>}*header*{</b>}

This is a paragraph.

{<Button>}
  This is a *Markdown* button
{</Button>}

- This is
- a list

{
  // This is just plain JS
  const size = 123;
  import MyCanvas from './MyCanvas';
}

{<MyCanvas width={size} height={size} />}

Originally posted by @sdegutis in https://github.com/mdx-js/mdx/issues/628#issuecomment-519705483

ChristianMurphy commented 5 years ago

Thanks for the suggestions @sdegutis!

We explored curly braces a bit for handling JSX but ultimately found it to be too noisy. Pure, inlined JSX feels more elegant and is more pleasant to work with as an author. Though, of course, it does make parsing more difficult.

Currently MDX does use a surgically combined parser. It enhances remark's Markdown parser by extending its tokenization to recognize JSX syntax and import/export statements. Currently, import/export blocks are then parsed/transformed with Babel to do any node massaging needed for the MDXAST.

The proposed interleaving is already partially supported, it's just a matter of officially defining the syntax and then parsing a handful of additional cases. This route is ultimately something that we need as authors of MDX for callouts, messages, and other decorations to MDX markup.

We're not far off from achieving this next level of syntax/functionality, it's mostly a matter of fully defining the syntax we want and then rolling up our sleeves.

Originally posted by @johno in https://github.com/mdx-js/mdx/issues/628#issuecomment-519735036


@johno Thanks for the response. I should probably explain a little bit about where I'm coming from.

I found MDX about a year ago because my client has a use-case that MDX in its 0.x incarnation is somewhat close to suitable for, and that the evolution of syntax ideas in these issues is really close to suitable for. But as I tried to integrate MDX into our system, there were too many rough edges, so I put that whole project on hold.

Coming back to that project this month, I looked into MDX again and found these issues, as well as the initial design proposal from 18 months ago. I know that 18 months in the software world is almost an eternity but I see this thread as essentially a continuation of that brainstorming, and figured I'd hop in with some ideas. But I kind of jumped into it too quickly last night with my Mustache-like proposal, because it was around 5pm and my laptop battery was almost dead, so that was too rushed. My apologies.

One of the main problems I see with the direction MDX is being taken in these issues is the extreme ambiguity, where new clashes and edge-cases are found often. This leads to two concrete problems:

  1. The language becomes unpredictable and surprising, because as difficult as it is to express the rules of the language to the parser, it's also that difficult to express them to users, and for users to memorize the rules of the language. (This is similar to the extreme flexibility in Ruby, which increases the learning curve time a lot and makes for a lot of headaches for many people.)

  2. The parser becomes more and more difficult to implement, and implementing and trying out specific designs often reveals more and more edge-cases because the syntax isn't very orthogonal. This leads to very slow development time and very long times between releases.

That's why I'm still looking into ways where either the future of MDX, or a fork of MDX, or something similar to MDX, can become something that has a syntax that's extremely predictable, very quick and easy for developers to grok (almost as easy as Markdown itself!), and is so clear that there are (almost) no edge-cases to wait for someone to stumble upon.

Another reason is that I'm really interested in getting this one particular client project to a usable stage since it's been shelved for 12 months, and MDX in its current form is not quite suitable, and it's not exactly clear when these syntax evolutions will be feasible to implement, because as you mentioned, many design quirks are still being worked out. Whereas, for the sake of getting my client project moving forward, I could probably justify spending a week or two of full time work on either MDX or something like MDX, and I wonder if during that time, I would be able to get to something immediately usable, assuming there was a design proposal that simplified and disambiguated the language.

So that's where my design proposal above was coming from. I thought perhaps if the community were open to design/syntax simplifications, we could perhaps get the ball rolling. And since it's not clear where all the discussion has been happening, and all I found was these issues, which didn't mention anything like this one, I figured I would suggest it. The idea was based on the simple rule that "everything outside curly braces is markdown and everything inside of it is interweaved JSX." It could even simplify the syntax further by adding the clause that "if the line starts with < then it doesn't need to be wrapped in curly braces," as another idea. It's not perfect of course, and has its own ambiguities, but it was something to put out there.
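The rule quoted above ("everything outside curly braces is markdown and everything inside of it is interweaved JSX") can be sketched as a simple scanner. This is only an illustration, not anything MDX implements: `splitMustacheSegments` is a hypothetical name, and the sketch assumes that `}}` never occurs inside the embedded code, which is exactly the kind of ambiguity the proposal admits to (a prop like `style={{ a: 1 }}` would end the segment early).

```javascript
// Hypothetical helper (not part of MDX): splits a document into Markdown and
// code segments using the "{{ ... }}" delimiters from the proposal.
// Assumption: "}}" never appears inside the embedded code itself.
function splitMustacheSegments(source) {
  const segments = [];
  let cursor = 0;
  while (cursor < source.length) {
    const open = source.indexOf("{{", cursor);
    if (open === -1) {
      // No more delimiters: the rest of the document is Markdown.
      segments.push({ type: "markdown", value: source.slice(cursor) });
      break;
    }
    if (open > cursor) {
      segments.push({ type: "markdown", value: source.slice(cursor, open) });
    }
    const close = source.indexOf("}}", open + 2);
    if (close === -1) {
      throw new Error("Unclosed {{ at offset " + open);
    }
    segments.push({ type: "code", value: source.slice(open + 2, close) });
    cursor = close + 2;
  }
  return segments;
}

const segments = splitMustacheSegments("# This is a {{<b>}}*header*{{</b>}}");
```

Here `segments` alternates between Markdown text and the raw contents of each `{{ }}` pair, which a second parser would then have to interpret as JS or JSX.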

That said, even if that specific proposal isn't quite right, I do think it's worth considering some major simplifications, to get MDX back to the extreme simplicity that made Markdown so successful in the first place, while having the power of dynamic JavaScript.

Incidentally, this idea of simplicity came from looking at MDX's About page and finding Idyll, which has less flexibility but far more simplicity and predictability than MDX's proposals. In fact the main thing Idyll is missing is the ability to write raw JavaScript inline in Markdown. MDX allows that only in an "export" line, which kind of feels like a strange hack when the variables are meant to be local only and aren't going to be exported. So this was another reason I started looking into an idea like "everything inside curly braces that starts at the beginning of a line is just plain JavaScript" which would allow import, export, and plain const/let statements too.

Originally posted by @sdegutis in https://github.com/mdx-js/mdx/issues/628#issuecomment-520002953

ChristianMurphy commented 5 years ago

Hi @sdegutis, that’s a great write up, thanks for taking the time to express your thoughts so thoroughly.

[...] very quick and easy for developers to grok (almost as easy as Markdown itself!)

What makes you think that Markdown is easy? I’m asking because I spent the last 5 years on Markdown parsers and I conclude that Markdown is messy. Especially the part where you embed HTML, and maybe even JS in HTML in MD. It’s one of the things that makes it great as well. And MDX plus this proposal are working on that.

I find Markdown very different from your proposal: Markdown is loose, it’s vague, it’s unpredictable.

Originally posted by @wooorm in https://github.com/mdx-js/mdx/issues/628#issuecomment-520008323


that’s a great write up, thanks for taking the time to express your thoughts so thoroughly.

My apologies for writing such a long comment, but I didn't have time to make it shorter. 😄

What makes you think that Markdown is easy? I’m asking because I spent the last 5 years on Markdown parsers and I conclude that Markdown is messy.

@wooorm It's true that Markdown has some complexity (as seen by how long the CommonMark Spec is) in order to address Gruber's original ambiguities. But I can't help but think that if MDX embraces that complexity instead of fighting against it, it's going to multiply exponentially.

Or in other words, as I'm reading through the examples in this issue and the related issues that it links to, they're starting to get so confusing that I couldn't really explain what they should do or not do, why they should work or not work, etc. And to me that seems like a sign that it'll be just as confusing for users of my client's project to work with.

And I get the idea that these issues are meant to focus more on the more confusing and ambiguous edge-cases, rather than the easy and common-sense situations. But the more I try to internalize the language rules that this new evolution of MDX represents, the harder a time I'm having understanding it, putting it into words, and using it to create sample documents.

Originally posted by @sdegutis in https://github.com/mdx-js/mdx/issues/628#issuecomment-520015007


Here's another evolution idea, inspired by Idyll.

If the MDX community isn't open to discussion about new syntax ideas, or this is the wrong place for that discussion, let me know. I'm posting this here because it's the closest place I can guess for this kind of discussion, based on the issue's title. Besides, interpreted more broadly, "interleaving markdown in JSX" is kind of the heart of MDX.

The idea is based on Idyll's strict limits on its bracket-style version of JSX, where only extremely restricted code is allowed inside JSX expressions. At first I was highly skeptical of such an idea, because I often put arbitrary code inside JSX attributes, such as map literals, array literals, and arrow functions. But then I realized, all of those could be extracted into variables anyway.

So the idea is to limit "raw JavaScript" to one section of the document, and allow only very specific and restricted subsets of JSX expressions within the Markdown area, which would be interpreted with the same basic semantics that CommonMark interprets HTML. And using the same concept as front-matter, all JavaScript could go in the "front-matter area" of a document, and all Markdown at the end. This would also make a clear delineation representing "execution flow" since in this variant proposal, arbitrary executable JS would be allowed in the same document as the Markdown, and not just limited to an export statement like in current MDX.

For example:

// raw JS goes here
import Foo from './Foo';
const qux = 123;
// this is the front-matter delimiter between JS and Markdown:
---
# This is the *Markdown* body

<Foo bar={qux}/>

Restricting JSX syntax to only variables or a few types of literals would make implementation of this as simple as taking an existing Markdown parser and slightly adjusting its HTML parsing, rather than dealing with a full-fledged parser plugin system and ASTs.

Restricting JS to above the fold and MD to below would make it easy to just use two different parsers and not need to surgically combine them or weave them together with plugins.
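A minimal sketch of that two-parser split, assuming the delimiter is the first line that consists only of `---`. The name `splitPrelude` is hypothetical and nothing here is part of MDX; each half would then be handed to its own parser.

```javascript
// Hypothetical helper, not part of MDX: splits a document into a JavaScript
// prelude and a Markdown body at the first line consisting only of "---".
function splitPrelude(source) {
  const lines = source.split("\n");
  const index = lines.findIndex((line) => line.trim() === "---");
  if (index === -1) {
    // No delimiter found: treat the whole document as Markdown.
    return { prelude: "", markup: source };
  }
  return {
    prelude: lines.slice(0, index).join("\n").trim(),
    markup: lines.slice(index + 1).join("\n").trim(),
  };
}

const doc = [
  "import Foo from './Foo';",
  "const qux = 123;",
  "---",
  "# This is the *Markdown* body",
  "",
  "<Foo bar={qux}/>",
].join("\n");

const { prelude, markup } = splitPrelude(doc);
```

One caveat with this sketch: a document that has no prelude but uses `---` as a thematic break or setext heading underline would be split in the wrong place, so a real implementation would need a rule for that case.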

Originally posted by @sdegutis in https://github.com/mdx-js/mdx/issues/628#issuecomment-520052926


I just figured out how to use Babel's parser to reliably figure out where a JavaScript expression ends (about 10 lines of code), which means a portion of my suggestion is unnecessary now, the part about restricting what kind of JS(X) can appear inside a JSX element's attribute.
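The Babel-based code mentioned above is not shown in the thread. As a rough stand-in, the following sketch finds the end of a braced expression by counting braces instead of parsing. It is deliberately naive and is not the approach described: `findExpressionEnd` is a hypothetical name, and the scan skips quoted strings but ignores comments, regex literals, and `${...}` inside template literals, which a real parser would handle.

```javascript
// Naive stand-in for a real parser (the comment above used Babel's parser;
// this sketch does not). Given the index of an opening "{", returns the index
// just past its matching "}". Quoted strings are skipped; comments, regex
// literals, and "${...}" inside template literals are NOT handled.
function findExpressionEnd(source, start) {
  if (source[start] !== "{") {
    throw new Error("Expected '{' at offset " + start);
  }
  let depth = 0;
  let quote = null; // the active string delimiter, if any
  for (let i = start; i < source.length; i++) {
    const ch = source[i];
    if (quote !== null) {
      if (ch === "\\") {
        i++; // skip the escaped character
      } else if (ch === quote) {
        quote = null;
      }
      continue;
    }
    if (ch === '"' || ch === "'" || ch === "`") {
      quote = ch;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) {
        return i + 1;
      }
    }
  }
  throw new Error("Unbalanced '{' at offset " + start);
}

const jsx = '<Foo bar={{ a: "}" }} baz={1}/>';
const start = jsx.indexOf("{");
const expression = jsx.slice(start, findExpressionEnd(jsx, start));
```

The shortcomings listed in the comments are the reason a full parser is the more reliable choice for this job.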

That said, I think the concept of splitting the file into ${prelude}\n\n---\n\n${markup} makes a lot more sense than special-casing lines that start with /^(import|export)\s/. This allows full use of JS, including executing code as soon as the document is loaded, rather than restricting arbitrary JS to be subordinate to an export statement.

Originally posted by @sdegutis in https://github.com/mdx-js/mdx/issues/628#issuecomment-520458566

thesoftwarephilosopher commented 5 years ago

A final, refined version of my proposal for an alternative syntax is that the top half of a document be plain JavaScript, the bottom half be plain Markdown, and the Markdown be allowed to contain JSX expressions instead of HTML tags. The JSX expressions would be allowed to contain arbitrary JavaScript, e.g. the qux inside both <Foo bar={qux}/> and <Foo bar>{qux}</Foo>. The compiled output would be a combination of the JavaScript prelude left as-is and the Markdown compiled with JSX support, but otherwise also left as-is.

In short, this issue is just a proposal to change MDX's syntax so that, instead of special-casing Markdown lines that start with import and export and interpreting them as JavaScript, any arbitrary JavaScript can be written in the area typically reserved for "front-matter".

A strong benefit of this proposal is that the parsing of JSX expressions within the Markdown portion of the document strictly becomes a concern for Markdown plugins. To that end, I'm working on extending remarkable to understand JSX, and have opened jonschlinkert/remarkable#361 to discuss implementation of this with the authors/maintainers of that project and whoever else is interested in the more generic concept of JSX within Markdown.

Another benefit is that the output of the JSX within the Markdown is also pluggable. In MDX's current form, the output is largely redundant for my use-case, and I have a good deal of code that exists just to "undo" a lot of the use-case specific features that MDX automatically comes with and which my use-cases don't share. Moving both the parsing and rendering of JSX within Markdown back into the Markdown plugin space makes it possible for everyone to customize their output, either through built-in rendering plugins that MDX provides, or allowing users to create their own.

wooorm commented 5 years ago

Hi @sdegutis. Two more questions.


Lastly, my understanding of your current proposal is now: you are describing MDX to stay as-is, except that you’d prefer JS “frontmatter”, a thematic break (---), and then MDX. Could you confirm if that’s correct?

thesoftwarephilosopher commented 5 years ago

Lastly, my understanding of your current proposal is now: you are describing MDX to stay as-is, except that you’d prefer JS “frontmatter”, a thematic break (---), and then MDX. Could you confirm if that’s correct?

Right.

Have you seen that the parsing done by MDX is a markdown plugin?

Yep, I've dug pretty far into MDX and its implementation, about 12 months ago and again this month, in order to try to customize its behavior and output, which is how I encountered its ecosystem. I tried several times to imitate its behavior by writing my own remark/unified plugins, and was getting nowhere.

"JSX expressions would be allowed to contain arbitrary JavaScript " [...] This is very different from how HTML works in Markdown. And that’s why #628 is open.

Right, and this is something I also came up against. I found this set of related MDX issues because I was running into most of the same problems when I tried to use MDX. The reason I offered my suggestion of a simplification of the language was because the semantic/syntactic issues seem to be so universally confusing to everyone that it's holding up MDX implementation. And I personally would benefit from the syntax of MDX, but not necessarily its current implementation.

This is why I'm going down a different path, of trying to implement JSX as a remarkable plugin, because its plugin system is much easier for me to understand personally than remark's/unified's. And by simply exchanging their HTML parsing with Babel-backed JSX parsing, my hope is that we can then delegate most of the questions of "should this be inline or not" to the core of the system, only needing to concern ourselves with "where does the JSX begin and where does it end?" instead.

To me it seems like this is still related to MDX, even though I'm suggesting a different implementation, because ultimately it's the same syntax and semantics that we all want: arbitrary JSX within Markdown documents.

johno commented 5 years ago

In my opinion, separating Markdown and JS in the same document is clunky and diverges from the goals of MDX (simple, seamless). The focus, for this project, has always been about writing content and enabling sprinkles of components when needed. Writing. Complex app-like JSX is a can of worms and overcomplicates content.

It's important to mention, with complex JS usage, that's what the import syntax is for. If your JavaScript or JSX needs are pretty complex in an MDX document, they should really be broken out into their own modules or components and then imported. You can also separate the content aspects and import them for rendering inside JSX to achieve similar effects.

It's typically best to avoid having two ways to do one thing. And I think, ultimately, the import is more flexible because you can tuck away implementation details and optimize for end user APIs.

This also causes issues with frontmatter support. MDX doesn't support frontmatter in core, but it's commonly used by end users via plugins.

Something else worth adding is that the existing MDX syntax has 100k+ downloads a week and thousands of projects using it. Making a drastic change to the syntax would necessitate a substantial improvement in DX. For the reasons mentioned above, I'm not sure it makes sense for this project but I appreciate you sharing your thoughts here :heart:.

Deficiencies in parsing

I think it's also worthwhile to reiterate where MDX parsing is currently broken. It boils down to essentially 3 scenarios, all of which are fixable (some downstream projects like Docz have implemented parsing workarounds in userland).

1. Interleaved blocks with a second, indented tag

<Tomato>
  <Box>

# I'm not parsed properly

  </Box>
</Tomato>

2. JSX blocks with blank lines

<style>{`
  .body { margin: 0; }

  .tomato { color: tomato; }
`}</style>

3. Inline JSX embedded expressions

# Hello, <>{props.name}</>

None of these adds much complexity to the grammar, and each can be solved (likely soon, now that this will become my focus in the near future).

#1 and #2 are "features" of Markdown. #3 is templating which has a predictable grammar for 99% of use cases.

Yes, it's taken a while to address these parsing issues as you've noted, but we've been observing community usage very heavily in order to ensure we can implement these changes with little interruption for users and downstream projects.

I plan on supporting MDX for years so I want to move slowly and deliberately. As such, I've mostly focused on low-hanging fruit, documentation, and rendering implementation details (context providers, custom pragma, etc).

This is also a community/collective open-source run project that exists entirely on donations, our free time, and a handful of generous companies. Patience is always appreciated.


Right, and this is something I also came up against. I found this set of related MDX issues because I was running into most of the same problems when I tried to use MDX.

Could you clarify the issues you've run into? I mostly recall syntax suggestions. Are the issues you've encountered those already outlined in #628?

The reason I offered my suggestion of a simplification of the language was because the semantic/syntactic issues seem to be so universally confusing to everyone that it's holding up MDX implementation. And I personally would benefit from the syntax of MDX, but not necessarily its current implementation.

My apologies, but I'm a bit confused here. The majority of this syntax is currently supported. The RFC is intended to formalize a particular usage of syntax that has been undocumented while also using this as an opportunity to address the last few edge cases.

This is why I'm going down a different path, of trying to implement JSX as a remarkable plugin, because its plugin system is much easier for me to understand personally than remark's/unified's.

That's the beauty of open source :smile_cat:.

And by simply exchanging their HTML parsing with Babel-backed JSX parsing, my hope is that we can then delegate most of the questions of "should this be inline or not" to the core of the system, only needing to concern ourselves with "where does the JSX begin and where does it end?" instead.

Yeah, I mean this is essentially what the interleaving RFC is proposing.

To me it seems like this is still related to MDX, even though I'm suggesting a different implementation, because ultimately it's the same syntax and semantics that we all want: arbitrary JSX within Markdown documents.

Yep! This is why I've written a specification. I see it as MDX is a language, and our unified org (mdx-js) is an implementation.


All this said, there will be lots of overlap in our goals so I hope that we can, at the very least, share implementation when/where appropriate.

Looking forward to hearing your thoughts!

thesoftwarephilosopher commented 5 years ago

Something else worth adding is that the existing MDX syntax has 100k+ downloads a week and thousands of projects using it. Making a drastic change to the syntax would necessitate a substantial improvement

First of all congrats. But also yeah, that's an understandable reason to not want to make backwards-incompatible changes.

It's important to mention, with complex JS usage, that's what the import syntax is for. If your JavaScript or JSX needs are pretty complex in an MDX document, they should really be broken out into their own modules or components and then imported. You can also separate the content aspects and import them for rendering inside JSX to achieve similar effects.

That points out a core assumption about the MDX project that isn't true for everyone: that it'll be compiled at build-time in the context of something like webpack. The language design, API, choice of features, implementation, and your recommendation to break JS(X) out into imported modules, are all guided by this assumption.

I have a very different usage context that imports things differently, requires very differently rendered output, and would benefit much more from a more generalized JavaScript prelude feature than from the current specialized-import/export feature. The concept of shortCodes also needs to be implemented very differently in my use-case, etc.

So I keep coming back to the idea that "JSX in Markdown" is perfect for what I need, and I keep trying to use this one aspect of MDX by itself, but it's taking a lot more work to separate it out from the other features that MDX offers, which don't make as much sense in my particular context. That's the underlying motivation for this issue.

[...] This is why I've written a specification. I see it as MDX is a language, and our unified org (mdx-js) is an implementation.

I saw this spec before, it's a great idea. But it seems strange that it's tied to the remark/unified ecosystem. In my mind, considering MDX to be more a language than a library, a spec for it should just be an amendment to the CommonMark spec, which replaces the HTML rules with similar JSX rules, keeping the same implementation-agnosticism already in that spec.

It also would make MDX more flexible and generalized if the import/export feature were an optional extension of the spec, rather than a core language feature. And my prelude feature would just be omitted altogether from the spec, being something that can be parsed away before MDX parsing even begins.

Are the issues you've encountered those already outlined in #628?

Yep.

thesoftwarephilosopher commented 5 years ago

So I guess in light of my last comment just now, which is there to give context as to where I'm coming from, this issue is more about the idea of removing implementation-specific details from MDX and its spec, and moving them out into external concerns. This way MDX becomes a clean language, rather than a library or build plugin, although its current ecosystem would still be the default libraries and build plugins.

thesoftwarephilosopher commented 5 years ago

This is also a community/collective open-source run project that exists entirely on donations, our free time, and a handful of generous companies. Patience is always appreciated.

I hope I haven't come across as impatient or demanding. I've been doing open source for over a decade, and I understand and fully appreciate how much goes into it.

The only reason I brought up this point:

MDX in its current form is not quite suitable, and it's not exactly clear when these syntax evolutions will be feasible to implement, because as you mentioned, many design quirks are still being worked out

was in light of the very next run-on sentence of mine:

for the sake of getting my client project moving forward, I could probably justify spending a week or two of full time work on either MDX or something like MDX, and I wonder if during that time, I would be able to get to something immediately usable, assuming there was a design proposal that simplified and disambiguated the language.

which was in response to:

We're not far off from achieving this next level of syntax/functionality, it's mostly a matter of fully defining the syntax we want and then rolling up our sleeves.

which I misread as "we're far off from" at first. Oops. 😄

That said, besides the implementation of the parser, there are several other features of MDX as mentioned above which are incompatible with my client project's needs, and several of my features which might not be compatible with the needs of many people already using MDX.

But it would be such a shame for us to have two separate "JSX in Markdown" parsers just because we have different needs for the things outside "JSX in Markdown".

timneutkens commented 5 years ago

Based on the comments I read you basically want:

Which sounds like you want to build a new library, rather than change MDX or even fork it. Which is totally fine, it just doesn't fit into this project / its goals in my opinion.

For reference there was an existing MDX implementation that existed even before we wrote the MDX spec, it didn't quite suit our needs at the time so we released mdx-js/mdx to solve our particular needs.

ChristianMurphy commented 5 years ago

I'm coming from, this issue is more about the idea of removing implementation-specific details from MDX and its spec, and moving them out into external concerns. This way MDX becomes a clean language, rather than a library or build plugin, although its current ecosystem would still be the default libraries and build plugins.

import and export are language-level constructs. There is nothing tying them to a specific tool; they are ES module syntax that MDX adopted as part of its language syntax, just like it adopted JSX syntax.


@sdegutis I guess what I'm missing here is: why change the language? It sounds like we're all on the same page, there is a MDX language, and a reference runtime.

It also sounds like, for your use case, transforming import and export nodes into text nodes, and using frontmatter to pass values to the runtime scope and components would handle the content you want to process.

All of those would be customizations/plugins to the runtime.
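As a rough illustration of the first suggestion, a tree transform along these lines could turn import and export nodes into text nodes. This is a sketch under assumptions: it presumes a syntax tree of plain objects shaped like `{ type, value, children }` with `import` and `export` node types, and `importExportToText` is a hypothetical name rather than an existing plugin.

```javascript
// Sketch of a tree transform. Assumption: nodes are plain objects shaped like
// { type, value, children }, with "import" and "export" node types.
// The function name is hypothetical; this is not an existing MDX plugin.
function importExportToText(node) {
  if (node.type === "import" || node.type === "export") {
    // Keep the original source text, but mark it as plain text.
    return { type: "text", value: node.value };
  }
  if (Array.isArray(node.children)) {
    // Return a copy so the input tree is left unmodified.
    return { ...node, children: node.children.map(importExportToText) };
  }
  return node;
}

const tree = {
  type: "root",
  children: [
    { type: "import", value: "import Foo from './Foo'" },
    { type: "export", value: "export const qux = 123" },
    { type: "paragraph", children: [{ type: "text", value: "Hello" }] },
  ],
};

const transformed = importExportToText(tree);
```

The function returns a new tree instead of mutating its input, so the original nodes stay available to any other step that still needs them.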

Is there an edge case you've run into where the language prevented content you'd expect to work from working?

johno commented 5 years ago

That points out a core assumption about the MDX project that isn't true for everyone: that it'll be compiled at build-time in the context of something like webpack. The language design, API, choice of features, implementation, and your recommendation to break JS(X) out into imported modules, are all guided by this assumption.

A lot of the existing MDX libraries do make the assumption of a bundler, but I don't think this is unreasonable considering MDX requires a transpilation step (and even outputs JSX which browsers don't understand). exports can be trivially transformed into variables and imports can be ignored for circumstances where it doesn't make sense.
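The last point can be sketched at the source-text level. The following is a line-based approximation, assuming every import and export statement fits on one line and starts at the beginning of the line; `stripModuleSyntax` is a hypothetical name, and `export default` and multi-line statements are not handled.

```javascript
// Line-based sketch (hypothetical helper, not part of MDX).
// Assumptions: each import/export statement occupies a single line that
// starts at column 0; "export default" and multi-line statements are ignored.
function stripModuleSyntax(source) {
  return source
    .split("\n")
    // Drop import lines entirely.
    .filter((line) => !/^import\s/.test(line))
    // Turn "export const x = ..." into a plain "const x = ..." declaration.
    .map((line) =>
      line.replace(/^export\s+(?=(?:const|let|var|function|class)\s)/, "")
    )
    .join("\n");
}

const input = [
  "import Foo from './Foo'",
  "export const meta = { title: 'Hello' }",
  "",
  "# Hello",
].join("\n");

const output = stripModuleSyntax(input);
```

A transform on the parsed tree would be more robust than this text-level version, since it would not depend on how the statements are laid out across lines.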

I have a very different usage context that imports things differently, requires very differently rendered output, and would benefit much more from a more generalized JavaScript prelude feature than from the current specialized-import/export feature. The concept of shortCodes also needs to be implemented very differently in my use-case, etc.

Totally. Hypothetically this could be handled in userland with our existing libraries, and if it can't be, that's something we'd address because we want to be able to support more complex or bespoke usecases via plugins.

I saw this spec before, it's a great idea. But it seems strange that it's tied to the remark/unified ecosystem. In my mind, considering MDX to be more a language than a library, a spec for it should just be an amendment to the CommonMark spec, which replaces the HTML rules with similar JSX rules, keeping the same implementation-agnosticism already in that spec.

It also would make MDX more flexible and generalized if the import/export feature were an optional extension of the spec, rather than a core language feature. And my prelude feature would just be omitted altogether from the spec, being something that can be parsed away before MDX parsing even begins.

Yeah, agreed. This is definitely something that we need to change. The spec shouldn't be so implementation specific and is something I'm hoping to address at some point in the near future. The specification should outline the language and its syntax.

this issue is more about the idea of removing implementation-specific details from MDX and its spec, and moving them out into external concerns. This way MDX becomes a clean language, rather than a library or build plugin, although its current ecosystem would still be the default libraries and build plugins.

💯

I hope I haven't come across as impatient or demanding. I've been doing open source for over a decade, and I understand and fully appreciate how much goes into it.

Haven't taken it that way at all myself. Sometimes it takes a few back and forths in GitHub issues to understand all the context 😸

That said, besides the implementation of the parser, there are several other features of MDX as mentioned above which are incompatible with my client project's needs, and several of my features which might not be compatible with the needs of many people already using MDX.

Yeah, the import/export/interleaving features aren't something that MDX is going to change at this point. Though, I'm definitely interested in trying to figure out a middle ground here in userland since your project won't be the first MDX derivative. Parsing is a hard problem so being able to share implementations helps us all.

But it would be such a shame for us to have two separate "JSX in Markdown" parsers just because we have different needs for the things outside "JSX in Markdown".

No doubt, I think no matter what we arrive upon, we can hopefully at least share some implementation.

thesoftwarephilosopher commented 5 years ago

Thanks everyone for your responses. It seems that, at the moment at least, the requirements I have don't fully line up with MDX's roadmap. I think this issue can be considered closed.