Partial mode - Githubissues

NickHeiner commented 1 year ago

Initial checklist

[X] I read the support docs
[X] I read the contributing guide
[X] I agree to follow the code of conduct
[X] I searched issues and couldn’t find anything (or linked relevant results below)

Problem

I'm working with large language models, which are relatively slow, so streaming/incremental output is important to a good user experience. (The project I'm working in is https://github.com/fixie-ai/ai-jsx.)

My LLM calls stream MDX. I'd like to be able to show the partially-rendered MDX to the user. However, if the MDX is incomplete, this parser crashes via the crash function (e.g. https://github.com/micromark/micromark-extension-mdx-jsx/blob/main/dev/lib/factory-tag.js#L311).

Solution

Ideally, there would be a "partial" mode (similar to untruncate-json) which auto-closes all open AST nodes via best-guess.

{/* input */}
<A b="a

{/* output */}
<A b="a" />

{/* input */}
<A b="a">foo

{/* output */}
<A b="a">foo</A>

{/* input */}
<

{/* output */}
<

Obviously, this might produce semantically strange results; it would be up to the caller to opt-in and handle these gracefully.

Alternatives

I considered sidestepping the compiler entirely and implementing my own function that maintains a stack, crawls the MDX as a string, and when it reaches the conclusion, tries to close everything on the stack.

My suspicion is that this would be simpler to implement for the easy cases, but using a modified version of the compiler would be more robust for the long tail.

remcohaszing commented 1 year ago

This would also be useful for MDX IntelliSense, as content tends to be invalid while typing.

wooorm commented 1 year ago

The first and third have little to do with a stack, how would you want to solve that?

This micromark extension I think doesn't use a stack, the stack is handled in the corresponding mdast utility

NickHeiner commented 1 year ago

Can you tell me more what you mean by the first not being related to stacks?

Agreed for the third – that case was just clarifying that when the input was already valid, the output is unchanged.

ChristianMurphy commented 1 year ago

Adding another perspective in.

I do think that some form of partial render could have value for authoring experience, human based or machine based. Specifically, being able to show a valid document up-to the point where it becomes invalid, and having some for of invalid placeholder for the remainder. I also think speculative completion/healing could be beneficial in the authoring experience like mdx-analyzer.

That said, I don't think speculative healing is a good idea for the parser. MDX is Markdown + JSX. JSX like XML and JS, does not speculatively heal or complete invalid documents.

Giving a hypothetical example of how this distinction could be a problem (grounded in past experience maintaining MDX and remark). A community member and developer uses the autohealing partials option in their authoring experience. A document author is able to see an invalid document rendering how they expect, so they leave the document invalid and save/commit it into their content management system. The developer sees the document failing on their live site, and enables the option in production so the documents render.

Problem 1: We now have an option only designed for development, enabled in production

Down the road, the MDX maintainer team makes changes to improve the autohealing, and the authoring experience. The community member and developer upgrades to the latest MDX version, but then finds documents now render different, and files an issue with MDX believing this is a bug.

Problem 2: We now have an unstable and unofficial API, being relied on as a SemVer stable API. Problem 2.1: Any change to the auto healing will either prevent adopters from upgrading, force them change their documents, or need to be put behind a flag. None of those are good options from a maintainer perspective.

I'm against adding auto healing to the core parser. Partial rendering, by stopping at the point where the document is no longer valid, feels like a better option. An external more forgiving parser could be created as a parser plugin on top of that, but should not, in my opinion, live as part of the core MDX parser.

NickHeiner commented 1 year ago

Partial rendering, by stopping at the point where the document is no longer valid, feels like a better option.

That would work perfectly well for my usecase. (Preferably, there would be some signal from the compiler API to let the caller know that the render was partial.)

wooorm commented 1 year ago

Can you tell me more what you mean by the first not being related to stacks?

The tag is still open. The tag needs to be closed before it can be pushed to a stack or popped off a stack.

– that case was just clarifying that when the input was already valid, the output is unchanged.

To me, the 3rd is already some form of “autohealing” / “partial mode”. See https://github.com/micromark/micromark-extension-mdx-jsx/issues/7.

I'm against adding auto healing to the core parser. Partial rendering, by stopping at the point where the document is no longer valid, feels like a better option. An external more forgiving parser could be created as a parser plugin on top of that, but > should not, in my opinion, live as part of the core MDX parser.

Pausing could live here, right? Or where do you see it?

Currently, in JS either a successful result is returned or an error is thrown. In Rust, it’s a Result<str, ErrorMessage>.

I guess the solution here is to expose the up to the error result, on errors? (perhaps optionally?). This gets complex, errors might occur while making that partial result. Particularly around MDX, which knows errors when parsing (here, as a micromark extension) and when dealing with the stack (as an mdast-util-from-markdown extension, because <a> </b> parses fine but isn’t, and same for <a>)

NickHeiner commented 1 year ago

Can you tell me more what you mean by the first not being related to stacks?

The tag is still open. The tag needs to be closed before it can be pushed to a stack or popped off a stack.

Yep, I think we're saying the same thing. I just meant that, if the parser sees something like <A><B><C EOF, the reason it knows the EOF is wrong is because somewhere, it's likely maintained a stack representation of A -> B -> C.

wooorm commented 1 year ago

The error is earlier, when parsing, before stacks: <a crashes because EOF is not allowed there. The stack comes after the parsing (which emits events). Perhaps pedantic and similar enough!

ChristianMurphy commented 1 year ago

Pausing could live here, right? Or where do you see it?

Pausing could live here

I guess the solution here is to expose the up to the error result, on errors? (perhaps optionally?).

That could make sense

Particularly around MDX, which knows errors when parsing (here, as a micromark extension) and when dealing with the stack (as an mdast-util-from-markdown extension,

Right, there can still be issues downstream, even if mdast-util-from-markdown parses as expected, any <CustomElement /> could itself fail.

wooorm commented 1 year ago

Something like this isn’t really a high priority for me to work on currently. Could be years. But I can always advise if someone else finds this important and wants to work on this!

micromark / micromark-extension-mdx-jsx

Partial mode #10

Initial checklist

Problem

Solution

Alternatives