micromark / micromark-extension-directive

micromark extension to support generic directives (`:cite[smith04]`)
https://unifiedjs.com
MIT License
Trailing whitespace in labels is elided #22

Open andymatuschak opened 11 months ago

andymatuschak commented 11 months ago

Affected packages and versions

micromark-extension-directive@3.0.0 micromark@4.0.0 mdast-util-from-markdown@2.0.0

Link to runnable example

https://astexplorer.net/#/gist/b3ff9dc85d8e49ef94791c73e645646f/21c9658bf3272b1922fb07d889351d8482c1c552

Steps to reproduce

In this AST Explorer demo, observe the parse tree for `:redact[secret ]word`.

(The same issue reproduces on my local machine using node@20, bun, macOS 14.1.2, and no build tools).

Expected behavior

I expect the `textDirective` node's child text node to have value `secret ` (with the trailing space).

Actual behavior

The `textDirective` node's child text node has value `secret` (without the trailing space). The trailing space isn't represented anywhere else in the parse tree.

Runtime

Other (please specify in steps to reproduce)

Package manager

Other (please specify in steps to reproduce)

OS

Other (please specify in steps to reproduce)

Build and bundle tools

Other (please specify in steps to reproduce)

andymatuschak commented 11 months ago

I spent some time digging into this and trying to produce a fix. As far as I can tell, what's going on is:

Assuming you consider this behavior to be a bug, I'm not sure whether you'd consider the defect to be in the micromark behavior (this nested content is not, in fact, at the end of a line) or in the compiler behavior (perhaps the lineSuffix should be emitted in this nested context?).

PS: Thank you for all your hard work!

wooorm commented 11 months ago

Thanks for the investigation and your kind words!

`resolveAllLineSuffixes` exists because trailing spaces on a line are not “emitted”/“rendered”: `a ` -> `<p>a</p>`. This seems like a bug there, as in `:x[y ]z` the space after `y` should not be a “line suffix” but the one after `z` should be.

wooorm commented 11 months ago

I’m not 100% sure this is a bug. Given `a ` -> `<p>a</p>`, `# b ` -> `<h1>b</h1>`, `# c #` -> `<h1>c</h1>`, why would this here be different?

Why do you want trailing whitespace?

andymatuschak commented 11 months ago

Thank you for digging in! It’s a fair question, and I don’t think it’s totally obvious that trailing whitespace should be preserved.

I’m hoping to use the inline text directive to produce behavior somewhat like other inline text directives—links, inline code, emphasis. Of those, the first two preserve trailing whitespace; the last does not, AFAICT to avoid spurious parse situations. In my mind, these directive labels are very congruent to link labels. Perhaps leaf and container directive labels are less obviously analogous to link labels than the text directive labels.

More concretely, I’m experimenting with text directives to create a “redact” markup for a flashcard system, and I imagined an interface where one can drag the handles of the redaction left to right across the text, character by character. It’s freeing to be able to drop the handle wherever, and feels weird if it “jumps” when I release it because the underlying representation can’t place the right edge in certain positions. I can make this work without actual syntactic support for the trailing space scenario, but it felt unintentional (given the line suffix token when it’s not actually at the end of a line) so I thought I’d write a bug.

Thanks for considering!

wooorm commented 11 months ago

Links and emphasis are the same.

 *b 
c* 

 [b 
c](#) 

https://spec.commonmark.org/dingus/?text=%20*b%20%0Ac*%20%0A%0A%20%5Bb%20%0Ac%5D(%23)%20%0A

The thing with them though, is that they are parsed as separate things: `*`, `*`, `[`, `]()`. Everything goes from left to right. So it’s the paragraph/heading parent, the content type (text), that deals with the trailing whitespace in the entire thing.

With content in the `[` and `]` of directives, it’s parsed separately. It’s as if it was its own paragraph or heading. Because it could be! That’s how directives work (also the leaf / container). You currently choose to use the content inside a paragraph (which I get). But it could be, say, a separate tooltip. It could be nice for folks to be able to pad with whitespace.

But there are two things here: a) initial/trailing when looking at the whole, b) initial/trailing when looking at a line ending.

I assume you don’t see a reason for “keeping” the initial/trailing whitespace for `:x[y \n z]a`. And that it doesn’t matter for leaf/container, as in, `::x[ Yyy zzz. ]`.

So if this would be implemented, it should be a) only for text directives, b) not affect whitespace around line endings.

Note: you can use a character reference btw: `:x[y&#32;]z`

andymatuschak commented 11 months ago

> But there are two things here: a) initial/trailing when looking at the whole, b) initial/trailing when looking at a line ending.
>
> I assume you don’t see a reason for “keeping” the initial/trailing whitespace for `:x[y \n z]a`.

Ah, great point. No, in a document like this, you're right that I would expect the leading/trailing line whitespace to be stripped:

:x[y 
 z]a

> And that it doesn’t matter for leaf/container, as in, `::x[ Yyy zzz. ]`.

Right. I agree with your argument that stripping whitespace here matches the behavior of other flow-level nodes.

> So if this would be implemented, it should be a) only for text directives, b) not affect whitespace around line endings.

Right. I guess I'd expect the lineSuffix resolution behavior when the whitespace in question is in fact a line suffix. (And likewise for prefixes)
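To make the a) versus b) distinction concrete, here is a minimal sketch of the behavior being discussed for text directive labels. It is only a model of the proposal, not how micromark resolves line suffixes; `trimAroundLineEndings` is a hypothetical name, and the sketch ignores hard line breaks and other inline constructs.

```javascript
// Hypothetical helper, not part of micromark or micromark-extension-directive.
// Strips spaces and tabs next to line endings inside a label, but keeps
// whitespace at the very start and very end of the label as a whole.
function trimAroundLineEndings(label) {
  return label
    .split(/\r\n|\n|\r/)
    .map((line, index, lines) => {
      let result = line
      // Leading whitespace is a line prefix on every line except the first.
      if (index > 0) result = result.replace(/^[ \t]+/, '')
      // Trailing whitespace is a line suffix on every line except the last.
      if (index < lines.length - 1) result = result.replace(/[ \t]+$/, '')
      return result
    })
    .join('\n')
}

// Label of `:redact[secret ]word`: the trailing space is kept.
console.log(JSON.stringify(trimAroundLineEndings('secret ')))
// Label of `:x[y \n z]a`: whitespace around the line ending is dropped.
console.log(JSON.stringify(trimAroundLineEndings('y \n z')))
```

Under this model, whitespace is only treated as a line prefix or suffix when it actually touches a line ending inside the label.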

> Note: you can use a character reference btw: `:x[y&#32;]z`

Thanks!
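For the redaction use case, the character reference workaround could be applied when serializing a label. The following is a rough sketch under that assumption; `protectEdgeSpaces` and `redactDirective` are hypothetical names, and the sketch does not escape brackets or other characters that are significant inside a label.

```javascript
// Hypothetical helper for the workaround: write spaces at the edges of a
// label as `&#32;` so the parser sees character references, not whitespace.
function protectEdgeSpaces(label) {
  const encode = (run) => '&#32;'.repeat(run.length)
  return label.replace(/^ +/, encode).replace(/ +$/, encode)
}

// Hypothetical serializer for the `:redact[...]` text directive.
function redactDirective(label) {
  return ':redact[' + protectEdgeSpaces(label) + ']'
}

console.log(redactDirective('secret ') + 'word')
```

Interior spaces are left alone, so only the positions that would otherwise be stripped are rewritten.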

wooorm commented 10 months ago

> lineSuffix

Without a final end-of-file end-of-line, it’s still the end of the line (`a ` vs `a \n`). As this whole thing is parsed separately, it’s the start of the thing and end of the thing, even though there’s no `\n`. But these are internals, the terms don’t mean much.

I remain unsure which is better: the current state or the proposed state. I can see arguments for both.

andymatuschak commented 10 months ago

Fair enough! :) Thanks for your consideration.

One more thing I wanted to mention. You said "Links and emphasis are the same", and in your example they are. I'm sure you know this, but I wanted to clarify that I was referring to the behavior of links when their label doesn't involve a line ending; i.e. `[a ](#)` does parse to a link containing a text node with value `a ` (trailing space included). It's in this sense that I was hoping `:redact[a ]` would behave. Likewise for `` `a ` ``. Whereas `*a *` doesn't parse to an emphasis at all, because of the flanking rules.