Open andymatuschak opened 11 months ago
I spent some time digging into this and trying to produce a fix. As far as I can tell, what's going on is:
chunkText
token surrounding their interior.micromark:subtokenize
eventually transforms chunkText
token into a data
token which still contains the trailing whitespace.micromark:resolveAllLineSuffixes
, which splits the trailing whitespace out of the data
token into a lineSuffix
token.mdast-util-from-markdown
compiles the contents of the data
token into the text
node's value, but ignores the lineSuffix
contents.Assuming you consider this behavior to be a bug, I'm not sure whether you'd consider the defect to be in the micromark
behavior (this nested content is not, in fact, at the end of a line) or in the compiler behavior (perhaps the lineSuffix
should be emitted in this nested context?).
PS: Thank you for all your hard work!
Thanks for the investigation and your kind words!
resolveAllLineSuffixes
exists because trailing spaces on a line are not “emitted”/“rendered” a
-> <p>a</p>
. This seems like a bug there as in :x[y ]z
, the space after y
should not be a “line suffix” but the one after z
should be.
I’m not 100% this is a bug. Given a
-> <p>a</p>
, # b
-> <h1>b</h1>
, # c #
-> <h1>c</h1>
. Why would this here be different?
Why do you want trailing whitespace?
Thank you for digging in! It’s a fair question, and I don’t think it’s totally obvious that trailing whitespace should be preserved.
I’m hoping to use the inline text directive to produce behavior somewhat like other inline text directives—links, inline code, emphasis. Of those, the first two preserve trailing whitespace; the last does not, AFAICT to avoid spurious parse situations. In my mind, these directive labels are very congruent to link labels. Perhaps leaf and container directive labels are less obviously analogous to link labels than the text directive labels.
More concretely, I’m experimenting with text directives to create a “redact” markup for a flashcard system, and I imagined an interface where one can drag the handles of the redaction left to right across the text, character by character. It’s freeing to be able to drop the handle wherever, and feels weird if it “jumps” when I release it because the underlying representation can’t place the right edge in certain positions. I can make this work without actual syntactic support for the trailing space scenario, but it felt unintentional (given the line suffix token when it’s not actually at the end of a line) so I thought I’d write a bug.
Thanks for considering!
Links and emphasis are the same.
*b
c*
[b
c](#)
https://spec.commonmark.org/dingus/?text=%20*b%20%0Ac*%20%0A%0A%20%5Bb%20%0Ac%5D(%23)%20%0A
The thing with them though, is that they are parsed as separate things: *
, *
, [
, ]()
. Everything goes from left to right.
So it’s the paragraph/heading parent, the content type (text), that deals with the trailing whitespace in the entire thing.
With content in the [
and ]
of directives, it’s parsed separately. It’s as if it was its own paragraph or heading. Because it could be! That’s how directives work (also the leaf / container). You currently choose to use the content inside a paragraph (which I get). But it could be say a separate tooltip. It could be nice for folks to be able to pad with whitespace
But there are two things here: a) initial/trailing when looking at the whole, b) initial/trailing when looking at a line ending.
I assume you don’t see a reason for “keeping” the initial/trailing whitespace for :x[y \n z]a
.
And that it doesn’t matter for leaf/container, as in, ::x[ Yyy zzz. ]
.
So if this would be implemented, it should be a) only for text directives, b) not affect whitespace around line endings.
Note: you can use a character reference btw: :x[y ]z
But there are two things here: a) initial/trailing when looking at the whole, b) initial/trailing when looking at a line ending.
I assume you don’t see a reason for “keeping” the initial/trailing whitespace for :x[y \n z]a.
Ah, great point. No, in a document like this, you're right that I would expect the leading/trailing line whitespace to be stripped:
:x[y
z]a
And that it doesn’t matter for leaf/container, as in, ::x[ Yyy zzz. ].
Right. I agree with your argument that stripping whitespace here matches the behavior of other flow-level nodes.
So if this would be implemented, it should be a) only for text directives, b) not affect whitespace around line endings.
Right. I guess I'd expect the lineSuffix
resolution behavior when the whitespace in question is in fact a line suffix. (And likewise for prefixes)
Note: you can use a character reference btw: :x[y ]z
Thanks!
lineSuffix
Without a final end-of-file end-of-line, it’s still the end of the line (a
vs a \n
). As this whole thing is parsed separately, it’s the start of the thing and end of the thing, even through there’s no \n
. But these are internals, the terms don’t mean much.
I remain unsure whichever is better. Current state or proposed state. I can see arguments for both.
Fair enough! :) Thanks for your consideration.
One more thing I wanted to mention. You mention "Links and emphasis are the same."—and in your example, they are. I'm sure you know this, but I wanted to clarify that I was referring to the behavior of links when their label doesn't involve a line ending; i.e. [a ](#)
does parse to a link
containing a text
node with value a
. It's in this sense that I was hoping :redact[a ]
would behave. Likewise for `a `
. Whereas *a *
doesn't parse to an emphasis at all, because of the flanking rules.
Initial checklist
Affected packages and versions
micromark-extension-directive@3.0.0 micromark@4.0.0 mdast-util-from-markdown@2.0.0
Link to runnable example
https://astexplorer.net/#/gist/b3ff9dc85d8e49ef94791c73e645646f/21c9658bf3272b1922fb07d889351d8482c1c552
Steps to reproduce
In this AST Explorer demo, observe the parse tree for
:redact[secret ]word
.(The same issue reproduces on my local machine using node@20, bun, macOS 14.1.2, and no build tools).
Expected behavior
I expect the
textDirective
node's childtext
node to have valuesecret
(with the trailing space).Actual behavior
The
textDirective
node's childtext
node has valuesecret
(without the trailing space). The trailing space isn't represented anywhere else in the parse tree.Runtime
Other (please specify in steps to reproduce)
Package manager
Other (please specify in steps to reproduce)
OS
Other (please specify in steps to reproduce)
Build and bundle tools
Other (please specify in steps to reproduce)