Closed Hamms closed 5 years ago
@wooorm could I get your thoughts on the relative merits of the fix proposed here vs the possible fix for the same problem at the layer of the parser?
@Hamms Interesting! I’m just now rewriting the mdast spec, and am finding that tight / loose on lists and on items are pretty messy. They’re idea is that they’re useful for HTML, mostly, but as we’re not really using it here, that’s just weird!
I saw that commonmark does not render the <p>
in this case, so I guess that should be removed.
I think that the loose / tight problem should be fixed in remark-parse.
I feel that it makes sense to make the logic here simpler, so that we could only check the loose
property here
These are just half thoughts, what do based on my ideas?
Yeah, I wasn't sure how to feel about whether the problem should be fixed at the level of "paragraph" nodes or "p" tags; seems like it comes down to the question of whether "paragraph" nodes have some use beyond just being blindly converted into "p" tags.
Fixing the problem here in the converter is saying that yes they do, and the handler (in this case, mdast-util-to-hast) should also be smart enough to know what tight lists are and know when they should and should not generate "p" tags.
Fixing it in remark-parse would be saying that no they do not; they just exist to be turned into "p" tags and so the parser should be smart enough to know when to and to know inject them into the tree.
I actually originally fixed this at the parser the layer, mostly because I didn't feel that I had a good sense for the role of the paragraph node in the greater remark ecosystem, and because the existing implementation puts that responsibility on mdast-util-to-hast and not on remark-parse, so I just went with the current direction. But if you're down with the idea of me changing things on a deeper level, I do I think prefer the "stupid" paragraph nodes philosophy.
Upon further investigation, it looks like this solution actually introduces a problem with loose lists, because in the case of
- item one
- item two
remark-parse actually does not recognize the second item as being loose:
root[1] (1:1-3:11, 0-22)
└─ list[2] (1:1-3:11, 0-22) [ordered=false][loose=true]
├─ listItem[1] (1:1-2:1, 0-11) [loose=true]
│ └─ paragraph[1] (1:3-1:11, 2-10)
│ └─ text: "item one" (1:3-1:11, 2-10)
└─ listItem[1] (3:1-3:11, 12-22) [loose=false]
└─ paragraph[1] (3:3-3:11, 14-22)
└─ text: "item two" (3:3-3:11, 14-22)
Given that any list item being loose should make the whole list behave as loose - which is a "going backwards" approach that seems pretty inconsistent with remark-parse's entire premise - it seems like this fix will require changes to both the parse and transform stages.
First, parse should be updated to consistently create "paragraph" tags while still bubbling the loose
flag up as it currently does. Then, transform should be updated to purge paragraph tags in loose lists as it currently does, but to also do so for list items that contain more than one child.
@Hamms Maybe, but it’s based on commonmark! I dunno, I’m not sure of a better approach?
I think it's possible to keep the approach consistent, it just requires embracing the idea of paragraph nodes as being something more semantically significant than just placeholders for p tags
I agree with that idea, it’s what we’re doing now, and it works for me.
It’s similar to the idea of a paragraph in HTML (not just a <p>
tag either), in most cases <p>
can be omitted.
@wooorm Please take another look!
I believe I have the ultimate functionality where we want it to be; all the various ways of combining looseness, tightness, and nesting work as I expect them to. However, I'm not sure how I feel about the actual approach; rendering the results then modifying them like I'm doing feels inconsistent with the way subtrees are handled everywhere else in the codebase, but I'm not sure of a better way to do it. I'd love to hear your thoughts!
Thanks!
@wooorm Could you take a look at this when you get a chance?
Hey, sorry for the slow response! I thought about it some more, see https://github.com/remarkjs/remark/pull/364 for my current thinking. I think we can use the code here as a base on those changes in remark quite easily.
Very exciting! I'll take a look as soon as I can
Context
In markdown, "tight" lists are supposed to render text without paragraph tags:
becomes:
In remark, this usually works, but it works in kind of a weird way. The original text is actually parsed to an MDAST tree that contains paragraph nodes:
And those tags are then removed by
mdast-util-to-hast
:https://github.com/syntax-tree/mdast-util-to-hast/blob/1f8591e0b6163c437999e842774f66c9027cfc47/lib/handlers/list-item.js#L18-L24
The Problem
This usually results in correct HTML output, even though it's a bit weird to have "paragraph" nodes that don't then become "p" tags. However! This only works if the tight list item has a single child, which means that in cases like
we would expect to render to
but instead get
Because the node in the intermediate MDAST doesn't have just a single child:
The Solution
There are a couple of potential solutions here; one of them would be to avoid creating those unnecessary "paragraph" nodes in the first place, but another is toupgrade
mdast-util-to-hast
to be able to deal with this case as well. This latter solution is outlined in this PR.I still need to fix the tests that are affected by this change and write new ones for this new functionality, but before I invest that time I want to get your thoughts on the validity of pursuing this fix as opposed to the other one.