Closed: benblank closed this issue 1 year ago.
Apologies for not providing a runnable example, but I spent more time trying (and failing) to get codesandbox to do something useful than I did on the rest of the report. 😅
Thanks @benblank! Here is the repro in a sandbox: https://stackblitz.com/edit/node-mneiet?file=index.js
I'm seeing the same behavior you describe when running remark 15.0.1.
I checked the four examples in the CommonMark dingus, and it does indeed appear that all four should produce a list.
Tracing further, I suspect the issue is one level down, in micromark. I'm able to replicate it without generating the AST: https://stackblitz.com/edit/node-1ygk3h?file=index.js
Hi! This was marked as ready to be worked on! Note that while this is ready to be worked on, nothing is said about priority: it may take a while for this to be solved.
Is this something you can and want to work on?
Team: please use the `area/*` (to describe the scope of the change), `platform/*` (if this is related to a specific one), and `semver/*` and `type/*` labels to annotate this. If this is first-timers friendly, add `good first issue`, and if this could use help, add `help wanted`.
> I suspect the issue is down one level in micromark, I'm able to replicate the issue without having the AST generated
Ah! Dang. I'd traced it this far down from Prettier and thought I'd gotten to the bottom of it. 🙂
Thanks for all the helpful links!
I do think the spec is unclear for this:

> In order to solve of unwanted lists in paragraphs with hard-wrapped numerals, we allow only lists starting with `1` to interrupt paragraphs. Thus,

(right above example 304). As in, I followed those words here.
I think that the current behavior is in line with the reasoning there. Natural language phrases might include `1.`, but `2.` or `01.` are less likely.
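The disagreement comes down to how the interruption rule is read. Below is a minimal sketch of two possible readings; `parseOrderedMarker`, `canInterruptLiteral`, and `canInterruptByStartNumber` are hypothetical helpers written for illustration, not micromark's code, and they ignore indentation, container prefixes, and the other list-item rules. Reading A lets only the literal marker `1` interrupt a paragraph, which is consistent with the results reported in this issue (`1.` interrupts, `01.` does not). Reading B compares the marker's numeric start number with 1, which would also accept `01.`; that is how the rule would behave if it is taken to constrain the start number rather than the digits as written.

```js
// Hypothetical helper (not micromark's implementation): parse an ordered-list
// marker of 1-9 digits followed by `.` or `)`, then spaces or tabs, then
// non-blank content (an interrupting list item may not start with a blank line).
// Indentation and container prefixes are ignored for simplicity.
function parseOrderedMarker(line) {
  const match = /^(\d{1,9})([.)])[ \t]+\S/.exec(line);
  if (match === null) return null;
  return {
    digits: match[1], // the marker exactly as written, e.g. "01"
    start: Number.parseInt(match[1], 10), // its numeric value, e.g. 1
    delimiter: match[2], // "." or ")"
  };
}

// Reading A: only the literal marker `1` may interrupt a paragraph.
function canInterruptLiteral(line) {
  const marker = parseOrderedMarker(line);
  return marker !== null && marker.digits === "1";
}

// Reading B: any marker whose start number is 1 may interrupt, so `01.` would too.
function canInterruptByStartNumber(line) {
  const marker = parseOrderedMarker(line);
  return marker !== null && marker.start === 1;
}

for (const line of ["1. one", "2. two", "01. one"]) {
  console.log(JSON.stringify(line), canInterruptLiteral(line), canInterruptByStartNumber(line));
}
// "1. one" true true
// "2. two" false false
// "01. one" false true
```

Both readings reject `2.`, so they only diverge on markers with leading zeroes, which is exactly the `01.` case at issue here.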
If you care strongly about this, could you perhaps open an issue with `commonmark/commonmark-spec` to check what the idea is?
Actually, I missed that when I was reading through the spec. I'm not sure I 100% agree with the reasoning behind it, but those reasons do at least appear to be pretty clear.
I may indeed open an issue regarding the phrasing, though; I feel the section you quoted would be improved by calling out that it refers only to ordered lists and to the markers `1.` and `1)` (not the character `1`), even though there are examples demonstrating both cases. The emphasis on the principle of uniformity also suggests that the exception applies to nested lists as well, but I don't see any text or example calling that out.
I also have to admit to being a bit surprised to see "interrupting, not starting with `1`" called out as invalid, simply because when I checked BabelMark, a large number of the parsers (including nine of the twelve marked as specifically targeting CommonMark) considered it valid.
On the one hand, it's a shame to "disagree" with so many other implementations, but the spec is clear as to what the Right Thing is, and it isn't what I was trying to do. I'll go ahead and close the issue.
Thanks for taking the time to look into this!
Hi! This was closed. Team: If this was fixed, please add `phase/solved`. Otherwise, please add one of the `no/*` labels.
Hi team! Could you describe why this has been marked as wontfix?
Thanks, — bb
Hi team! I don’t know what’s up as there’s no phase label. Please add one so I know where it’s at.
Thanks, — bb
There’s a wide variety of parsers that all do things differently.
CM likes to be ambiguous on edge cases. That also comes as a given when the spec is mostly a test suite of input/output examples rather than a description of an algorithm (as the HTML spec is).
I’d like a more formal spec. But I can see value in this too.
Anyway, feel free to PR another example of the `01` case to the spec. Then I (and others) will go with whatever is decided there!
Initial checklist
Affected packages and versions
remark-parse@11.0.0
Link to runnable example
No response
Steps to reproduce
1. In a new folder, create a new Node module by running e.g. `pnpm init`.
2. Run `pnpm install remark-parse@11.0.0`.
3. Run `pnpm install unified@11.0.3`.
4. Save the code below as `repro.mjs` and run `node repro.mjs`. (I used Node v18.17.0.) This will generate a JSON file containing the parsed AST (sans `position` properties, so that they can be easily diffed) for each of the Markdown snippets it contains.
```js
import { writeFile } from "node:fs/promises";
import remarkParse from "remark-parse";
import { unified } from "unified";

const parser = unified().use(remarkParse).freeze();

// "Following": the list is separated from the paragraph by a blank line.
// "Interrupting": the list starts on the line right after the paragraph.
const documents = {
  noLeadingZeroesFollowing: `The preceeding paragraph.

1. one
4. two
`,
  leadingZeroesFollowing: `The preceeding paragraph.

01. one
02. two
`,
  noLeadingZeroesInterrupting: `The preceeding paragraph.
1. one
2. two
`,
  leadingZeroesInterrupting: `The preceeding paragraph.
01. one
02. two
`,
};

// Recursively drop `position` so the JSON output can be diffed.
function stripPositions(node) {
  const { position, children, ...rest } = node;
  return { ...rest, children: children?.map(stripPositions) };
}

// Write one `<name>.json` file per document.
await Promise.all(
  Object.entries(documents).map(([name, text]) =>
    writeFile(
      name + ".json",
      JSON.stringify(stripPositions(parser.parse(text)), undefined, 2),
    ),
  ),
);
```
Observe that the files `noLeadingZeroesFollowing.json`, `leadingZeroesFollowing.json`, and `noLeadingZeroesInterrupting.json` are identical and that their root nodes contain both a paragraph node and a list node. However, the root node in `leadingZeroesInterrupting.json` instead contains only a single paragraph node. Diffing it against any of the other files will produce output similar to the following.
```diff
--- noLeadingZeroesFollowing.json	2023-10-13 16:11:26.261286672 -0700
+++ leadingZeroesInterrupting.json	2023-10-13 16:11:26.261286672 -0700
@@ -6,47 +6,7 @@
       "children": [
         {
           "type": "text",
-          "value": "The preceeding paragraph."
-        }
-      ]
-    },
-    {
-      "type": "list",
-      "ordered": true,
-      "start": 1,
-      "spread": false,
-      "children": [
-        {
-          "type": "listItem",
-          "spread": false,
-          "checked": null,
-          "children": [
-            {
-              "type": "paragraph",
-              "children": [
-                {
-                  "type": "text",
-                  "value": "one"
-                }
-              ]
-            }
-          ]
-        },
-        {
-          "type": "listItem",
-          "spread": false,
-          "checked": null,
-          "children": [
-            {
-              "type": "paragraph",
-              "children": [
-                {
-                  "type": "text",
-                  "value": "two"
-                }
-              ]
-            }
-          ]
+          "value": "The preceeding paragraph.\n01. one\n02. two"
         }
       ]
     }
```
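Rather than diffing the JSON files by eye, the same check can be scripted. This is a minimal sketch, assuming the trees have the mdast shape shown in the diff; `containsNodeOfType` and the two hand-built sample trees are hypothetical, written for illustration only.

```js
// Hypothetical helper: report whether any node in an mdast-shaped tree has the
// given type, searching depth-first through `children` arrays.
function containsNodeOfType(node, type) {
  if (node.type === type) return true;
  return (node.children ?? []).some((child) => containsNodeOfType(child, type));
}

// Hand-built trees mirroring the two shapes in the diff (positions omitted,
// list items elided).
const withList = {
  type: "root",
  children: [
    { type: "paragraph", children: [{ type: "text", value: "The preceeding paragraph." }] },
    { type: "list", ordered: true, start: 1, spread: false, children: [] },
  ],
};
const withoutList = {
  type: "root",
  children: [
    {
      type: "paragraph",
      children: [{ type: "text", value: "The preceeding paragraph.\n01. one\n02. two" }],
    },
  ],
};

console.log(containsNodeOfType(withList, "list")); // true
console.log(containsNodeOfType(withoutList, "list")); // false
```

With the repro's `parser` and `documents` in scope, `containsNodeOfType(parser.parse(documents.leadingZeroesInterrupting), "list")` should return `false` on remark-parse@11.0.0 according to the results reported here, and `true` for the other three documents. The search is depth-first rather than limited to the root's direct children, so a list nested inside another container (a block quote, say) would still be found.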
Expected behavior
Ordered lists should be parsed consistently, regardless of whether their list markers have leading zeroes or the list interrupts a block.
Actual behavior
Ordered lists are recognized as such if their list markers have leading zeroes or they interrupt a block. However, ordered lists are not recognized as such if their list markers have leading zeroes and they interrupt a block.
Runtime
Other (please specify in steps to reproduce)
Package manager
pnpm
OS
Linux
Build and bundle tools
Other (please specify in steps to reproduce)