kjdev / hoextdown

Hoextdown is an extension to Hoedown
MIT License
23 stars 15 forks source link

List items and fenced code do not follow commonmark spec #59

Open gagern opened 4 years ago

gagern commented 4 years ago

This is follow-up work as I investigate the impact of https://github.com/kjdev/hoextdown/pull/58 on a corpus of Markdown files.

Does Hoextdown aim to be compliant with spec.commonmark.org, or is compatibility with Markdown.PL more important? If it is the latter, is there a more useful specification of its syntax than the code itself? It would be useful to call the intended language dialect out in the README.md, and perhaps have at least an option to go with commonmark.

  1. As I just reported in https://github.com/hoedown/hoedown/issues/238, triple-quoted text inside a list item breaks list continuation.

  2. Also having a second space between list marker and a fenced code block will break indentation and the detection of the closing fence.

    1.  ```
       Code.

    and text.

    1. More items.

    This is in part a consequence of my https://github.com/hoedown/hoedown/issues/238 which probably gets the initial indent wrong, and I may be able to fix that. But even if the indent is wrong and the code sees an additional leading space for the closing fence, https://spec.commonmark.org/0.29/#fenced-code-blocks allows the closing code fence to be indented by up to three spaces, so it should detect it nonetheless.

    I can reproduce this problem outside lists: indenting a closing fence by a space breaks it.

  3. A lot of complexity inside parse_listitem seems to come from the fact that according to the commit message of https://github.com/kjdev/hoextdown/commit/0e1cb2b1570a975f16feff176f81766128d838e1 list items will continue if subsequent lines are indented by a single space. https://spec.commonmark.org/0.29/#list-items requires an indentation of as many spaces as there are characters before the start of the first list item text line, which I believe would be far easier to handle, and which should remove some of the ambiguities we otherwise probably get.

    Specifically I would hope that with the commonmark specification regarding indentation, it should be possible to avoid the in_fence flag, and determine the end of the list item simply by looking for the first non-empty line with fewer leading spaces than required for the list item.

  4. The handling of the in_fence flag inside parse_listitem does not take into account a mix of code fence symbols (` vs. ~), nor the number of fence symbols (closing fence needs (at least) as many as opening fence). It also doesn't account for the fact that according to https://spec.commonmark.org/0.29/#example-98 and the text leading up to that, an unmatched opening code fence will lead to a code block that extends to the end of the list item. Such a code block should not interfere with list continuation in a subsequent item, but as setting has_next_oli is suppressed by in_fence, that's what's happening in Hoextdown.

    In light of all of these, I'm inclined to consider the handling of in_fence inside parse_listitem to be deeply broken. I would be very happy if we could just get rid of that flag altogether, and detect the start of the next item independent of the state of any code blocks. But I assume from https://github.com/kjdev/hoextdown/commit/0e1cb2b1570a975f16feff176f81766128d838e1 that something about continuation lines with fewer spaces precludes that approach. I guess that is the case because the indent of the lines of a list item is not a fixed constant now, but may shift from one line to the other, except for code blocks where it has to stay the same for the duration of the code block.

  5. The preceding point also highlighted another difference between https://spec.commonmark.org/0.29/#fenced-code-blocks and Hoextdown: according to the latter a closing fence might have more fence characters than the opening one had. Hoextdown does require the number to be exactly equal.

  6. The list continuation logic in https://spec.commonmark.org/0.29/#lists requires use of the same bullet symbol for unordered lists, or the same number delimiter for ordered lists. But Hoextdown does not do that, so you get a single list even if you switch marker type. Since the code only looks at has_next_oli and has_next_uli, that's all it can do.

All in all I find it hard to tell what flavor of Markdown Hoextdown is aiming to support. If the code is the spec, then any change will break someone. If the spec is somewhere else, it should be cited. And if the spec is CommonMark, then Hoexdown should aim to pass all its test cases.

I feel that in any case, considerable changes to the interaction between fenced code and list items will be needed. Direction of those changes depends a lot on what behavior you're aiming for.