sirthias / pegdown

A pure-Java Markdown processor based on a parboiled PEG parser supporting a number of extensions
http://pegdown.org
Apache License 2.0

Nested lists are broken #245

Open weavejester opened 7 years ago

weavejester commented 7 years ago

Pegdown 1.6.0 has problems rendering nested lists. This issue also occurs in 1.5.0, 1.4.0 and 1.3.0, so it appears to be a long-standing issue.

Input:

* foo
  * bar

Output:

<ul>
  <li>foo</li>
  <li>bar</li>
</ul>

Similarly:

Input:

* foo

  bar

Output:

<ul>
  <li>foo</li>
</ul>
<p>bar</p>

However, if I double the indentation, it works:

Input:

* foo
    * bar

Output:

<ul>
  <li>foo
    <ul>
      <li>bar</li>
    </ul>
  </li>
</ul>

Looking at the code, it seems like Pegdown treats indentation as either a tab or four spaces, but for lists any whitespace should be treated as indentation.
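As a stopgap, the "double the indentation" observation can be applied mechanically before the text reaches the parser. The following is only a minimal sketch: `IndentDoubler` and `doubleLeadingSpaces` are hypothetical names, not part of pegdown, and the helper assumes the document uses 2-space list indents throughout and contains no indented code blocks, which it would otherwise distort.

```java
public class IndentDoubler {

    // Hypothetical helper (not a pegdown API): doubles the run of leading
    // spaces on every line, so a 2-space nested item becomes a 4-space one.
    static String doubleLeadingSpaces(String markdown) {
        String[] lines = markdown.split("\n", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            // Count the leading spaces on this line.
            int n = 0;
            while (n < line.length() && line.charAt(n) == ' ') {
                n++;
            }
            if (i > 0) {
                out.append('\n');
            }
            // Prepend n extra spaces, giving 2n in total.
            for (int k = 0; k < n; k++) {
                out.append(' ');
            }
            out.append(line);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleLeadingSpaces("* foo\n  * bar"));
    }
}
```

The result would then be passed to the Markdown processor in place of the original text. Lines with no leading spaces are left unchanged.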

vsch commented 7 years ago

@weavejester, pegdown uses fixed-indent list parsing, like MultiMarkdown and pandoc. Your list would be parsed as a nested list in kramdown, markdown.pl and CommonMark. Differences in list parsing are the greatest deviation between implementations.
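To make the difference concrete, here is a rough sketch of the two kinds of rule. It is not pegdown's grammar or CommonMark's full algorithm: `ListIndentRules` and its methods are hypothetical, the fixed-indent rule is simplified to "one level per 4 columns, tabs advancing to the next multiple of 4", and the relative rule only handles single-character bullet markers.

```java
public class ListIndentRules {

    // Fixed-indent rule (simplified): one nesting level per full 4 columns
    // of leading whitespace. A tab advances to the next multiple of 4.
    static int fixedIndentDepth(String line) {
        int columns = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == ' ') {
                columns++;
            } else if (c == '\t') {
                columns += 4 - (columns % 4);
            } else {
                break;
            }
        }
        return columns / 4;
    }

    static int leadingSpaces(String line) {
        int n = 0;
        while (n < line.length() && line.charAt(n) == ' ') {
            n++;
        }
        return n;
    }

    // Column where the item's text starts: leading spaces, plus the bullet
    // marker (assumed to be one character such as '*'), plus the spaces after it.
    static int contentOffset(String bulletLine) {
        int i = leadingSpaces(bulletLine);
        i++; // skip the single-character bullet marker
        while (i < bulletLine.length() && bulletLine.charAt(i) == ' ') {
            i++;
        }
        return i;
    }

    // Content-relative rule (simplified): the child nests if it is indented
    // at least as far as the parent's text.
    static boolean nestsRelative(String parentLine, String childLine) {
        return leadingSpaces(childLine) >= contentOffset(parentLine);
    }

    public static void main(String[] args) {
        System.out.println(fixedIndentDepth("  * bar"));          // 2 columns -> depth 0
        System.out.println(fixedIndentDepth("    * bar"));        // 4 columns -> depth 1
        System.out.println(nestsRelative("* foo", "  * bar"));    // 2 >= 2 -> true
    }
}
```

Under the fixed-indent sketch the 2-space item stays at depth 0, which matches the flat output reported above, while the content-relative sketch treats the same line as nested.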

weavejester commented 7 years ago

Ah, I see. Maybe the title shouldn't be "Markdown lists are broken", but "Markdown lists don't work like they do on Github" :)

vsch commented 7 years ago

@weavejester, if you need GitHub-like processing then you need to specify whether you mean GitHub comments or GitHub docs. They switched comment processing to CommonMark. Docs are still kramdown.

I rewrote commonmark-java to replace pegdown in my Markdown Navigator plugin for IntelliJ IDEs: https://github.com/vsch/idea-multimarkdown. The parser project, https://github.com/vsch/flexmark-java, has a very detailed source-based AST with source offsets for every part of each element. I need that for syntax highlighting and other source-reliant plugin features.

It is CommonMark 0.27 (GitHub comments) compliant but has parser configuration options to emulate the list indentation rules used by markdown.pl, MultiMarkdown (4-space indents, like pegdown) and kramdown (GitHub docs). The only pegdown extensions that I have not yet implemented are typographic quotes, smarts and definition lists. The rest of the extensions are available, along with some extra ones that pegdown does not have.

As an added bonus, and what motivated me to switch, the parsing is 30-50x faster than pegdown on average documents and several thousand times faster on pegdown's pathological input like [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[.

The AST offsets are bug-free and regular. The AST is also fully modifiable, unlike pegdown's, and its nodes have next, prev and parent links.

craneyuan commented 7 years ago

@weavejester thank you very much! :+1: I also encountered this problem and was wondering how to deal with it, so I am very lucky to have found this issue.