sirthias / pegdown

A pure-Java Markdown processor based on a parboiled PEG parser supporting a number of extensions
http://pegdown.org
Apache License 2.0
1.29k stars 217 forks source link

fix parsing of sub-lists, sub-sub-lists with code, para and blank lin… #170

Closed vsch closed 9 years ago

vsch commented 9 years ago

List parsing fixed

I also took the liberty of prettying up the <li> tags when they contain children and added a \n before printing of the <p> tag so that the generated HTML hierarchy is easier to follow.

Fixes issues: #57 and #123

List rendering beyond the first and second levels was off for all but compact items and in need of a little TLC.

The printout of the test follows. The raw markdown section gives the markdown source, the markdown section will be converted to HTML by GitHub, the HTML section is the output of the corrected pegdown parser and raw HTML is the parser output for those that prefer to render HTML in their head from source.

As you can see GitHub and pegdown are in agreement.

License

This fix is my original work and I share it under the same license as the pegdown licence.

Cheers, Vlaidimir.

P.S. Please run reformat on the source of the parser in IntelliJ so that it has consistent format and gets rid of the trailing spaces. Took a while to discard all the extraneous chunks that had trailing spaces removed.

-- raw markdown -----------------------------------------------


1. B 1

    1. A 1
    2. A 2

    B 1's paragraph

<!-- end list -->
2. XYZ 1

    XYZ 1's paragraph

    1. A 1
    2. A 2

    3. A 3

        A 3's paragraph

        - XYZ
            1. abc
            1. abc
            1. abc
        - XYZ
            - abc
            fenced code block of sub-sub-list abc                
            ```
        - abc
        - abc
    - XYZ

    Lorem ipsum...
  1. XYZ 2

    1. A 2
      • XYZ 2
  1. List 1 Item 1

    • sub list item 1 sub list item 1 continuation line

      sub list item 1 code block 1

      sub list item 1 paragraph

      sub list item 1 code block 2
    • sub list item 2

    List 1 Item 1 paragraph

    List 1 Item 1 code block

    List 1 Item 1 paragraph or main code block


-- markdown -----------------------------------------------
1. B 1
   1. A 1
   2. A 2

   B 1's paragraph

<!-- end list -->
1. XYZ 1

   XYZ 1's paragraph
   1. A 1
   2. A 2
   3. A 3

      A 3's paragraph
      - XYZ
        1. abc
        2. abc
        3. abc
      - XYZ
        - abc
      fenced code block of sub-sub-list abc                
      ```
    - abc
    - abc
  - XYZ

  Lorem ipsum...
  1. XYZ 2
    1. A 2
      • XYZ 2
  1. List 1 Item 1

    • sub list item 1 sub list item 1 continuation line

      sub list item 1 code block 1

      sub list item 1 paragraph

      sub list item 1 code block 2
    • sub list item 2

    List 1 Item 1 paragraph

    List 1 Item 1 code block

    List 1 Item 1 paragraph or main code block

-- HTML ---------------------------------------------------

  1. B 1

    1. A 1
    2. A 2

    B 1’s paragraph

  1. XYZ 1

    XYZ 1’s paragraph

    1. A 1
    2. A 2
    3. A 3

      A 3’s paragraph

      • XYZ
        1. abc
        2. abc
        3. abc
      • XYZ
        • abc

          fenced code block of sub-sub-list abc

        • abc
        • abc
      • XYZ

      Lorem ipsum…

  2. XYZ 2

    1. A 2
      • XYZ 2
  1. List 1 Item 1

    • sub list item 1
      sub list item 1 continuation line

      sub list item 1 code block 1
      

      sub list item 1 paragraph

      sub list item 1 code block 2
      
    • sub list item 2

    List 1 Item 1 paragraph

    List 1 Item 1 code block
    

    List 1 Item 1 paragraph or main code block

-- raw html -----------------------------------------------

<ol>
  <li>
    <p>B 1</p>
    <ol>
      <li>A 1</li>
      <li>A 2</li>
    </ol>
    <p>B 1’s paragraph</p>
  </li>
</ol>
<!-- end list -->
<ol>
  <li>
    <p>XYZ 1</p>
    <p>XYZ 1’s paragraph</p>
    <ol>
      <li>A 1</li>
      <li>A 2</li>
      <li>
        <p>A 3</p>
        <p>A 3’s paragraph</p>
        <ul>
          <li>XYZ
            <ol>
              <li>abc</li>
              <li>abc</li>
              <li>abc</li>
            </ol>
          </li>
          <li>XYZ
            <ul>
              <li>
                <p>abc</p>
                <p><code>fenced code block of sub-sub-list abc</code></p>
              </li>
              <li>abc</li>
              <li>abc</li>
            </ul>
          </li>
          <li>XYZ</li>
        </ul>
        <p>Lorem ipsum…</p>
      </li>
    </ol>
  </li>
  <li>
    <p>XYZ 2</p>
    <ol>
      <li>A 2
        <ul>
          <li>XYZ 2</li>
        </ul>
      </li>
    </ol>
  </li>
</ol>
<!-- end list -->
<ol>
  <li>
    <p>List 1 Item 1</p>
    <ul>
      <li>
        <p>sub list item 1<br/>sub list item 1 continuation line</p>
        <pre><code>sub list item 1 code block 1
</code></pre>
        <p>sub list item 1 paragraph</p>
        <pre><code>sub list item 1 code block 2
</code></pre>
      </li>
      <li>sub list item 2</li>
    </ul>
    <p>List 1 Item 1 paragraph</p>
    <pre><code>List 1 Item 1 code block
</code></pre>
    <p>List 1 Item 1 paragraph or main code block</p>
  </li>
</ol>

sirthias commented 9 years ago

Thanks Vladimir! Would you be able to add a test that fails without the patch and which this PR turns green?

vsch commented 9 years ago

I'll give it a go. I don't see it being that hard.

Best regards,

Vladimir (514) 244-4172

Sent from my iPhone

On Aug 13, 2015, at 8:03 AM, Mathias notifications@github.com wrote:

Thanks Vladimir! Would you be able to add a test that fails without the patch and which this PR turns green?

— Reply to this email directly or view it on GitHub.

sirthias commented 9 years ago

Thanks! We are obviously missing a test here.

vsch commented 9 years ago

There are other issues associated with this fix. Now compound lists are handled correctly but all sub-lists are compact unless they contain a paragraph. I have a fix for this and also fixed regression test failures because definitions and lists share rules.

The crux of the issue is that a list item's children can be an arbitrary set of indented blocks (one or more indents). All of these should be collected into a single chunk to be parsed recursively because its sub-items can also be interleaved with blank lines, code blocks, etc. If you break a list-item's child blocks into individual parts then one of its sub-list's items may not have the correct contents because the sub-list may be prematurely closed at the end of the recursive parsing of one of the text child blocks.

The list item rule now takes all following blocks that are part of its children and parses it recursively as one text stream. The results are much better. You get compact or loose items and regression tests pass for all except the pegdown.spec: AstText.ast. The issue is that you cannot include the trailing blank lines after a list item because the next item needs them to determine whether it is loose or tight. So a list item now takes leading blank lines but leaves trailing ones for the next item. This causes the blank line with two spaces at the end of the list in the test to be excluded from the BulletListNode leading to a discrepancy of 3 characters (2 spaces + \n). I changed the expectation file for the test to match the current results.

For now I would not go as far as testing all the combinations and permutations for lists because I suspect that other issues will be discovered. I simply don't have the time to fix them all as they arise or write all the tests. If I create tests that fail, I'll need to validate that they are not false negative, etc. As one co-worker of mine used to say: "It's a career decision." which I am not prepared to make. ;)

I separated the definition block list parsing from the list block parsing because the definition tests expect the old behaviour and I didn't get to validating that part of the parser. For now I am willing to take it at face value. The definitions rules are a copy of the old ListItem rule and dependent rules that diverge for lists.

I'll close this PR and open a new one with the fixes that pass the regression tests, but without adding new tests for the bug fixes. I will post another PR with new tests tomorrow or the day after.