sirthias / pegdown

A pure-Java Markdown processor based on a parboiled PEG parser supporting a number of extensions
http://pegdown.org
Apache License 2.0

Fix for compound nested list issues, new tests + updated regression tests for corrected parser. #174

Closed vsch closed 9 years ago

vsch commented 9 years ago

Fix for compound nested list issues, tests updated to reflect fixes, new tests created.

License statement:

I am the original author of this fix and make it available under the same license as pegdown.

Fixes issues: #57, #123

This PR fixes list parsing, including the tight and loose sub-list handling that the previous fix did not address. It also leaves definition lists intact: they relied on the old list parsing rule, so the definition list rule is now a copy of that old rule and the regression tests still pass. I had neither the time nor the will to validate definition lists extensively, so for now I assume they work correctly.

The previous version of the parser used to create nodes whose source range would go up to 2 chars past the end of the input buffer. The new one only does this if the input buffer does not have an EOL at the end, and then goes only one character beyond so that the added EOL is included. The previous version also used to gobble up blank lines that following list items might need to determine their tightness. Now, if the following block may need a blank line, that line is only tested for, not consumed.

Some of the regression tests had to be modified because they erroneously expected behavior that the new parser fixes. This applies especially where a list sub-item has a blank line above it and the tests expected it to be tight because of a bug in the previous version of the parser.

The biggest change is that now loose/tight list items are much more controllable. Put a blank line above a list item to make it loose, except the first one which always has a blank line above. To make the first list item loose, make the second list item loose. Sub-items no longer make their parent item loose. Additionally, the first sub-item will be made loose if it has a blank line above it. Results are very intuitive and follow the markdown source.
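
The loose/tight rules above can be sketched in a few lines of Java. This is only an illustrative model of the described behavior for a single list level, not the pegdown parser rule itself; the class name `LooseTightSketch`, the method `looseFlags` and the simplified item pattern are hypothetical, and a one-item list is assumed to be tight.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of the rules described in this PR, not pegdown code.
final class LooseTightSketch {
    // Top-level item marker only: "1." or a bullet at column 0, then whitespace.
    // Indented sub-items do not match, so they never affect their parent.
    private static final Pattern ITEM = Pattern.compile("(\\d+\\.|[*+-])\\s+");

    /** Returns one flag per top-level item: true = loose, false = tight. */
    static List<Boolean> looseFlags(List<String> lines) {
        List<Boolean> loose = new ArrayList<>();
        boolean previousLineBlank = false;
        for (String line : lines) {
            if (ITEM.matcher(line).lookingAt()) {
                // The blank line above the first item does not count.
                loose.add(!loose.isEmpty() && previousLineBlank);
            }
            previousLineBlank = line.trim().isEmpty();
        }
        // The first item takes its looseness from the second item.
        if (loose.size() > 1) {
            loose.set(0, loose.get(1));
        }
        return loose;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
                "1. one", "", "2. two", "    1. sub", "3. three");
        System.out.println(looseFlags(source)); // prints [true, true, false]
    }
}
```

Item 2 is loose because of the blank line above it, item 1 follows item 2, and item 3 stays tight even though a sub-item precedes it.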

Another big change for anyone who uses the node source range values is that nodes no longer run past the end of the input, apart from the single added EOL described above. This is especially evident on nested elements that are parsed recursively: previously they tended to extend 2 chars beyond their own text, and the discrepancy could multiply at deeper levels of nesting.
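
For code that slices the original text by node source range, a defensive clamp is still a reasonable precaution, since a range may include the one added EOL when the input has no trailing newline. The sketch below is a hypothetical helper (`SourceRangeSketch.nodeText` is not part of pegdown); it assumes the start and end values come from a node's source range and that the end index is exclusive.

```java
// Hypothetical helper, not part of pegdown: extracts the text for a source
// range while tolerating an end index that points past the input.
final class SourceRangeSketch {
    /**
     * Returns the text covered by [start, end), clamping both indices to the
     * input so a range that includes the parser's added EOL cannot overflow.
     */
    static String nodeText(String input, int start, int end) {
        int safeEnd = Math.max(0, Math.min(end, input.length()));
        int safeStart = Math.max(0, Math.min(start, safeEnd));
        return input.substring(safeStart, safeEnd);
    }

    public static void main(String[] args) {
        String input = "* item"; // no trailing EOL
        // In this case a node range may end at input.length() + 1.
        System.out.println(nodeText(input, 2, input.length() + 1)); // prints "item"
    }
}
```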

-- raw markdown -------------------------------------------


##### CompoundLists/nested-one-level.md

1. Main Item 1
    1. List 2 Item 1
        1. Item 1
        Item 1 lazy continuation
        2. Item 2
            Item 2 continuation
        3. Item 3
                Item 3 deep indent continuation
        4. Item 4
        Item 4 lazy continuation
            Item 4 continuation
                Item 4 deep indent continuation
        1. Item 5
        Item 5 lazy continuation

            Item 5 paragraph 1
        Item 5 paragraph 1 lazy continuation
            Item 5 paragraph 1 continuation
                    Item 5 paragraph 1 deep indent continuation

                Item 5 code block 1
                    Item 5 code block 1 indented 1
                        Item 5 code block indented x2
                    Item 5 code block 1 indented 2
                Item 5 code block 1.2

            Item 5 paragraph 2
        Item 5 paragraph 2 lazy continuation
            Item 5 paragraph 2 continuation
                    Item 5 paragraph 2 deep indent continuation

                    Item 5 code block 2
                        Item 5 code block 2 indented 1
                            Item 5 code block indented x2
                        Item 5 code block 2 indented 2
                    Item 5 code block 2.2

        List 2 Item 1 paragraph 1
    List 2 Item 1 paragraph 1 lazy continuation
        List 2 Item 1 paragraph 1 continuation
                List 2 Item 1 paragraph 1 deep indent continuation

            List 2 Item 1 code block 1
                List 2 Item 1 code block 1 indented 1
                    List 2 Item 1 code block indented x2
                List 2 Item 1 code block 1 indented 2
            List 2 Item 1 code block 1.2

        List 2 Item 1 paragraph 2
    List 2 Item 1 paragraph 2 lazy continuation
        List 2 Item 1 paragraph 2 continuation
                List 2 Item 1 paragraph 2 deep indent continuation

                List 2 Item 1 code block 2
                    List 2 Item 1 code block 2 indented 1
                        List 2 Item 1 code block indented x2
                    List 2 Item 1 code block 2 indented 2
                List 2 Item 1 code block 2.2

    Main Item paragraph 1
Main Item paragraph 1 lazy continuation
    Main Item paragraph 1 continuation
            Main Item paragraph 1 deep indent continuation

        Main Item code block 1
            Main Item code block 1 indented 1
                Main Item code block indented x2
            Main Item code block 1 indented 2
        Main Item code block 1.2

    Main Item paragraph 2
Main Item paragraph 2 lazy continuation
    Main Item paragraph 2 continuation
            Main Item paragraph 2 deep indent continuation

            Main Item code block 2
                Main Item code block 2 indented 1
                    Main Item code block indented x2
                Main Item code block 2 indented 2
            Main Item code block 2.2

Regular Paragraph line 1
    Regular Paragraph indented line 2
        Regular Paragraph double indented line 3

-- markdown -----------------------------------------------

CompoundLists/nested-one-level.md
  1. Main Item 1

    1. List 2 Item 1

      1. Item 1 Item 1 lazy continuation
      2. Item 2 Item 2 continuation
      3. Item 3 Item 3 deep indent continuation
      4. Item 4 Item 4 lazy continuation Item 4 continuation Item 4 deep indent continuation
      5. Item 5 Item 5 lazy continuation

        Item 5 paragraph 1 Item 5 paragraph 1 lazy continuation Item 5 paragraph 1 continuation Item 5 paragraph 1 deep indent continuation

        Item 5 code block 1
         Item 5 code block 1 indented 1
             Item 5 code block indented x2
         Item 5 code block 1 indented 2
        Item 5 code block 1.2

        Item 5 paragraph 2 Item 5 paragraph 2 lazy continuation Item 5 paragraph 2 continuation Item 5 paragraph 2 deep indent continuation

         Item 5 code block 2
             Item 5 code block 2 indented 1
                 Item 5 code block indented x2
             Item 5 code block 2 indented 2
         Item 5 code block 2.2

      List 2 Item 1 paragraph 1 List 2 Item 1 paragraph 1 lazy continuation List 2 Item 1 paragraph 1 continuation List 2 Item 1 paragraph 1 deep indent continuation

      List 2 Item 1 code block 1
        List 2 Item 1 code block 1 indented 1
            List 2 Item 1 code block indented x2
        List 2 Item 1 code block 1 indented 2
      List 2 Item 1 code block 1.2

      List 2 Item 1 paragraph 2 List 2 Item 1 paragraph 2 lazy continuation List 2 Item 1 paragraph 2 continuation List 2 Item 1 paragraph 2 deep indent continuation

        List 2 Item 1 code block 2
            List 2 Item 1 code block 2 indented 1
                List 2 Item 1 code block indented x2
            List 2 Item 1 code block 2 indented 2
        List 2 Item 1 code block 2.2

    Main Item paragraph 1 Main Item paragraph 1 lazy continuation Main Item paragraph 1 continuation Main Item paragraph 1 deep indent continuation

    Main Item code block 1
       Main Item code block 1 indented 1
           Main Item code block indented x2
       Main Item code block 1 indented 2
    Main Item code block 1.2

    Main Item paragraph 2 Main Item paragraph 2 lazy continuation Main Item paragraph 2 continuation Main Item paragraph 2 deep indent continuation

       Main Item code block 2
           Main Item code block 2 indented 1
               Main Item code block indented x2
           Main Item code block 2 indented 2
       Main Item code block 2.2

Regular Paragraph line 1 Regular Paragraph indented line 2 Regular Paragraph double indented line 3

-- HTML ---------------------------------------------------

CompoundLists/nested-one-level.md
  1. Main Item 1
    1. List 2 Item 1
      1. Item 1
        Item 1 lazy continuation
      2. Item 2
        Item 2 continuation
      3. Item 3
        Item 3 deep indent continuation
      4. Item 4
        Item 4 lazy continuation
        Item 4 continuation
        Item 4 deep indent continuation
      5. Item 5
        Item 5 lazy continuation

        Item 5 paragraph 1
        Item 5 paragraph 1 lazy continuation
        Item 5 paragraph 1 continuation
        Item 5 paragraph 1 deep indent continuation

        Item 5 code block 1
            Item 5 code block 1 indented 1
                Item 5 code block indented x2
            Item 5 code block 1 indented 2
        Item 5 code block 1.2
        

        Item 5 paragraph 2
        Item 5 paragraph 2 lazy continuation
        Item 5 paragraph 2 continuation
        Item 5 paragraph 2 deep indent continuation

            Item 5 code block 2
                Item 5 code block 2 indented 1
                    Item 5 code block indented x2
                Item 5 code block 2 indented 2
            Item 5 code block 2.2
        

      List 2 Item 1 paragraph 1
      List 2 Item 1 paragraph 1 lazy continuation
      List 2 Item 1 paragraph 1 continuation
      List 2 Item 1 paragraph 1 deep indent continuation

      List 2 Item 1 code block 1
          List 2 Item 1 code block 1 indented 1
              List 2 Item 1 code block indented x2
          List 2 Item 1 code block 1 indented 2
      List 2 Item 1 code block 1.2
      

      List 2 Item 1 paragraph 2
      List 2 Item 1 paragraph 2 lazy continuation
      List 2 Item 1 paragraph 2 continuation
      List 2 Item 1 paragraph 2 deep indent continuation

          List 2 Item 1 code block 2
              List 2 Item 1 code block 2 indented 1
                  List 2 Item 1 code block indented x2
              List 2 Item 1 code block 2 indented 2
          List 2 Item 1 code block 2.2
      

    Main Item paragraph 1
    Main Item paragraph 1 lazy continuation
    Main Item paragraph 1 continuation
    Main Item paragraph 1 deep indent continuation

    Main Item code block 1
        Main Item code block 1 indented 1
            Main Item code block indented x2
        Main Item code block 1 indented 2
    Main Item code block 1.2
    

    Main Item paragraph 2
    Main Item paragraph 2 lazy continuation
    Main Item paragraph 2 continuation
    Main Item paragraph 2 deep indent continuation

        Main Item code block 2
            Main Item code block 2 indented 1
                Main Item code block indented x2
            Main Item code block 2 indented 2
        Main Item code block 2.2
    

Regular Paragraph line 1
Regular Paragraph indented line 2
Regular Paragraph double indented line 3


-- raw HTML -----------------------------------------------

<h5>CompoundLists/nested-one-level.md</h5>
<ol>
  <li>Main Item 1
    <ol>
      <li>List 2 Item 1
        <ol>
          <li>Item 1<br/>Item 1 lazy continuation</li>
          <li>Item 2<br/>Item 2 continuation</li>
          <li>Item 3<br/> Item 3 deep indent continuation</li>
          <li>Item 4<br/>Item 4 lazy continuation<br/>Item 4 continuation<br/> Item 4 deep indent continuation</li>
          <li>Item 5<br/>Item 5 lazy continuation
            <p>Item 5 paragraph 1<br/>Item 5 paragraph 1 lazy continuation<br/>Item 5 paragraph 1 continuation<br/> Item 5 paragraph 1 deep indent continuation</p>
            <pre><code>Item 5 code block 1
    Item 5 code block 1 indented 1
        Item 5 code block indented x2
    Item 5 code block 1 indented 2
Item 5 code block 1.2
</code></pre>
            <p>Item 5 paragraph 2<br/>Item 5 paragraph 2 lazy continuation<br/>Item 5 paragraph 2 continuation<br/> Item 5 paragraph 2 deep indent continuation</p>
            <pre><code>    Item 5 code block 2
        Item 5 code block 2 indented 1
            Item 5 code block indented x2
        Item 5 code block 2 indented 2
    Item 5 code block 2.2
</code></pre>
          </li>
        </ol>
        <p>List 2 Item 1 paragraph 1<br/>List 2 Item 1 paragraph 1 lazy continuation<br/>List 2 Item 1 paragraph 1 continuation<br/> List 2 Item 1 paragraph 1 deep indent continuation</p>
        <pre><code>List 2 Item 1 code block 1
    List 2 Item 1 code block 1 indented 1
        List 2 Item 1 code block indented x2
    List 2 Item 1 code block 1 indented 2
List 2 Item 1 code block 1.2
</code></pre>
        <p>List 2 Item 1 paragraph 2<br/>List 2 Item 1 paragraph 2 lazy continuation<br/>List 2 Item 1 paragraph 2 continuation<br/> List 2 Item 1 paragraph 2 deep indent continuation</p>
        <pre><code>    List 2 Item 1 code block 2
        List 2 Item 1 code block 2 indented 1
            List 2 Item 1 code block indented x2
        List 2 Item 1 code block 2 indented 2
    List 2 Item 1 code block 2.2
</code></pre>
      </li>
    </ol>
    <p>Main Item paragraph 1<br/>Main Item paragraph 1 lazy continuation<br/>Main Item paragraph 1 continuation<br/> Main Item paragraph 1 deep indent continuation</p>
    <pre><code>Main Item code block 1
    Main Item code block 1 indented 1
        Main Item code block indented x2
    Main Item code block 1 indented 2
Main Item code block 1.2
</code></pre>
    <p>Main Item paragraph 2<br/>Main Item paragraph 2 lazy continuation<br/>Main Item paragraph 2 continuation<br/> Main Item paragraph 2 deep indent continuation</p>
    <pre><code>    Main Item code block 2
        Main Item code block 2 indented 1
            Main Item code block indented x2
        Main Item code block 2 indented 2
    Main Item code block 2.2
</code></pre>
  </li>
</ol>
<p>Regular Paragraph line 1<br/> Regular Paragraph indented line 2<br/> Regular Paragraph double indented line 3</p>

vsch commented 9 years ago

Add fix for #157 and associated test for this particular case.

Input:

    1111
    ```c
    int main(){
        retirm 0;
    }
    ```

Before the fix:

``` html
<p>1111 <code>c
int main(){
    retirm 0;
}
</code></p>
```

After the fix:

``` html
<p>1111</p>
<pre><code class="c">int main(){
    retirm 0;
}
</code></pre>
```

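
The behavior after the fix, where a fence line ends an open paragraph even without a blank line in between, can be modeled with a small line-based splitter. This is a simplified sketch and not the parboiled rule used in pegdown: the class `FenceInterruptSketch` and its `render` method are hypothetical, only triple-backtick fences at column 0 are recognized, and no HTML escaping is done.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical line-based model of "a fence interrupts a paragraph".
final class FenceInterruptSketch {
    static List<String> render(List<String> lines) {
        List<String> html = new ArrayList<>();
        List<String> paragraph = new ArrayList<>();
        int i = 0;
        while (i < lines.size()) {
            String line = lines.get(i++);
            if (line.startsWith("```")) {
                flush(paragraph, html); // the fence ends the open paragraph
                String lang = line.substring(3).trim();
                StringBuilder code = new StringBuilder();
                while (i < lines.size() && !lines.get(i).startsWith("```")) {
                    code.append(lines.get(i++)).append('\n');
                }
                i++; // skip the closing fence, if present
                String cls = lang.isEmpty() ? "" : " class=\"" + lang + "\"";
                html.add("<pre><code" + cls + ">" + code + "</code></pre>");
            } else if (line.trim().isEmpty()) {
                flush(paragraph, html); // a blank line also ends a paragraph
            } else {
                paragraph.add(line);
            }
        }
        flush(paragraph, html);
        return html;
    }

    // Emits the collected paragraph lines, if any. No HTML escaping (sketch only).
    private static void flush(List<String> paragraph, List<String> html) {
        if (!paragraph.isEmpty()) {
            html.add("<p>" + String.join("\n", paragraph) + "</p>");
            paragraph.clear();
        }
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "1111", "```c", "int main(){", "    retirm 0;", "}", "```");
        for (String block : render(input)) {
            System.out.println(block);
        }
    }
}
```

For the input from this comment, the sketch yields a `<p>1111</p>` block followed by a `<pre><code class="c">` block.
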
sirthias commented 9 years ago

This is an awesome patch! Thank you VERY MUCH, Vladimir!

vsch commented 9 years ago

You are very welcome.

At first it looked daunting. In the end it was satisfying. As a side effect I now feel very comfortable with making extensions and fixes to pegdown.

To think that it all started with wanting a GitHub-like preview for Markdown in PhpStorm. The only available plugin was nicoulaj/idea-markdown, but it looked awful and had been a small annoyance of mine for months. I finally decided that I would take the time to change its style sheet to make it look more like GitHub. How hard could that be?

I wound up upgrading it to the latest pegdown and parboiled, fixing a few bugs and finally releasing it as a separate plugin. The original hadn't been updated for over a year, its PRs seemed to be ignored, and users were asking for things I had already added and fixed.

When I stumbled on the parsing bug in pegdown, I tried to "live with it" but it's not in my nature, so I rolled up my sleeves and dove in. I am glad I did.

If you use an IntelliJ IDE you might want to check it out: vsch/idea-multimarkdown, or MultiMarkdown on the plugins page. It is much closer to GitHub's look, you can customize the CSS in settings, and it also has an HTML Text tab so you can see the generated HTML.

Thank you for a great package.

sirthias commented 9 years ago

Cool. Thanks for this description of your path. What you are describing regarding @nicoulaj's plugin is probably a somewhat natural lifecycle of open source projects (parboiled and pegdown very much included). For some time the original author has the motivation and capacity to push and maintain the project, but at some point things (priorities, capacity, etc.) change and the project becomes stale, to a varying degree.

What is needed then are people from the community (like you) that are motivated to contribute and sometimes even take over a product completely.

Thanks again!

vsch commented 9 years ago

I thought along those lines too. I can understand someone getting fed up once the thrill wears off and it becomes just grunt work.

Best regards,

Vladimir.
