vmg / sundown

Standards compliant, fast, secure markdown processing library in C

Markdown incorrectly processing HTML #134

Open dkharrat opened 11 years ago

dkharrat commented 11 years ago

Not sure if this popped up after the fix for issue #96, but it looks like I found another bug in processing inline HTML. The following HTML snippet reproduces the bug:

This line is correctly interpreted as regular markdown

<div>
<div>
test
</div>

<h2>Header</h2>
<ol>
   <li>
      <p>First paragraph</p>
   </li>
   <li>
      <p>Second paragaph</p>
   </li>
</ol>
</div>

generates:

<p>This line is correctly interpreted as regular markdown</p>

<div>
<div>
test
</div>

<p><h2>Header</h2>
<ol>
   <li>
      <p>First paragraph</p>
   </li>
   <li>
      <p>Second paragaph</p>
   </li>
</ol>
</div></p>

Notice that an extra <p> tag is generated before the <h2> tag and an extra </p> tag after the last </div> tag, resulting in invalid HTML.

dkharrat commented 11 years ago

Any update on this one?

mildsunrise commented 9 years ago

For the record: this isn't a bug, it's a consequence of how Sundown looks for the ending tag.

In order to tell Sundown where your tag really ends, you should indent the HTML, or at least the first </div>:

This line is correctly interpreted as regular markdown

<div>
<div>
test
  </div>

<h2>Header</h2>
<ol>
   <li>
      <p>First paragraph</p>
   </li>
   <li>
      <p>Second paragaph</p>
   </li>
</ol>
</div>

dkharrat commented 9 years ago

Good to know about the workaround. But from the perspective of a library user, I'd consider this a bug, since valid Markdown input is producing invalid HTML output. Shouldn't Sundown just not process Markdown inside HTML tags instead of relying on indentation? According to the Markdown spec:

Note that Markdown formatting syntax is not processed within block-level HTML tags. E.g., you can’t use Markdown-style emphasis inside an HTML block.

mildsunrise commented 9 years ago

Well, that's actually from the spec. The spec says the ending tag should be on an unindented line, and be followed by empty lines.

So, Sundown searches for the first </div> tag matching these requirements (which is the wrong tag here, but Sundown doesn't actually parse the HTML).
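
To make the rule above concrete, here is a minimal sketch in C of a line-based search for the end of an HTML block. It is not Sundown's actual code: `find_block_end` and `line_is_blank` are hypothetical names, and the sketch assumes the rule is exactly "the closing tag sits on an unindented line and is followed by a blank line or the end of input". It does no HTML parsing, so it cannot tell a nested closing tag from the outer one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if only spaces/tabs remain before the next newline or the
 * end of the string, 0 otherwise. */
static int line_is_blank(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return *p == '\n' || *p == '\0';
}

/*
 * Hypothetical helper (an illustration, not part of Sundown's API).
 * Scans `doc` line by line and returns the byte offset just past the
 * first line that
 *   1. starts at column 0 with "</tag>" and has nothing else on it, and
 *   2. is followed by a blank line or the end of input.
 * Returns -1 if no line qualifies.
 */
long find_block_end(const char *doc, const char *tag)
{
    char close[64];
    size_t close_len;
    const char *line = doc;

    assert(doc != NULL && tag != NULL);

    /* Build the literal closing tag, e.g. "</div>". */
    if (snprintf(close, sizeof close, "</%s>", tag) >= (int)sizeof close)
        return -1; /* tag name too long for this sketch */
    close_len = strlen(close);

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        const char *next = eol ? eol + 1 : line + strlen(line);

        /* The comparison starts at the first byte of the line, so an
         * indented closing tag never matches. */
        if (strncmp(line, close, close_len) == 0 &&
            line_is_blank(line + close_len) &&
            line_is_blank(next))
            return (long)(next - doc);

        line = next;
    }
    return -1;
}
```

With the snippet from the original report, this search stops at the inner </div>, because that line is unindented and followed by a blank line, so everything after it would be handed back to the Markdown parser. Indenting the inner </div> makes the search skip it and stop at the outer one instead, which is why the workaround helps.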