Closed taylorotwell closed 4 years ago
NB Expected output is:
<table></table>
<h2>Heading</h2>
This behavior does seem to be correct per the CommonMark spec. <table></table>
is a type 6 HTML block per the spec which only ends when there's a completely blank line after it:
Example 131 in the spec demonstrates this:
It seems that all CommonMark-compliant parsers do follow this same behavior: https://babelmark.github.io/?text=%3Ctable%3E%3C%2Ftable%3E%0A%23%23+Heading
Inserting a blank line between the HTML block and the heading will produce the expected output that Graham shared: https://babelmark.github.io/?text=%3Ctable%3E%3C%2Ftable%3E%0A%0A%23%23+Heading
Is there an easy way to turn off this behaviour, so we can get something closer to how Parsedown handled blocks?
None that I can think of, unfortunately :-/
If you wanted to try and modify that behavior, the key to doing so would be these two methods in HtmlBlock
:
isCode()
determines whether or not that block should be "greedy" and attempt to include any subsequent lines instead of parsing them as new blocks. matchesNextLine()
ultimately determines whether or not the next line is a continuation of the current HtmlBlock
or whether something else should try to parse it.
To help illustrate some of the difficulty, imagine the logic you'd need to properly handle these cases:
<pre>
# This new line looks like Markdown, but it's instead a pre tag
</pre>
<pre>
# This is similar, but there's no closing tag
So where do you draw the line between where the HTML block opens and where it ends?
Did the HTML end with the pre tag?
Does it end here because there's a blank line?
What if several lines later there was a new tag?
<hr>
Does the HTML continue down to here?
<pre>What about here?</pre>
<pre>
# What if the HTML is malformed?
<pre>
Would you consider this line to be in the original HTML block, a new HTML block, or plain Markdown?
<p>
Should *this* line be parsed as HTML? It's not on the same line as the paragraph tag.
</p>
<p>
If you said that example above would not be an HTML block, then what about this one?</p>
<p>
Or *this* one?</p>
To quote the author of the CommonMark spec:
Given that we're not doing a full HTML parse, there are always going to be cases where the spec doesn't "do the right thing" as a human would judge it.
That's why the spec ultimately settled on requiring a blank line to separate them - of all the possible ways to differentiate between HTML and Markdown content, this is arguably the simplest to implement and understand.
I'm definitely open to any ideas you might have, I'm just struggling to identify an easy solution that would:
Is there any chance that something could be done on the Laravel side to add newlines after (or inside of) a component if it knows Markdown is being used? I apologize for not being as familiar with Laravel as I'd like to be, I'm just wondering whether producing "correct" CommonMark that this library would handle "correctly" might be a viable way to handle this particular case.
I think I can just work around it on our end for now. Thanks.
Thanks for the detailed replies.
Version(s) affected: ^1.x
Description
A simple heading tag does not parse correctly if it is on a line following an HTML tag. I see nothing in the CommonMark spec that states this should not work. In fact, the spec explicitly states that headings may be sandwiched in between other content with no blank lines in between.
How to reproduce
Output:
Additional context
Noticed by reports of broken behavior in Laravel Markdown mailable objects.