sirthias / pegdown

A pure-Java Markdown processor based on a parboiled PEG parser supporting a number of extensions
http://pegdown.org
Apache License 2.0

strange behavior if you start with html tag #215

Closed mrothenbuecher closed 6 years ago

mrothenbuecher commented 8 years ago

If I start the markdown with an HTML tag, nothing seems to be interpreted:

PegDownProcessor proc = new PegDownProcessor();
// works as expected
String output = proc.markdownToHtml("Test\n-----");
System.out.println("1:"+output+" equals <h2>Test</h2> "+output.equals("<h2>Test</h2>")+"\n");
// doesn't work
output = proc.markdownToHtml("<div>\n"
                + "Test\n"
                + "-----\n"
                + "</div>");
System.out.println("2:"+output+" equals <div><h2>Test</h2></div> "+output.equals("<div><h2>Test</h2></div>")+"\n");

At least <h2>Test</h2> should be generated in the second example.

vsch commented 8 years ago

@mkuerbis, this is the correct behavior for a markdown processor. Markdown interprets the second example as an HTML block, which means the text inside is passed through as is. There is no markdown processing inside an HTML block, unlike inside inline HTML.

What you want is achieved with:

String output = "<div>" + proc.markdownToHtml("Test\n-----") + "</div>";

You have to wrap the markdown output with HTML block elements outside of markdown processing.
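As a minimal sketch of that wrapping step, the helper below adds the block element around already-rendered output. The class name `HtmlWrap` and the method `wrapInTag` are hypothetical, and the renderer is passed in as a `Function<String, String>` so the sketch compiles without pegdown on the classpath; in the setup from this thread it would presumably be `proc::markdownToHtml`.

```java
import java.util.function.Function;

public class HtmlWrap {
    // Renders the markdown first, then wraps the result in the given block tag.
    // The tag is added outside of markdown processing, so the processor never
    // sees it and cannot treat the content as a raw HTML block.
    // "renderer" is a stand-in for a real processor (assumption: something
    // like proc::markdownToHtml from the example above).
    public static String wrapInTag(String tag, String markdown,
                                   Function<String, String> renderer) {
        return "<" + tag + ">" + renderer.apply(markdown) + "</" + tag + ">";
    }
}
```

With a PegDownProcessor instance named proc, the call would look like HtmlWrap.wrapInTag("div", "Test\n-----", proc::markdownToHtml).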

mrothenbuecher commented 8 years ago

@vsch thank you for your statement.

The actual reason why I was asking is this example:

FOO
<div>
Test
------
</div>

generates

<p>FOO <div></p>
<h2>Test</h2>
<p></div></p>

I am playing around with markdown in a project where I want to make it possible to customize how search results are displayed. So in my application it would be necessary to have markdown inside HTML.

vsch commented 8 years ago

@mkuerbis, again, this is one of markdown's "peculiarities". The generated HTML is not correct: the div tag is not closed in the first paragraph, and the closing div in the second paragraph is not matched by an opening one.

The reason this happens is that the leading text starts a paragraph block, and the div is interpreted as a continuation of the first line because there is no blank line before it, so it is treated as inline HTML. Then the header is seen, which ends the paragraph and leaves the div open.

If you start with a div, it is treated as an HTML block rather than inline HTML, and there is no markdown processing inside it.

What you are looking for is a way to have markdown processed inside an HTML block. Some markdown processors offer this through attribute parsing in the HTML, such as <div markdown="true"> or something similar. Pegdown does not have this option at this time.
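Until such an option exists, one possible workaround is to pre-process the input yourself. The sketch below is not a pegdown feature: the class `MarkdownInDiv`, the method `render`, and the choice of `markdown="true"` as the marker are all assumptions made for illustration. It finds non-nested `<div markdown="true">` blocks with a regular expression, renders only their contents through a caller-supplied renderer (for example `proc::markdownToHtml`), and leaves all text outside those blocks untouched.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MarkdownInDiv {
    // Matches one non-nested <div markdown="true"> ... </div> block.
    // DOTALL lets '.' span newlines; the reluctant '.*?' stops at the first </div>.
    private static final Pattern BLOCK =
        Pattern.compile("<div markdown=\"true\">(.*?)</div>", Pattern.DOTALL);

    // Replaces each marked block with <div>rendered content</div>.
    // Text outside the marked blocks is copied through unchanged.
    // "renderer" is a stand-in for a real markdown processor (assumption).
    public static String render(String input, Function<String, String> renderer) {
        Matcher m = BLOCK.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String inner = renderer.apply(m.group(1).trim());
            // quoteReplacement keeps '$' and '\' in the rendered HTML literal.
            m.appendReplacement(out, Matcher.quoteReplacement("<div>" + inner + "</div>"));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

A regular expression cannot handle nested divs, so this only fits simple templates; anything more involved would need a real HTML parser in front of the markdown processor.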