bug/feature html tags wrapped in paragraph tags

clach04 commented 5 months ago

Describe the bug

HTML tags are being wrapped in <p> tags, similar to the behaviour of the original Markdown at https://daringfireball.net/projects/markdown/

To Reproduce

problem.md

<html>

<!-- Comment please ignore -->

<!-- Multi
line Comment 
please ignore -->

<body>
content here

 <img src="some_img.jpg" alt="there is supposed to be an image here" width="500" height="600"> 

Now some bullets:

  * one
  * two

</body>

</html>

Output

Note the html, body, and img tags:

<p><html></p>

<!-- Comment please ignore -->

<!-- Multi
line Comment 
please ignore -->

<p><body>
content here</p>

<p><img src="some_img.jpg" alt="there is supposed to be an image here" width="500" height="600"> </p>

<p>Now some bullets:</p>

<ul>
<li>one</li>
<li>two</li>
</ul>

<p></body></p>

<p></html></p>
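The wrapping shown above can be detected mechanically, which may help when comparing library versions. The following is a minimal sketch using only the standard library. The helper name `find_wrapped_tags` and the `BLOCK_TAGS` list are hypothetical, chosen for illustration, and are not part of markdown2. The img tag is left out of the list on purpose because it is an inline element.

```python
import re

# Tags that are expected to stand alone instead of sitting inside <p>.
# This list is an assumption for illustration, not markdown2's own list.
BLOCK_TAGS = ("html", "body", "head", "style", "div")

_WRAPPED = re.compile(
    r"<p>\s*(</?(?:%s)\b[^>]*>)" % "|".join(BLOCK_TAGS),
    re.IGNORECASE,
)


def find_wrapped_tags(html):
    """Return the listed tags that appear directly after an opening <p>."""
    return _WRAPPED.findall(html)


# The output reported in this issue, abbreviated to the relevant lines.
reported_output = """<p><html></p>

<!-- Comment please ignore -->

<p><body>
content here</p>

<p><img src="some_img.jpg" alt="there is supposed to be an image here" width="500" height="600"> </p>

<ul>
<li>one</li>
<li>two</li>
</ul>

<p></body></p>

<p></html></p>
"""

print(find_wrapped_tags(reported_output))
```

An empty list would mean none of the listed tags were wrapped. Comments are not covered by this pattern.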

Expected behavior

I'm seeking clarity on this, given this statement in the project README at https://github.com/trentm/python-markdown2/blob/ac7fd196e2d8854e2f0f85c7b3c82462d83b7af1/README.md?plain=1#L11:

It was written to closely match the behaviour of the original Perl-implemented Markdown.pl.

I was NOT expecting to see <p> tags around html, body. I have a larger example where it adds them around comments AND style tags.

Debug info

Version of library being used:

__version_info__ = (2, 4, 14)

Any extras being used: NONE

Additional context

Other implementations generate what I consider sane HTML, but they differ from the 2004 Daring Fireball Perl version (which is known to have odd issues).

I've not attempted to debug. Curious what thoughts are on this. Thanks!

nicholasserra commented 5 months ago

Looks like you left most of the default text in this issue. So not sure what issue you're seeing. Maybe close and reopen with clearer info.

clach04 commented 5 months ago

@nicholasserra looks like I left out the indentation in the first markdown snippet. Corrected. Hopefully that's clearer now.

clach04 commented 5 months ago

@nicholasserra I can't reopen this. Did you mean I should open a new issue?

Crozzers commented 4 months ago

I've opened a PR to fix this.

A side effect of the fix is that it won't automatically process the contents of the HTML tags as markdown. It will assume it's HTML and stop there. To get around this you'll need to use the markdown-in-html extra.

Adding markdown="1" to the html and body tags and enabling the extra should do the trick:

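import markdown2

text = '''
<html markdown="1">
...
<body markdown="1">
...
* one
* two
</body>
</html>
'''
# The markdown-in-html extra processes the contents of tags marked markdown="1".
html = markdown2.markdown(text, extras=['markdown-in-html'])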

trentm / python-markdown2

bug/feature html tags wrapped in paragraph tags #575