tim-gromeyer / html2md

Transform your HTML into clean, easy-to-read markdown with html2md.
https://tim-gromeyer.github.io/html2md/
MIT License

Anchor tag without ending isn't handled correctly #101

Closed rsyring closed 5 months ago

rsyring commented 5 months ago

Describe the bug

When parsing text with an anchor tag that isn't closed, the link is never terminated in the output, and line wrapping is affected as well.

To Reproduce

import pyhtml2md

html = """
<p>Some text<a href="http://example.com"/>the anchor should end but doesn't.  A lot more text to demonstrate that the wrapping is also affected.  Here it comes.  Ready or not.  </p>
"""

print(pyhtml2md.convert(html))

And the output is:

Some text[the anchor should end but doesn't.  A lot more text to demonstrate that the wrapping is also affected.  Here it comes.  Ready or not.  

Expected behavior

Firefox renders the HTML with the entire rest of the document inside the link. I think it makes more sense to stop at the end of the paragraph. So the output should look like:

Some text[the anchor should end but doesn't.  A lot more text to 
demonstrate that the wrapping is also affected.  Here it comes.  Ready or not.](http://example.com)

Desktop:

Ubuntu 23.10

tim-gromeyer commented 5 months ago

The anchor closes itself, so the correct output should be:

Some text[](http://example.com)the anchor should end but doesn't.  A lot more text to demonstrate that the wrapping is also affected.  Here it comes.  Ready or not. 

The handling of self-closing tags is fixed in the latest commit, so I'm closing this issue. Feel free to reopen, though!
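
For anyone who wants to see the self-closing interpretation in isolation, here is a minimal sketch using only Python's standard library. It is not html2md's implementation; the AnchorSketch class and this convert function are hypothetical names made up for illustration, and the sketch handles only anchors and plain text. It relies on html.parser reporting a tag written as <a .../> through handle_startendtag, which by default calls handle_starttag followed immediately by handle_endtag.

```python
from html.parser import HTMLParser


class AnchorSketch(HTMLParser):
    """Toy converter (hypothetical): handles only <a> tags and text."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.href = None  # href of the currently open anchor, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Remember the target and open the Markdown link text.
            self.href = dict(attrs).get("href") or ""
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            # Close the link text and emit the target.
            self.out.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        self.out.append(data)

    # handle_startendtag is inherited: for <a .../> it calls
    # handle_starttag and then handle_endtag, so the link is empty.


def convert(html):
    parser = AnchorSketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


print(convert('<p>Some text<a href="http://example.com"/>more text.</p>'))
```

Under this interpretation the link is emitted empty and the following text stays outside it, which matches the output described above. Note that this sketch makes no attempt to close an anchor that is opened with a normal start tag and never ended.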