In our HTML, block elements are indented:
<html>
  <body>
    <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit,
       sed do eiusmod tempor incididunt ut labore et dolore magna
       aliqua. Ut enim ad minim veniam, quis nostrud exercitation
       ullamco laboris nisi ut aliquip ex ea commodo consequat.
    </p>
  </body>
</html>
When HTML with indented block elements is converted, the indentation whitespace is carried into the Markdown output and causes incorrect formatting.
Converting this indented <p> element:
from markdownify import markdownify as md
print(repr(md("""\
  <p>This is
     some text.</p>
""")))
produces this:
' This is\n some text.\n\n\n'
 ^       ^^^
It happens for non-<p> elements too. Converting these indented <h1> elements with the UNDERLINED and ATX heading formats:
As a workaround, we iterate through all text object descendants in all text-containing block elements (<p>, <entry>, <li>, etc.) and convert newlines to spaces, but this is expensive on large document sets.
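A minimal sketch of this kind of newline normalization, for illustration only. It is not the text-node walk described above: it swaps in a standard-library `html.parser` pass over the HTML string before conversion. The names `TEXT_BLOCK_TAGS`, `NewlineCollapser`, and `collapse_block_newlines` are hypothetical, the tag set is an assumption, and the sketch assumes block elements are explicitly closed and contain no nested `<pre>`.

```python
import re
from html.parser import HTMLParser

# Hypothetical tag set for illustration; extend it to match your documents.
TEXT_BLOCK_TAGS = {"p", "entry", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class NewlineCollapser(HTMLParser):
    """Re-emit HTML unchanged, except that newlines (and the whitespace
    around them) inside text-containing block elements become one space."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.block_depth = 0  # > 0 while inside any TEXT_BLOCK_TAGS element

    def handle_starttag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())
        if tag in TEXT_BLOCK_TAGS:
            self.block_depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through verbatim.
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")
        if tag in TEXT_BLOCK_TAGS and self.block_depth > 0:
            self.block_depth -= 1

    def handle_data(self, data):
        if self.block_depth > 0:
            # Collapse each newline plus surrounding whitespace to one space.
            data = re.sub(r"\s*\n\s*", " ", data)
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def collapse_block_newlines(html: str) -> str:
    """Return html with newlines inside text block elements collapsed."""
    parser = NewlineCollapser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    sample = "<body>\n  <p>This is\n     some text.</p>\n</body>"
    print(repr(collapse_block_newlines(sample)))
    # prints '<body>\n  <p>This is some text.</p>\n</body>'
```

The normalized string can then be passed to `md()`. This pass still visits every text node, so it does not remove the cost on large document sets.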
Possibly related to #31.