matthewwithanm / python-markdownify

Convert HTML to Markdown
MIT License

ensure paragraph start tags begin a paragraph #108

Open mirabilos opened 8 months ago

mirabilos commented 8 months ago

Fixes #92. This is the only remaining code change I have (as opposed to wrapping Markdownify).

AlexVonB commented 6 months ago

Hi! This breaks some working code, as there will be a few places with loads of empty lines, for example:

md('<blockquote><p>Hello</p><p>Hello again</p></blockquote>')

       > Hello
       > 
       > 
       > 
       > Hello again

mirabilos commented 6 months ago

AlexVonB dixit:

Hi! This breaks some working code, as there will be a few places with loads of empty lines, for example:

md('<blockquote><p>Hello</p><p>Hello again</p></blockquote>')

      > Hello
      > 
      > 
      > 
      > Hello again

Postprocess. It’s trivial, and easier to fix there than in Markdownify.

[…]
# convert and clean up
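text = MarkdownConverter(strip=['img']).convert_soup(html)
# two trailing spaces followed by a line of only two spaces become a blank line
text = re.sub('  \n  \n', '\n\n', '\n' + text + '\n')
# runs of empty blockquote lines ("> ") collapse into a single one
text = re.sub('(\n> )+\n', '\n> \n', '\n' + text + '\n')
# strip spaces before a blank line and collapse runs of blank lines into one
text = re.sub(' *\n\n+', '\n\n', text)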
return text.strip()
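
The cleanup step can be tried in isolation on the output reported earlier in this thread. This is a minimal sketch that assumes the converter's output is already available as a string. The names `clean_markdown` and `sample` are hypothetical and not part of Markdownify; the regular expressions are the ones from the snippet above.

```python
import re


def clean_markdown(text: str) -> str:
    """Collapse redundant empty lines in converted Markdown (hypothetical helper)."""
    # Pad with newlines so the patterns can also match at the very start and end.
    text = "\n" + text + "\n"
    # Two trailing spaces followed by a line of only two spaces become a blank line.
    text = re.sub("  \n  \n", "\n\n", text)
    # Runs of empty blockquote lines ("> ") collapse into a single one.
    text = re.sub("(\n> )+\n", "\n> \n", text)
    # Strip spaces before a blank line and collapse runs of blank lines into one.
    text = re.sub(" *\n\n+", "\n\n", text)
    return text.strip()


# The blockquote output quoted in this thread, as a plain string.
sample = "> Hello\n> \n> \n> \n> Hello again"
print(clean_markdown(sample))
# > Hello
# >
# > Hello again
```

The middle line of the result is `> ` with a trailing space, since the replacement string keeps it. Whether a single empty quoted line is the desired separator between the two paragraphs is a judgment call this sketch does not settle.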