thephpleague / html-to-markdown

Convert HTML to Markdown with PHP
MIT License

White space gets eaten between tag-less paragraphs #176

Closed: Rarst closed this issue 5 years ago

Rarst commented 5 years ago

I am converting WordPress content in which paragraphs may not be wrapped in actual <p> tags; they are only separated by blank lines. In that respect the content is effectively already Markdown-like.

On conversion, these blank-line-separated paragraphs get mashed together into a single long paragraph:

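<?php

require 'vendor/autoload.php'; // Assumes the library was installed with Composer

use League\HTMLToMarkdown\HtmlConverter;

$input = '
First "paragraph".

Second "paragraph".

Third "paragraph".
';

var_dump($input, (new HtmlConverter())->convert($input));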
string(67) "
First "paragraph".

Second "paragraph".

Third "paragraph".
"
string(57) "First "paragraph". Second "paragraph". Third "paragraph"."

Is there a setting or adjustment that would preserve such input as Markdown paragraphs in the output?

Thanks in advance. :)

colinodell commented 5 years ago

I'm not aware of any existing setting. Perhaps you could try replacing all instances of \n\n with <br> first so that this library adds the line breaks back in?
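A minimal sketch of that suggestion in plain PHP, so it runs without the library installed. The helper name blankLinesToBreaks() is hypothetical, and the line-ending normalization is an added assumption: content exported from Windows tools often uses \r\n, which a plain "\n\n" search would never match.

```php
<?php

/**
 * Hypothetical helper sketching the suggestion: turn every blank-line
 * paragraph separator ("\n\n") into an explicit <br> element so the
 * converter has markup it can turn back into a line break.
 */
function blankLinesToBreaks(string $html): string
{
    // Normalize Windows (\r\n) and old Mac (\r) line endings first;
    // otherwise "\r\n\r\n" would never match "\n\n".
    $html = str_replace(["\r\n", "\r"], "\n", $html);

    // Drop the leading and trailing newlines around the content.
    $html = trim($html);

    // Replace each blank line (two consecutive newlines) with a <br>.
    return str_replace("\n\n", '<br>', $html);
}

$input = "\nFirst \"paragraph\".\n\nSecond \"paragraph\".\n\nThird \"paragraph\".\n";

echo blankLinesToBreaks($input), "\n";
// First "paragraph".<br>Second "paragraph".<br>Third "paragraph".
```

The returned string is what would then be passed to (new HtmlConverter())->convert(). Note that in HTML a <br> is a line break, not a paragraph break, so the converted result is worth checking before relying on it.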

Rarst commented 5 years ago

Cheers! After some tinkering I made progress by preparing the input with nl2br(), but I'm guessing the only proper fix is to adjust the WP export.
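nl2br() inserts a <br /> before every newline, including single line breaks inside a paragraph, so it yields line breaks rather than paragraphs. If the goal is real Markdown paragraphs, another option is to wrap each blank-line-separated block in <p> tags before conversion, which is roughly what WordPress's own wpautop() does when rendering content. Outside WordPress, a minimal stand-in might look like the sketch below. The function name wrapParagraphs() is hypothetical, and unlike wpautop() it does not special-case block-level HTML such as lists, tables, or headings.

```php
<?php

/**
 * Hypothetical helper: wrap each blank-line-separated block of text in
 * <p>...</p> so an HTML-to-Markdown converter sees real paragraphs.
 * Assumes the input is plain text or inline HTML only; block-level
 * elements (lists, tables, headings) would end up wrapped in <p> too.
 */
function wrapParagraphs(string $text): string
{
    // Normalize line endings so "\r\n\r\n" also counts as a blank line.
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    // Split on one or more blank lines (empty or whitespace-only lines).
    $blocks = preg_split('/\n\s*\n/', trim($text));

    $paragraphs = [];
    foreach ($blocks as $block) {
        $block = trim($block);
        if ($block !== '') {
            $paragraphs[] = '<p>' . $block . '</p>';
        }
    }

    // One paragraph element per line.
    return implode("\n", $paragraphs);
}

$input = "\nFirst \"paragraph\".\n\nSecond \"paragraph\".\n\nThird \"paragraph\".\n";

echo wrapParagraphs($input), "\n";
```

The wrapped HTML would then go to (new HtmlConverter())->convert(), which should emit each <p> as its own Markdown paragraph. Single line breaks inside a block are left untouched here, which is a deliberate difference from the nl2br() approach.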