thephpleague / html-to-markdown

Convert HTML to Markdown with PHP
MIT License

Incorrect markdown when text is not in a tag, but between two tags #216

Closed multiwebinc closed 2 years ago

multiwebinc commented 2 years ago

Version(s) affected

5.0.2

Description

Text that is not wrapped in a tag is converted incorrectly when it sits between two tags: it is merged into the output of the tag that follows it.

How to reproduce

HTML:

<h1>Heading one</h1>

Some text

<h2>Heading two</h2>

Output:

Heading one
===========

 Some text Heading two
-----------

HTML:

<h1>Heading one</h1>

Some text

<h3>Heading two</h3>

Output:

Heading one
===========

 Some text ### Heading two

However this works correctly:

Some text

<h3>Heading</h3>

Output:

Some text

### Heading

This appears to happen for any <tag></tag> Text <tag></tag> combination that I've tried.

I believe this matters because someone could be using line breaks instead of paragraphs, since the two look visually similar:

<div>
  <h1>Document header</h1>

  Paragraph 1<br><br>

  Paragraph 2<br><br>

  <h2>Another header</h2>
</div>

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

multiwebinc commented 2 years ago

This should probably be reopened.

trymeouteh commented 7 months ago

I get the same results as well.

trymeouteh commented 7 months ago

Turns out a setting needs to be changed to get the header syntax we want:

https://github.com/thephpleague/html-to-markdown?tab=readme-ov-file#style-notes
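
For anyone landing here, the style notes linked above describe a `header_style` option. Below is a minimal sketch of setting it, assuming the library was installed with Composer and that the autoloader lives at `vendor/autoload.php` next to the script (an assumed path). It only switches H1 and H2 from Setext (underlined) to ATX (`#`) headers, and is not claimed to fix the merged-text behavior reported in this issue.

```php
<?php
// Assumption: league/html-to-markdown is installed via Composer and the
// autoloader sits next to this script.
require __DIR__ . '/vendor/autoload.php';

use League\HTMLToMarkdown\HtmlConverter;

// 'header_style' => 'atx' asks for "# " / "## " headers for H1/H2
// instead of the default Setext underlines ("===" / "---").
$converter = new HtmlConverter(['header_style' => 'atx']);

// Same input as the first reproduction case in the report.
$html = "<h1>Heading one</h1>\n\nSome text\n\n<h2>Heading two</h2>";

echo $converter->convert($html), PHP_EOL;
```

The option is passed to the constructor here so that every conversion done with the same `$converter` uses the same header style.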