chantorak closed this 1 year ago
That's not a minimal HTML example. Here is one:
const { htmlToText } = require('html-to-text');

const html = `
<h1>heading</h1>
<p>paragraph</p>
<h1>heading</h1>
<p>paragraph</p>
<h1>heading</h1>
<p>paragraph</p>`;
// Select both paragraphs and top-level headings as base elements.
const options = { baseElements: { selectors: ['p', 'h1'] } };
const text = htmlToText(html, options);
console.log(text);
Output:
paragraph
paragraph
paragraph
HEADING
HEADING
HEADING
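Note that the paragraphs come before the headings in this output. By default the library arranges base elements in the order of the `selectors` array rather than in document order. A minimal options sketch, assuming a library version that supports `baseElements.orderBy`, which should keep headings and paragraphs interleaved as they appear in the input:

```javascript
// Options sketch only: pass this object as the second argument to htmlToText.
// Assumes the installed html-to-text version supports `baseElements.orderBy`.
const options = {
  baseElements: {
    selectors: ['p', 'h1'],
    // 'selectors' (the default) groups output by selector order;
    // 'occurrence' follows the order of elements in the input document.
    orderBy: 'occurrence',
  },
};
```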
The front page of https://nutritionhappiness.com/ contains one <h1> heading with the content you've provided and one <p> paragraph, which is empty.
<h1 class="t677__title t-title t-title_xs " field="title" style="font-size:66px;"><div style="line-height:68px;" data-customstyle="yes"><i>Health.<br>Wellness.<br>Happiness.<strong></strong></i></div></h1>
<p class="gm-style-mot"></p>
I'm not sure what content you observe when you remove h1 and keep only p in the selectors. If no base elements are found, the default value of baseElements.returnDomByDefault is true, and that results in the entire page being processed. But with one empty paragraph matched, the result should be empty output.
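To tell these two cases apart while debugging, it may help to switch the fallback off, so that a selector matching nothing gives empty output instead of the whole page. A minimal options sketch under that assumption:

```javascript
// Options sketch only: pass this object as the second argument to htmlToText.
// With the fallback disabled, text in the output can only come from elements
// that actually matched one of the selectors.
const debugOptions = {
  baseElements: {
    selectors: ['p'],
    returnDomByDefault: false,
  },
};
```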
You are probably doing something wrong and I can't tell you what exactly.
A typical cause of issues like this is a wrong idea about the actual content of the input HTML.
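One quick way to check that assumption is to count the tags in the HTML string you are actually passing in. The helper below is a hypothetical sketch, not part of html-to-text, and it uses a rough regex rather than a real parser, so treat the numbers as a sanity check only:

```javascript
// Hypothetical helper (not part of html-to-text): rough count of opening tags.
// Regex-based, so it can be fooled by comments or scripts containing markup.
function countOpeningTags(html, tagNames) {
  const counts = {};
  for (const name of tagNames) {
    // Require whitespace, "/" or ">" after the name so "p" does not match "<pre>".
    const pattern = new RegExp(`<${name}(?=[\\s/>])`, 'gi');
    counts[name] = (html.match(pattern) || []).length;
  }
  return counts;
}

// Made-up sample resembling the page structure discussed in this thread.
const sample =
  '<h1><div><i>Health.<br>Wellness.</i></div></h1>' +
  '<p class="gm-style-mot"></p><div>Body text</div>';
console.log(countOpeningTags(sample, ['h1', 'p', 'div']));
```

If the count for p is low and the count for div is high, the visible text is probably not inside paragraphs at all.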
Thanks for the reply. There don't seem to be any P tags; the text is wrapped in divs.
Minimal HTML example
Options
Observed output
Expected output
The expected output should include both the P and h1 contents.
Version information
If I remove the h1, the output includes the P content; somehow the h1 is conflicting with the P.