Closed Tajmirul closed 1 year ago
I'm afraid I could not find where the problem is.
const vimeoHtml = parse(`<div class="clip_details-description description-wrapper iris_desc">
<p class="first">Country music legend, Trish Cotton, has something to say.</p>
<p>
Written by Kyle Kasabian (@kylekasabian) <br />
Directed by Derek Mari (@directorderek)<br />
Director of Photography: Peter Mickelsen<br />
Produced by Derek Mari and Kyle Kasabian<br />
Edited by Derek Mari
</p>
<p>Starring: Alyssa Sabo, Janine Hogan, and Kyle Kasabian</p>
<p>
Assistant Camera: Casey Schoch<br />
Production Sound: David Alvarez<br />
Production Assistant: Keith Ahlstrom
</p>
<p>Music by Morgan Matthews</p>
<p>
Blink & Miss Productions<br />
Bad Cat Films
</p>
</div>
</div>`);
const description = vimeoHtml.querySelector('.description-wrapper');
description.toString().should.eql('<ul id="list"><li><a href="#">Some link</a></li></ul>');
I'm not sure if this is exactly related, but this outputs "null" for me with node-html-parser@6.1.4 and Node version 17.4.0:
import { parse } from "node-html-parser";
console.log(
  parse(
    `<html><body><pre><code class="language-typescript">type Foo = { foo: 'bar' }</code></pre></body></html>`
  ).querySelector("code")
);
It seems like the bug is in the PRE tag: there's an assumption that it can't have child nodes:
import { parse } from "node-html-parser";
const convert = root => ({
  tag: root.tagName,
  textContent: root.textContent,
  children: [...root.childNodes].map(convert),
});
const tree = convert(
  parse(`<html><body><pre><code class="language-typescript">type Foo = { foo: 'bar' }</code></pre></body></html>`)
);
console.log(JSON.stringify(tree, null, 2));
This outputs:
{
  "tag": null,
  "textContent": "<code class=\"language-typescript\">type Foo = { foo: 'bar' }</code>",
  "children": [
    {
      "tag": "HTML",
      "textContent": "<code class=\"language-typescript\">type Foo = { foo: 'bar' }</code>",
      "children": [
        {
          "tag": "BODY",
          "textContent": "<code class=\"language-typescript\">type Foo = { foo: 'bar' }</code>",
          "children": [
            {
              "tag": "PRE",
              "textContent": "<code class=\"language-typescript\">type Foo = { foo: 'bar' }</code>",
              "children": [
                {
                  "textContent": "<code class=\"language-typescript\">type Foo = { foo: 'bar' }</code>",
                  "children": []
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
@wolfie try this
parse(html, {
  // Leaving "pre" out of this list lets the parser build child nodes inside <pre>.
  blockTextElements: {
    script: true,
    noscript: true,
    style: true,
  }
});
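To make it clearer why leaving pre out of blockTextElements changes the result, here is a toy sketch of the idea. It is not node-html-parser's actual implementation: toyParse, findTag, and the node shape are hypothetical names made up for illustration, and the tag matching is deliberately naive. It only assumes that an element listed as "block text" has everything up to its closing tag stored as raw text instead of being parsed into children.

```javascript
// Toy model only: NOT node-html-parser's code. toyParse and findTag are
// hypothetical helpers that mimic the "block text element" idea.
function toyParse(
  html,
  blockTextElements = { script: true, noscript: true, style: true, pre: true }
) {
  const root = { tag: null, children: [] };
  const stack = [root];
  const tagPattern = /<(\/)?([a-zA-Z][a-zA-Z0-9-]*)[^>]*>/g;
  let lastIndex = 0;
  let match;

  while ((match = tagPattern.exec(html)) !== null) {
    const [, isClosing, rawName] = match;
    const name = rawName.toLowerCase();
    const parent = stack[stack.length - 1];

    // Text between the previous tag and this one becomes a text node.
    if (match.index > lastIndex) {
      parent.children.push({ text: html.slice(lastIndex, match.index) });
    }
    lastIndex = tagPattern.lastIndex;

    if (isClosing) {
      if (stack.length > 1) stack.pop();
      continue;
    }

    const node = { tag: name.toUpperCase(), children: [] };
    parent.children.push(node);

    if (blockTextElements[name]) {
      // Block-text element: keep everything up to its closing tag as raw text.
      const closeTag = `</${name}>`;
      const end = html.toLowerCase().indexOf(closeTag, lastIndex);
      const stop = end === -1 ? html.length : end;
      node.children.push({ text: html.slice(lastIndex, stop) });
      lastIndex = end === -1 ? html.length : end + closeTag.length;
      tagPattern.lastIndex = lastIndex;
    } else {
      stack.push(node);
    }
  }

  // Any trailing text after the last tag.
  if (lastIndex < html.length) {
    stack[stack.length - 1].children.push({ text: html.slice(lastIndex) });
  }
  return root;
}

// Depth-first search for the first node with the given upper-case tag name.
function findTag(node, tag) {
  if (node.tag === tag) return node;
  for (const child of node.children ?? []) {
    const found = findTag(child, tag);
    if (found) return found;
  }
  return null;
}

const html = `<html><body><pre><code class="language-typescript">type Foo = { foo: 'bar' }</code></pre></body></html>`;

// With "pre" treated as block text, no CODE node is ever created.
const withPre = toyParse(html);
console.log(findTag(withPre, "CODE")); // null

// Without "pre" in the list, CODE becomes a regular child of PRE.
const withoutPre = toyParse(html, { script: true, noscript: true, style: true });
console.log(findTag(withoutPre, "CODE"));
```

In the toy model, the first call mirrors the output shown earlier in the thread (PRE holds a single text child containing the raw code markup), while the second call produces a CODE node that a selector could match.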
I am trying to fetch the title and description of a Vimeo video. I fetched the HTML successfully, but I can't select the description div. Here is the code:
This code fetches the HTML. The HTML contains a div with class description-wrapper, but when I try to select that div with querySelector, it returns null.