AlenToma closed this issue 2 years ago
I ran a test for this, and it seems to work fine.
const { parse } = require('../dist');

describe('queryselector', function () {
  it('should query one node', function () {
    const content = `<section>
  <section>
    <div class="column">foo</div>
  </section>
</section>`;
    const root = parse(content);
    // The child combinator should match the single .column div
    const div = root.querySelector('section > .column');
    div.innerHTML.should.eql('foo');
    const list = root.querySelectorAll('section > .column');
    list.length.should.eql(1);
    const div2 = list[0];
    div2.should.eql(div);
  });
});
Hmm, could it be that I am using an older version? I am using version 2.2.1.
This is the code I am using.
import { parse } from 'node-html-parser';

export default class httpClient {
  static async getHtml(
    url: string
  ): Promise<HTMLDivElement> {
    console.log(`Sending html request to ${url}`);
    // Fallback container, returned if the request or the parsing fails
    var container = parse('<div>test</div>') as any;
    try {
      let headers = new Headers({
        Accept: '*/*',
        'User-Agent':
          'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
      });
      var data = await httpClient.fetchWithTimeout(url, {
        timeout: 30000,
        headers: headers,
        method: 'GET'
      });
      if (data.status === 1020) {
        const message = `An error has occurred:${data.status}`;
        console.log(message);
      } else if (!data.ok) {
        const message = `An error has occurred:${data.status}`;
        console.log(message);
      } else {
        console.log('Data is ok. proceed to parse it');
        var html = await data.text();
        // Strip the doctype and wrap the page in a single root div before parsing
        html = html.replace(/<!DOCTYPE html>/g, "");
        container = parse('<div>' + html + '</div>');
        console.log("Data has been parsed");
      }
    } catch (e) {
      console.log(e);
    }
    return container;
  }
}
And then executing:
var container = await httpClient.getHtml(url);
var items = Array.from(container.querySelectorAll("section > .column")); // this returns no results
If I do this instead, I get results, but I also get unwanted ones that I am not interested in:
var items = Array.from(container.querySelectorAll("section .column"));
I am using this in a react-native project.
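If the child combinator returns nothing, one possible workaround is to run the descendant query and then keep only the matches whose immediate parent is a section. Below is a minimal TypeScript sketch of that filter. It assumes the matched nodes expose tagName and parentNode; the NodeLike type and the keepDirectChildrenOf helper are hypothetical names for illustration, not part of node-html-parser.

```typescript
// Minimal structural type: only the two fields this sketch relies on.
// Assumption: each matched node exposes `tagName` and `parentNode`.
interface NodeLike {
  tagName?: string | null;
  parentNode?: NodeLike | null;
}

// Keep only the nodes whose immediate parent has the given tag name.
// The comparison is case-insensitive so the sketch does not depend on
// how a particular parser cases tag names.
function keepDirectChildrenOf<T extends NodeLike>(nodes: T[], parentTag: string): T[] {
  const wanted = parentTag.toLowerCase();
  return nodes.filter(
    (node) => (node.parentNode?.tagName ?? "").toLowerCase() === wanted
  );
}

// Hand-built stand-ins for parsed nodes (hypothetical data, not parser output).
const section: NodeLike = { tagName: "section", parentNode: null };
const wrapper: NodeLike = { tagName: "div", parentNode: section };
const direct: NodeLike = { tagName: "div", parentNode: section };  // child of section
const nested: NodeLike = { tagName: "div", parentNode: wrapper };  // grandchild of section

// Only `direct` survives, mirroring what "section > .column" is meant to select.
const directOnly = keepDirectChildrenOf([direct, nested], "section");
console.log(directOnly.length);
```

With a parsed container, the call would look something like keepDirectChildrenOf(Array.from(container.querySelectorAll("section .column")), "section"), again assuming the returned elements carry those two properties.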
OK, I just saw this note:
Note: Full css3 selector supported since v3.0.0.
Will close this issue and upgrade to the latest version.
@taoqf I'm seeing this in the latest version, 5.4.2-0. I get empty results for .querySelectorAll("a a") or .querySelectorAll("a > a") even though such elements exist.
Ah, I see what's happening. A DOM like
<a href="/test">Lorem Ipsum <a href="/foo"><span>bar</span></a></a>
is being changed to
<a href="/test">Lorem Ipsum </a> <a href="/foo"><span>bar</span></a>
on parsing.
I wouldn't expect HTMLParser.parse to change the structure of my input DOM. Sounds like a core bug.
Yes, this is related to https://github.com/taoqf/node-html-parser/issues/144.
Hi all! I believe nested <a> (anchor) tags are invalid HTML.
If memory serves, we handle this the standard way that other parsers and browsers do, by terminating the tag, which would be considered proper behaviour.
I will confirm tomorrow if that is correct and follow up.
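Since an anchor nested inside another anchor may be restructured on parsing, one option is to check the raw input for that pattern before relying on selectors such as "a a" or "a > a". Below is a minimal TypeScript sketch of such a pre-check. It is only a rough regex scan, not how node-html-parser works internally, and hasNestedAnchors is a hypothetical helper name.

```typescript
// Returns true if an <a> start tag appears while another <a> is still open.
// Assumption: a regex tag scan is good enough for a quick pre-check. It ignores
// comments, scripts and other edge cases that a real tokenizer would handle.
function hasNestedAnchors(html: string): boolean {
  // Matches <a ...> and </a>, but not tags such as <abbr> or <article>.
  const tagPattern = /<(\/?)a(?=[\s>\/])[^>]*>/gi;
  let openAnchors = 0;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(html)) !== null) {
    const isClosing = match[1] === "/";
    if (isClosing) {
      openAnchors = Math.max(0, openAnchors - 1);
    } else {
      if (openAnchors > 0) {
        return true; // a second <a> opened before the first one closed
      }
      openAnchors += 1;
    }
  }
  return false;
}

// The two snippets quoted in this thread.
const nestedInput =
  '<a href="/test">Lorem Ipsum <a href="/foo"><span>bar</span></a></a>';
const flatInput =
  '<a href="/test">Lorem Ipsum </a> <a href="/foo"><span>bar</span></a>';

console.log(hasNestedAnchors(nestedInput)); // true
console.log(hasNestedAnchors(flatInput));   // false
```

A check like this only flags the input; what to do with flagged markup (for example, logging it or adjusting the selector) is left to the caller.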
@aandis Fixed in the latest v6.0.0
I am trying to use
section > .column
but I am not getting any results.