psquared-dev closed this issue 1 year ago.
Hi,
If you replace `querySelectorAll` with `getElementsByTagName`, you'll be able to get the `href` by using `.getAttribute("href")`.
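```js
import { parse } from "node-html-parser";

// `html` holds the page markup. With querySelectorAll, a.getAttribute("href")
// was reported to return undefined even though a.rawAttrs contains the href.
const root = parse(html);
const links = root.querySelectorAll("a");
for (const a of links) {
  console.log(a.rawAttrs);
  console.log(a.getAttribute("href"));
}
```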
I have the same issue.
I tried to extract the `<code>` contained in the `<pre>` as follows, but what I got back was an empty list. No matter what I try, I cannot get the `<code>`.
```html
<pre>
<code>test</code>
</pre>
```
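Test code:

```js
import { parse } from "node-html-parser";

// data.content holds the HTML being parsed.
// With the default blockTextElements (which include pre), the contents of
// <pre> are kept as raw text instead of being parsed into elements, which is
// why the <code> lookups below come back empty.
const root = parse(data.content);
const pre_list = root.getElementsByTagName("pre");
pre_list.forEach((pre) => {
  console.log("pre:" + pre);
});
// getElementsByTagName takes a single tag name, so "pre code" matches nothing;
// a descendant selector needs querySelectorAll.
const pre_code = root.getElementsByTagName("pre code");
console.log("pre_code:" + pre_code);
pre_list.forEach((pre) => {
  const code = pre.getElementsByTagName("code");
  console.log("code:" + code);
});
```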
Result:

```text
// pre_list
// 1st <pre>
BlogPreview.tsx:33 pre:<pre><code class="language-typescript">
// omission
// 2nd <pre>
BlogPreview.tsx:33 pre:<pre><code class="language-typescript"> public async getBlogs(queries?: MicroCMSQueries) {
// omission
// 3rd <pre>
BlogPreview.tsx:33 pre:<pre><code>---
// omission
// 4th <pre>
BlogPreview.tsx:33 pre:<pre><code class="language-typescript">import { Cache, CacheContainer } from "node-ts-cache";
// omission
// 5th <pre>
BlogPreview.tsx:33 pre:<pre><code class="language-json">{
// omission
// pre_code
BlogPreview.tsx:36 pre_code:
// code (logged 5 times, once per <pre>)
BlogPreview.tsx:39 code:
```
@excelsior091224 I'm afraid this is a different issue. In your case, you should just pass an options object to `parse`:
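```js
// Omitting `pre` from blockTextElements makes the parser build elements
// inside <pre> instead of keeping its content as raw text.
const root = parse(html, {
  blockTextElements: {
    script: true,
    noscript: true,
    style: true,
  },
});
```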
@taoqf, this commit (release v6.1.7 onwards) breaks the earlier behavior of ignoring the text content of specific tags by setting them to `false` in `blockTextElements`, which seems unintended to me.
- This is the behavior before this commit (running `v6.1.6`):
```js
import { parse } from "node-html-parser";

const htmlString = "sample <b><strong>text</strong> inside tags</b> <script>text inside script</script>";
console.log(parse(htmlString, { blockTextElements: { script: false } }).text); // Output: sample text inside tags
console.log(parse(htmlString, { blockTextElements: { script: true } }).text); // Output: sample text inside tags text inside script
```
This matches the behavior explained in the [README](https://github.com/taoqf/node-html-parser#parsedata-options) as well.
- This is the behavior after this commit (running `v6.1.7-v6.1.9`):
```js
import { parse } from "node-html-parser";

const htmlString = "sample <b><strong>text</strong> inside tags</b> <script>text inside script</script>";
console.log(parse(htmlString, { blockTextElements: { script: false } }).text); // Output: sample text inside tags text inside script
console.log(parse(htmlString, { blockTextElements: { script: true } }).text); // Output: sample text inside tags text inside script
```
Could you please check?
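The three cases at issue here (a tag mapped to `true`, mapped to `false`, or absent from `blockTextElements`) can be summarized in a toy model. This is a hedged sketch, not node-html-parser's implementation: `blockTextMode` is a hypothetical function, and the default map is the one listed in the README, which may differ in your installed version.

```javascript
// Toy model (not node-html-parser's code) of the blockTextElements semantics
// described in this thread:
//   true    -> the tag's inner text is kept as raw text
//   false   -> the tag's inner text is dropped (the v6.1.6 behavior above)
//   absent  -> the tag's content is parsed as ordinary HTML
function blockTextMode(tagName, blockTextElements) {
  const key = tagName.toLowerCase(); // tag names are case-insensitive
  if (!Object.prototype.hasOwnProperty.call(blockTextElements, key)) {
    return "parsed";
  }
  return blockTextElements[key] ? "raw" : "dropped";
}

// Defaults as listed in the README (assumption: may vary between versions).
const defaults = { script: true, noscript: true, style: true, pre: true };

console.log(blockTextMode("pre", defaults));              // → raw
console.log(blockTextMode("pre", {}));                    // → parsed
console.log(blockTextMode("script", { script: false }));  // → dropped
```

Under this model, the `<code>` lookup fails with the defaults because `pre` maps to `true`, and succeeds once `pre` is left out of the map.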
@devansh-sharma-tw Sorry about that. You can try v6.1.0 now.

@excelsior091224 For your case, you should pass an empty object as `blockTextElements` in the options, like this:
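```js
import assert from "node:assert";
import { parse } from "node-html-parser";

const html = `<pre>
<code>test</code>
</pre>`;
// An empty blockTextElements map means no tag (including <pre>) keeps its
// content as raw text, so <code> is parsed as an element.
const root = parse(html, {
  blockTextElements: {},
});
const list = root.getElementsByTagName("code");
const [code] = list;
assert.strictEqual(code.text, "test");
```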
@taoqf Thanks for the fix!
With the `querySelectorAll` code above, `a.rawAttrs` returns `'href="/" rel="home"'`, but `a.getAttribute("href")` returns `undefined`. Also, `a.attrs` always returns an empty object, `{}`.
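Until `getAttribute` and `attrs` return what you expect, one stopgap, assuming `rawAttrs` is populated as in this report, is to parse that string yourself. The `parseRawAttrs` helper below is hypothetical (not part of node-html-parser) and only handles the common attribute syntaxes; it does not decode entities such as `&amp;`.

```javascript
// Hypothetical helper (not part of node-html-parser): turn a raw attribute
// string such as a.rawAttrs into a plain object. Handles double-quoted,
// single-quoted, unquoted and valueless attributes. Names are lowercased,
// and the first occurrence of a duplicated name wins, as in HTML parsing.
function parseRawAttrs(raw) {
  const attrs = {};
  // name, then optionally "=" followed by "...", '...' or an unquoted run
  const re = /([^\s=\/>"']+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'=<>`]+)))?/g;
  let m;
  while ((m = re.exec(raw)) !== null) {
    const name = m[1].toLowerCase();
    const value = m[2] ?? m[3] ?? m[4] ?? ""; // valueless attribute -> ""
    if (!Object.prototype.hasOwnProperty.call(attrs, name)) {
      attrs[name] = value;
    }
  }
  return attrs;
}

const attrs = parseRawAttrs('href="/" rel="home"');
console.log(attrs.href); // → /
console.log(attrs);      // → { href: '/', rel: 'home' }
```

Inside the loop from the report, this would be used as `parseRawAttrs(a.rawAttrs).href`.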