taoqf / node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.
MIT License

querySelectorAll partial results #283

Closed eltomjan closed 2 weeks ago

eltomjan commented 2 weeks ago

I am not able to get all matching items from querySelectorAll. If I save txt to a file and run the same query in the browser's DevTools, I get many more results. For example, for the first two links listed at the end:

document.querySelectorAll(".product-layer-content"); NodeList(120) ...
document.querySelectorAll(".product-layer-content"); NodeList(118) ...

Strangely, pageDOM.querySelectorAll(".product-actions") returns the right number of elements, but their parentNode seems to be body. The W3C validator also reports mismatched elements on the page :-(

                <form action="https://sortiment.makro.cz/cs/cart/add/" method="get" class="multiprodsubmitform">
                    <div class="mo-products mo-products-grid">
                        <div class="product product-incart" id="ttcontainer269017">
                            <div class="product-layer-content">
                                <div class="product-actions"> <a href="#" id="sortiment_269017"

const parser = require("node-html-parser");

const parseOpts = {
    lowerCaseTagName: true, // convert tag name to lower case (hurts performance heavily)
    comment: false, // retrieve comments (hurts performance slightly)
    blockTextElements: {
        script: true,
        noscript: false,
        style: true,
        pre: false,
    },
};

// The original snippet ran inside an async function; the wrapper name here is illustrative.
async function processPage(Url) {
    const catRaw = await fetch(Url);
    const txt = await catRaw.text();
    // Drop everything before the <html> tag, then parse the rest
    const pageDOM = parser.parse(txt.substring(txt.indexOf("<html")), parseOpts);

    const items = pageDOM.querySelectorAll(".product-layer-content");

    console.log(`Processing ${items.length} items.`);
}

https://sortiment.makro.cz/cs/cerstve-chlazene/7068c/?inactionforce=1&p=1&view_price=s
Processing 50 items.
https://sortiment.makro.cz/cs/cerstve-chlazene/7068c/?inactionforce=1&p=2&view_price=s
Processing 30 items.
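
One way to cross-check the numbers above without a browser might be to count the class occurrences directly in the raw text and compare that with items.length. The sketch below is only a rough diagnostic under stated assumptions: countClassOccurrences is a hypothetical helper (not part of node-html-parser), it relies on a regular expression rather than a real parser, and it only sees quoted, single-line class attributes. The sample string is made up for illustration.

```javascript
// Rough cross-check: count opening tags whose class attribute contains the
// given class name, using only a regular expression over the raw HTML text.
// Assumption: class attributes are quoted and do not span multiple lines.
// This is NOT a parser; comments and script contents are scanned as well.
function countClassOccurrences(html, className) {
    // Matches an opening tag up to and including its quoted class attribute.
    const attrPattern = /<[a-zA-Z][^>]*?\sclass\s*=\s*(["'])(.*?)\1/g;
    let count = 0;
    let match;
    while ((match = attrPattern.exec(html)) !== null) {
        // Compare whole class tokens so "product" does not match "product-actions".
        const classes = match[2].split(/\s+/);
        if (classes.includes(className)) {
            count++;
        }
    }
    return count;
}

// Made-up sample markup, loosely shaped like the fragment in this issue.
const sample = `
<div class="product product-incart" id="a">
  <div class="product-layer-content"><div class="product-actions"></div></div>
</div>
<div class="product"><div class='product-layer-content extra'></div></div>
`;

console.log(countClassOccurrences(sample, "product-layer-content"));
console.log(countClassOccurrences(sample, "product"));
```

If the raw count for txt is close to what DevTools reports while items.length is much lower, that would point to elements being lost or re-parented while the parser recovers from the mismatched tags, rather than to the selector itself.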