voku / simple_html_dom

📜 Modern Simple HTML DOM Parser for PHP
MIT License

escaping issue #112

Open Pijushgupta opened 1 week ago

Pijushgupta commented 1 week ago

PHP: 7.4.3 (tested up to 8.3.8), simple_html_dom: 4.8.7

Before simple_html_dom

<div class="woocommerce-variation single_variation">
    <div class="woocommerce-variation-description"></div>
    <div class="woocommerce-variation-price"></div>
    <div class="woocommerce-variation-availability"><p class="stock in-stock">30 in stock</p></div>
</div>

After simple_html_dom

<div class="woocommerce-variation single_variation">
    <div class="woocommerce-variation-description">&lt;\/div&gt;
    <div class="woocommerce-variation-price">&lt;\/div&gt;
    <div class="woocommerce-variation-availability"><p class="stock in-stock">30 in stock</p>
&lt;\/div&gt;
</div></div></div></div>

/**
 * Load the DOM using voku\helper\HtmlDomParser.
 * See more: https://github.com/voku/simple_html_dom
 */
self::$dom = HtmlDomParser::str_get_html($content);

/**
 * Handle img tags.
 */
//self::handleImg();

/**
 * Handle inline background images.
 */
//self::handleImgBG();

return self::$dom;

voku commented 1 week ago

Looks like broken HTML in the original HTML. Can you fix it, or do you need to use the broken HTML?

Pijushgupta commented 1 week ago

> Looks like broken HTML in the original HTML. Can you fix it, or do you need to use the broken HTML?

That is the snippet; I updated the comment to give more context. The problem is that a valid </div> is being replaced with <\/div>, meaning it is treated as text instead of HTML. This is specific to the WooCommerce product variation description; everything else is parsed correctly.

voku commented 1 week ago

The HTML is still broken: the closing </div> after .woocommerce-variation-availability is missing.

Pijushgupta commented 1 week ago

Sorry for the incomplete snippet (updated again), and thanks @voku for the library. The issue may be on our end, or WooCommerce may be generating an abnormal HTML hierarchy.
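
A missing closing tag like the one discussed above can be caught with a quick sanity check before the markup is handed to the parser. The sketch below is only an illustration: `divBalance` is a hypothetical helper, not part of voku/simple_html_dom, and the sample string is made up for the example. It uses a naive regex count, so it will also count tags that appear inside comments or script templates.

```php
<?php
// Hypothetical helper (not part of voku/simple_html_dom): returns the number
// of opening <div> tags minus the number of closing </div> tags.
// 0 means balanced; a positive value means closing tags are missing.
function divBalance(string $html): int
{
    $open  = preg_match_all('/<div\b[^>]*>/i', $html);
    $close = preg_match_all('/<\/div\s*>/i', $html);

    return (int) $open - (int) $close;
}

// Made-up sample: the availability wrapper is opened but never closed.
$snippet = '<div class="woocommerce-variation-availability">'
         . '<p class="stock in-stock">30 in stock</p>';

if (divBalance($snippet) !== 0) {
    echo "Unbalanced <div> tags: check the input before parsing\n";
}
```

A non-zero result only shows that the input is unbalanced. It does not explain the escaped `<\/div>` in the output, which may have a different cause.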