microformats / php-mf2

php-mf2 is a pure, generic microformats-2 parser for PHP. It makes HTML as easy to consume as JSON.
Creative Commons Zero v1.0 Universal

This HTML fragment produces an endless loop in public function parse_recursive #247

Open osthafen opened 1 year ago

osthafen commented 1 year ago

Calling Mf2\parse with this HTML fragment (found in the wild) sends the public function parse_recursive into what looks like an endless loop:


<?php
require_once 'Parser.php';

// produces an endless loop in public function parse_recursive
$mf = Mf2\parse('<!DOCTYPE html>
<html lang="pt-PT">
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1">
        <title>PAPER PRIME &#8211; A origem do seu papel</title>
    </head>

    <body>
        <div class="site" id="page-top">
            <!-- dynamic header start -->
            <div data-colibri-id="111-h1" class="page-header style-1 style-local-111-h1 position-relative">
                <!---->
                <div data-colibri-navigation-overlap="true" role="banner" class="h-navigation_outer h-navigation_overlap style-2-outer style-local-111-h2-outer">
                    <!---->
                    <div id="navigation" data-colibri-component="navigation" data-colibri-id="111-h2" class="h-section h-navigation h-navigation d-flex style-2 style-local-111-h2">
                        <!---->
                        <div class="h-section-grid-container h-section-fluid-container">
                            <div data-nav-normal="">
                                <div data-colibri-id="111-h3" class="h-row-container h-section-boxed-container gutters-row-lg-0 gutters-row-md-0 gutters-row-2 gutters-row-v-lg-0 gutters-row-v-md-0 gutters-row-v-2 style-3 style-local-111-h3 position-relative">
                                    <!---->
                                    <div class="h-row justify-content-lg-center justify-content-md-center justify-content-center align-items-lg-stretch align-items-md-stretch align-items-stretch gutters-col-lg-0 gutters-col-md-0 gutters-col-2 gutters-col-v-lg-0 gutters-col-v-md-0 gutters-col-v-2">
                                        <div class="h-column h-column-container d-flex h-col-none style-8-outer style-local-111-h10-outer">
                                            <div data-colibri-id="111-h10" data-placeholder-provider="navigation-menu" class="d-flex h-flex-basis h-column__inner h-px-lg-0 h-px-md-0 h-px-0 v-inner-lg-0 v-inner-md-0 v-inner-0 style-8 style-local-111-h10 position-relative">
                                                <!---->
                                                <div class="w-100 h-y-container h-column__content h-column__v-align flex-basis-auto align-self-lg-center align-self-md-center align-self-center">
                                                    <!---->
                                                    <div data-colibri-component="dropdown-menu" role="navigation" h-use-smooth-scroll-all="true" data-colibri-id="111-h11" class="h-menu h-global-transition-all h-ignore-global-body-typography has-offcanvas-tablet h-menu-horizontal h-dropdown-menu style-9 style-local-111-h11 position-relative h-element">
                                                        <div data-colibri-id="111-h12" class="h-mobile-menu h-global-transition-disable style-10 style-local-111-h12 position-relative h-element">
                                                            <!---->
                                                            <div id="offcanvas-wrapper-111-h12" class="h-offcanvas-panel offcanvas offcanvas-right hide force-hide style-10-offscreen style-local-111-h12-offscreen">
                                                                <div data-colibri-id="111-h13" class="d-flex flex-column h-offscreen-panel style-11 style-local-111-h13 position-relative h-element">
                                                                    <!---->
                                                                    <div class="offscreen-header h-ui-empty-state-container">
                                                                        <div data-colibri-id="111-h15" class="h-row-container gutters-row-lg-0 gutters-row-md-0 gutters-row-0 gutters-row-v-lg-0 gutters-row-v-md-0 gutters-row-v-0 style-12 style-local-111-h15 position-relative">
                                                                            <!---->
                                                                            <div class="h-row justify-content-lg-center justify-content-md-center justify-content-center align-items-lg-stretch align-items-md-stretch align-items-stretch gutters-col-lg-0 gutters-col-md-0 gutters-col-0 gutters-col-v-lg-0 gutters-col-v-md-0 gutters-col-v-0">
                                                                                <!---->
                                                                                <div class="h-column h-column-container d-flex h-col-none style-13-outer style-local-111-h16-outer">
                                                                                    <div data-colibri-id="111-h16" class="d-flex h-flex-basis h-column__inner h-px-lg-2 h-px-md-2 h-px-2 v-inner-lg-2 v-inner-md-2 v-inner-2 style-13 style-local-111-h16 position-relative">
                                                                                    </div>
                                                                                </div>
                                                                            </div>
                                                                        </div>
                                                                    </div>
                                                                </div>
                                                            </div>
                                                        </div>
                                                    </div>
                                                </div>
                                            </div>
                                        </div>
                                    </div>
                                </div>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
            <!-- dynamic header end -->
        </div><!-- #page -->
    </body>
</html>', 'https://www.paperprime.pt/');

var_export($mf);

JKingweb commented 1 year ago

For what it's worth, the loop is not in fact endless: it's just really, really slow, taking 1:49.650 (about 1 minute 50 seconds) to complete on my computer. It seems php-mf2 performs worse the more deeply elements are nested.
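
One way to check the nesting hypothesis is to time the parser on synthetic input of increasing depth. The following is only a rough sketch, not something taken from php-mf2: `nestedDivs` and `timeByDepth` are hypothetical helper names, and reusing the `h-row` class from the fragment rests on the unverified assumption that such `h-*` classes matter here. The parser is passed in as a callable so the helpers run on their own.

```php
<?php
// Hypothetical helper: build $depth nested <div> elements, reusing the
// "h-row" class seen in the reported fragment (an assumption, see above).
function nestedDivs(int $depth): string
{
    $open = '';
    for ($i = 0; $i < $depth; $i++) {
        $open .= '<div class="h-row level-' . $i . '">';
    }
    return '<!DOCTYPE html><html><body>'
        . $open . str_repeat('</div>', $depth)
        . '</body></html>';
}

// Hypothetical helper: call $parse on inputs of increasing depth and
// return the wall-clock time per depth as [depth => seconds].
function timeByDepth(callable $parse, array $depths): array
{
    $timings = [];
    foreach ($depths as $depth) {
        $html = nestedDivs($depth);
        $start = microtime(true);
        $parse($html);
        $timings[$depth] = microtime(true) - $start;
    }
    return $timings;
}

// Usage with php-mf2, assuming Parser.php loads as in the report above:
// require_once 'Parser.php';
// print_r(timeByDepth(function (string $html) {
//     return Mf2\parse($html, 'https://example.com/');
// }, [2, 4, 8, 12, 16]));
```

If the time roughly doubles (or worse) with each added level, that would point to work being repeated per nesting level rather than to a true infinite loop. Start with small depths, since the original fragment already takes minutes.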