wasinger / htmlpagedom

jQuery-inspired DOM manipulation extension for Symfony's Crawler
MIT License

remove() doesn't work on root elements in an html part #25

Open Defcon0 opened 7 years ago

Defcon0 commented 7 years ago

With the following code, the div is not removed from the HTML:

$objNode = new HtmlPageCrawler('<div class="test"><p>Hallo</p></div>');
$objNode->filter('.test')->remove();

$x = $objNode->saveHTML();

With $objNode->filter('p')->remove(); the p elements are removed correctly. It also works when the div is wrapped in another div.

So it seems that I cannot remove root elements, can I? At least a hint in the comment would've been nice ;-) Maybe the bug can be fixed.

Qclanton commented 6 years ago

I faced the same problem and simply wrapped my HTML in an additional wrapper element to work around it. Example:

$content = '<div class="test"><p>Hallo</p></div>';
$objNode = new HtmlPageCrawler("<div id='crawled-root'>{$content}</div>");
$objNode->filter(".test")->remove();

$x = $objNode->filter("#crawled-root")->saveHTML();

glensc commented 6 years ago

The same problem occurs with the replaceWith() call:

<?php

use Wa72\HtmlPageDom\HtmlPageCrawler;

require __DIR__ . '/vendor/autoload.php';

$html = '<img width="50" height="50" src="about:blank">';
$crawler = new HtmlPageCrawler($html);

$crawler->filter('img')
    ->each(function (HtmlPageCrawler $node) {

        $node->replaceWith('<!-- Picture -->');
        $node->attr('test', 1);
    });

$text = (string)$crawler;
dump($text);

$ ./replace.php
"<img width="50" height="50" src="about:blank" test="1">"

aust6512 commented 5 years ago

Looking at the code, remove() calls parentNode->removeChild($node). I briefly looked for a way to remove a node directly, but I'm not sure PHP's DOM extension offers one besides removeChild(). I just wrapped my whole document in a wrapper element to start, and that does solve the problem.
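
The parentNode->removeChild($node) pattern mentioned above can be sketched with PHP's built-in DOM extension alone. This is only a minimal illustration and not HtmlPageDom's actual implementation: removeNode() is a hypothetical helper introduced here, and the sample markup is loaded as XML purely to keep the example small.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of HtmlPageDom): detach a node from its
 * parent. Returns false when the node has no parent to detach it from.
 */
function removeNode(DOMNode $node): bool
{
    if ($node->parentNode === null) {
        return false; // no parent, so there is nothing to call removeChild() on
    }
    $node->parentNode->removeChild($node);
    return true;
}

$doc = new DOMDocument();
$doc->loadXML('<div class="test"><p>Hallo</p></div>');

// A nested node: its parent is the <div>, so removal goes through.
$p = $doc->getElementsByTagName('p')->item(0);
$removedNested = removeNode($p);

// A node that was created but never appended has no parent at all.
$detached = $doc->createElement('span');
$removedDetached = removeNode($detached);

// In plain DOM, the root element's parent is the document itself.
$root = $doc->documentElement;
$rootParentIsDocument = $root->parentNode->isSameNode($doc);
```

In plain DOM the root element still has a parent (the DOMDocument), so removeChild() itself is available there. Whether a crawler that already holds a reference to the root node reflects such a removal is a separate question, which this sketch does not settle.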