Open Defcon0 opened 7 years ago
I faced the same problem and worked around it by simply wrapping my HTML in an additional wrapper element. Example:
$content = '<div class="test"><p>Hallo</p></div>';
$objNode = new HtmlPageCrawler("<div id='crawled-root'>{$content}</div>");
$objNode->filter(".test")->remove();
$x = $objNode->filter("#crawled-root")->saveHTML();
The same problem occurs with a replaceWith() call:
<?php
use Wa72\HtmlPageDom\HtmlPageCrawler;
require __DIR__ . '/vendor/autoload.php';
$html = '<img width="50" height="50" src="about:blank">';
$crawler = new HtmlPageCrawler($html);
$crawler->filter('img')
    ->each(function (HtmlPageCrawler $node) {
        $node->replaceWith('<!-- Picture -->');
        $node->attr('test', 1);
    });
$text = (string)$crawler;
dump($text);
$ ./replace.php
"<img width="50" height="50" src="about:blank" test="1">"
Looking at the code, the remove() function calls parentNode->removeChild($node). I briefly looked for a way to remove a node directly, but I'm not sure PHP's DOM offers one besides removeChild(). I just wrapped my whole document in a <div> to start, and that does solve the problem.
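To illustrate what removeChild() does on its own, here is a minimal sketch that uses only PHP's built-in DOM extension, without HtmlPageCrawler. The variable names are made up for the example, and it does not claim to reproduce the library's internals. It only shows that removeChild() detaches a node from its parent, while a variable that still holds the node can go on serializing it:

```php
<?php
// Plain PHP DOM only; HtmlPageCrawler is deliberately not used here.
$doc = new DOMDocument();
// Without extra flags, loadHTML() wraps the fragment in <html><body>.
$doc->loadHTML('<div class="test"><p>Hallo</p></div>');

$div = $doc->getElementsByTagName('div')->item(0);

// This is the kind of call the comment above refers to.
$div->parentNode->removeChild($div);

// The node is now detached from the tree...
var_dump($div->parentNode === null);

// ...but the variable still references it, so it can still be serialized.
$detachedHtml = $doc->saveHTML($div);

// The document itself no longer contains the div.
$documentHtml = $doc->saveHTML();

echo $detachedHtml, PHP_EOL;
echo $documentHtml, PHP_EOL;
```

If the crawler keeps a reference to the removed root node in a similar way (an assumption, not verified against the library source), that would be consistent with the output shown above.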
When doing the following, the div isn't removed from the HTML:
When doing
$objNode->filter('p')->remove();
the p elements are correctly removed. Wrapping the div in another div also works. So it seems that I cannot remove root elements, can I? At least a hint in the comment would have been nice ;-) Maybe the bug can be fixed.