wasinger / htmlpagedom

jQuery-inspired DOM manipulation extension for Symfony's Crawler
MIT License

Updates to nodes that are nested more than one level deep aren't reflected when outputting the root or parent node's HTML #33

Open jacobalvarez opened 5 years ago

jacobalvarez commented 5 years ago

Please review the following test code. Am I doing something wrong here?

<?php
use Wa72\HtmlPageDom\HtmlPageCrawler;
require_once realpath($_SERVER['DOCUMENT_ROOT']) . '/support_files/external/composer_packages/vendor/autoload.php';
?>

<!DOCTYPE html>
<html>
<body>
<h2>This works</h2>
<?php
    $rootNode1 = HtmlPageCrawler::create('<div />');
    $testNode1 = HtmlPageCrawler::create('<p />');
    $rootNode1->append($testNode1);

    // Change test node text after node appended
    $testNode1->text('correct text');

    // Output root node html. Correct
    echo $rootNode1;

    // Output test node html. Correct
    echo $testNode1;
?>

<h2>This doesn't work when <code>$span</code> is nested deeper than one level?</h2>
<?php
    $rootNode2 = HtmlPageCrawler::create('<div />');
    $p = HtmlPageCrawler::create('<p />');
    $testNode2 = HtmlPageCrawler::create('<span />')->text('incorrect text');
    $p->append($testNode2);
    $rootNode2->append($p);

    // Change test node text after node appended
    $testNode2->text('correct text');

    // Output root or parent node html. Incorrect
    echo $rootNode2;
    echo $p;

    // Output node html. Correct
    echo $testNode2;
?>
</body>
</html>

wasinger commented 5 years ago

In general, you should not expect that you can modify a node after appending it to another this way.

That's because DOMDocument::importNode(), which append() uses internally, always makes copies of the node objects. The Crawler object passed to append() is updated to hold the cloned nodes (that's why your first example works), but the children of those nodes are clones too, so they are no longer connected to the Crawler objects that held their originals.

In your example: when $rootNode1->append($testNode1); runs, the p node inside $testNode1 is cloned as it is appended to $rootNode1, but the $testNode1 object is updated to contain the cloned p node, so you can still modify it afterwards. In your second example, $rootNode2->append($p); updates the $p variable with the cloned p node, but the cloned span child of that node is no longer connected to the $testNode2 variable.
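
The copying behaviour can be reproduced with PHP's DOM extension alone, without htmlpagedom. The following is a minimal sketch under that assumption: it does not show the library's actual internals, and every variable name in it ($source, $target, $importedP, $staleOutput) is made up for illustration. It mirrors the second example: a span nested inside a p is deep-imported into another document, after which the old span reference no longer affects the imported tree.

```php
<?php
// Plain ext-dom sketch; names are hypothetical, not htmlpagedom internals.

// Source document holding <p><span>incorrect text</span></p>.
$source = new DOMDocument();
$p = $source->createElement('p');
$span = $source->createElement('span', 'incorrect text');
$p->appendChild($span);
$source->appendChild($p);

// Target document holding the root <div>.
$target = new DOMDocument();
$div = $target->createElement('div');
$target->appendChild($div);

// A deep import copies <p> and everything below it into $target.
$importedP = $target->importNode($p, true);
$div->appendChild($importedP);

// $span still points at the original node in $source, not at the copy.
$span->nodeValue = 'correct text';

// The copy under <div> is unaffected by the change above.
$staleOutput = $target->saveHTML($div);
echo $staleOutput, "\n";

// Only the original span in $source carries the new text.
echo $source->saveHTML($span), "\n";

// Reaching the copy through the imported parent does update the target tree.
$importedP->firstChild->nodeValue = 'correct text';
echo $target->saveHTML($div), "\n";
```

In this sketch the only reference that stays useful after the import is the one returned by importNode(). That corresponds to the Crawler passed to append() being updated with the clones, while a separate variable holding a nested child is left pointing at the original.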

If anyone has a suggestion for how to fix this, please contribute...