symfony / panther

A browser testing and web crawling library for PHP and Symfony
MIT License

text() returns an empty string #6

Open thomasage opened 6 years ago

thomasage commented 6 years ago

Hi,

When I use the crawler to filter the DOM and test the content of the title tag, text() returns an empty string with the Panthere client (the regular client returns the expected text).

Tested with:

<?php
// tests/PanthereTest.php
declare(strict_types=1);

namespace App\Tests;

use Panthere\PanthereTestCase;

/**
 * Class PanthereTest
 * @package App\Tests
 */
class PanthereTest extends PanthereTestCase
{
    public function testSomething(): void
    {
        $client = static::createClient();
        $crawler = $client->request('GET', static::$baseUri.'/');
        $this->assertEquals('Welcome!', $crawler->filterXPath('//title')->html());
        $this->assertEquals('Welcome!', $crawler->filterXPath('//title')->text());

        $client = static::createPanthereClient();
        $crawler = $client->request('GET', static::$baseUri.'/');
        $this->assertEquals('<title>Welcome!</title>', $crawler->filterXPath('//title')->html());
        $this->assertEquals('Welcome!', $crawler->filterXPath('//title')->text());
    }
}
// composer.json
{
    "type": "project",
    "license": "proprietary",
    "require": {
        "php": "^7.1.3",
        "ext-iconv": "*",
        "symfony/console": "^4.0",
        "symfony/flex": "^1.0",
        "symfony/framework-bundle": "^4.0",
        "symfony/lts": "^4@dev",
        "symfony/yaml": "^4.0"
    },
    "require-dev": {
        "dunglas/panthere": "^1.0@dev",
        "symfony/dotenv": "^4.0",
        "symfony/phpunit-bridge": "^4.0"
    },
    "config": {
        "preferred-install": {
            "*": "dist"
        },
        "sort-packages": true
    },
    "autoload": {
        "psr-4": {
            "App\\": "src/"
        }
    },
    "autoload-dev": {
        "psr-4": {
            "App\\Tests\\": "tests/"
        }
    },
    "replace": {
        "symfony/polyfill-iconv": "*",
        "symfony/polyfill-php71": "*",
        "symfony/polyfill-php70": "*",
        "symfony/polyfill-php56": "*"
    },
    "scripts": {
        "auto-scripts": {
            "cache:clear": "symfony-cmd",
            "assets:install --symlink --relative %PUBLIC_DIR%": "symfony-cmd"
        },
        "post-install-cmd": [
            "@auto-scripts"
        ],
        "post-update-cmd": [
            "@auto-scripts"
        ]
    },
    "conflict": {
        "symfony/symfony": "*"
    },
    "extra": {
        "symfony": {
            "id": "01C9W9DMK0BWP564TKPSF2P5F0",
            "allow-contrib": false
        }
    }
}

Result:

PHPUnit 6.5.7 by Sebastian Bergmann and contributors.

Testing Project Test Suite
F                                                                   1 / 1 (100%)

Time: 4.76 seconds, Memory: 8.00MB

There was 1 failure:

1) App\Tests\PanthereTest::testSomething
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'Welcome!'
+''

tests/PanthereTest.php:24

FAILURES!
Tests: 1, Assertions: 4, Failures: 1.
thomasage commented 6 years ago

Same thing with Windows 10 / PHP 7.2.4 / Wampserver.

But there is no problem with a tag in the body, like p for example. It seems to occur only with tags in <head>.

dunglas commented 6 years ago

Probably a weird behavior with Chrome...

dunglas commented 6 years ago

OK, this is because WebDriver returns only the displayed (rendered) text, and by definition nothing in <head> is displayed. Here is a workaround: http://grokbase.com/t/gg/webdriver/155wx8zwjv/how-to-get-the-content-tags-that-reside-in-head-head-of-a-webpage

Maybe we can change this behavior and return innerHTML if the tag is in <head>?
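A minimal sketch of why the title text is still reachable, assuming only PHP's built-in DOM extension (no browser is involved, so this is a stand-in to illustrate the idea, not Panther code): the text of `<title>` lives in the DOM tree, so reading the node's `textContent` returns it even though nothing in `<head>` is ever rendered. The markup is a hypothetical page mirroring the test case above.

```php
<?php
declare(strict_types=1);

// Hypothetical page markup mirroring the test case in this issue.
$html = <<<'HTML'
<html>
<head><title>Welcome!</title></head>
<body><p>Hello</p></body>
</html>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);

$xpath = new DOMXPath($document);
$title = $xpath->query('//title')->item(0);

// textContent is read straight from the DOM tree: no layout or CSS is
// evaluated, so the fact that <head> is never displayed does not matter.
$titleText = $title->textContent;

echo $titleText, PHP_EOL; // prints Welcome!
```

Within Panther, the corresponding idea is to read a DOM property instead of the rendered text, for example `$element->getAttribute('textContent')` as suggested later in this thread, or `$client->getTitle()` for the title specifically.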

thomasage commented 6 years ago

That seems like a good option.

LegendOfGIT commented 6 years ago

IMO, using a different behavior for the <head> section is too specific. I think a better solution would be to use innerHTML as a general fallback for text(). Developers who do not want this default behavior could then be given an option to deactivate it.

dunglas commented 6 years ago

Or we could add a flag to getText(), like getText(bool $includeHidden), that would be false by default.

thomasage commented 6 years ago

Or maybe 2 methods?

dinamic commented 6 years ago

IMO the current behaviour is the expected behaviour and shouldn't be changed. As long as the developer is able to get the innerHTML, it all seems good to me.

thomasage commented 6 years ago

@dinamic: I agree with you now. Maybe just add a note in the docs?

dunglas commented 6 years ago

A notice in the docs would be great. Do you want to work on this?

ssnepenthe commented 5 years ago

This may be expected behavior as far as WebDriver is concerned, but it is inconsistent with Goutte... I guess the question is which one takes priority?

If you would like to keep it consistent, maybe we could try something like $element->getAttribute('textContent') instead of $element->getText()?

Then, if you wanted to expose the WebDriver behavior, we could add a secondary method like $crawler->visibleText()?

@thomasage's test case above also exposes a similar issue with the html() method: Panther returns the outerHTML attribute, whereas Goutte returns what would be the equivalent of innerHTML.
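For reference, outer HTML and inner HTML differ only in whether the element's own tags are included, which is exactly the gap between '<title>Welcome!</title>' and 'Welcome!' in the test case. A minimal sketch using PHP's built-in DOM extension; the functions `outerHtml()` and `innerHtml()` below are hypothetical standalone helpers for this illustration, not the Panther or DomCrawler methods.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: serialize the element itself, including its own tags.
function outerHtml(DOMNode $node): string
{
    return $node->ownerDocument->saveHTML($node);
}

// Hypothetical helper: serialize only the element's children.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$document = new DOMDocument();
$document->loadHTML('<html><head><title>Welcome!</title></head><body></body></html>');
$title = $document->getElementsByTagName('title')->item(0);

echo outerHtml($title), PHP_EOL; // prints <title>Welcome!</title>
echo innerHtml($title), PHP_EOL; // prints Welcome!
```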

BB-000 commented 2 years ago

I have read all the docs and tried everything I can think of, but I cannot get the text of an element hidden with CSS display:none.

In fact, the $node->html() method always returns an empty string, even on non-hidden content...

And $node->outerHtml() throws an error: The "getNode" method cannot be used in WebDriver mode. Use "getElement" instead

Confused...

$node->filter('.match-info')->outerHtml();  // ERROR
$node->filter('.match-info')->html();  // EMPTY
$node->filter('.match-info')->text();  // NOT EMPTY BUT HIDDEN TEXT NOT THERE
$scores = $node->filter('.match-info')->each(function (Crawler $node, $i) {
    $node->html();  // EMPTY
    $node->text();  // NOT EMPTY BUT HIDDEN TEXT NOT THERE
});
Radio-Skonto commented 5 months ago

You can crawl hidden elements with Symfony DomCrawler.

Just load the HTML from Panther into Symfony DomCrawler.

Simple example:

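use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\Panther\Client;

// Load the page with Panther (a real Chrome browser).
$browser = Client::createChromeClient();
$browser->request('GET', 'https://www.webpage.com');

// Hand the HTML to the plain DomCrawler, which does not apply CSS,
// so elements hidden with display:none can be read as well.
$htmlData   = $browser->getCrawler()->html();
$domCrawler = new Crawler($htmlData);

$carData = $domCrawler->filter('table')->eq(1)->filter('tr')->each(
    function (Crawler $node) {
        // Rows without <td> cells (e.g. header rows) yield null.
        if ($node->filter('td')->count() === 0) {
            return null;
        }

        return [
            'rowTitle' => $node->filter('td')->eq(0)->text(),
            'rowValue' => $node->filter('td')->eq(1)->text(),
        ];
    }
);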
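This approach works because a static HTML parser does not evaluate CSS, so an element hidden with display:none is just another node with text. The same idea can be sketched with PHP's built-in DOM extension, swapped in here so the snippet runs without Panther or symfony/dom-crawler; the markup is illustrative and the `match-info` class is borrowed from the earlier comment. In a real test the HTML string would come from the browser instead, as in the example above.

```php
<?php
declare(strict_types=1);

// Illustrative markup: the second span is hidden with inline CSS.
$html = <<<'HTML'
<html><body>
<div class="match-info">
  <span>Visible score</span>
  <span style="display:none">Hidden score</span>
</div>
</body></html>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// XPath equivalent of the CSS selector ".match-info span".
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " match-info ")]//span';

// The parser never applies styles, so the display:none span is returned
// exactly like the visible one.
$texts = [];
foreach ($xpath->query($query) as $span) {
    $texts[] = trim($span->textContent);
}

print_r($texts);
```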