symfony / panther

A browser testing and web crawling library for PHP and Symfony
MIT License

Symfony panther html() issue #490

Open vladginosyan12 opened 3 years ago

vladginosyan12 commented 3 years ago

I am using Symfony Panther for web scraping. When Google Chrome and ChromeDriver were both at version 89, everything worked fine. But after updating both to version 92,

$crawler->filter('h1')->html();

always returns an empty string.

I think the problem is related to the ->html() method.

Could you please let me know if you have a solution for this?

LoicBoursin commented 3 years ago

Duplicate of https://github.com/symfony/panther/issues/478

jbalatero commented 3 years ago

Encountered this issue as well

MartinsPaulo commented 3 years ago

No solution?

codegain commented 2 years ago

@vladginosyan12 If this is still an issue, try:

$crawler->filter('h1')->getElement(0)->getDomProperty('innerHTML');

(for reference: https://github.com/php-webdriver/php-webdriver/discussions/921)

The html() method of the DomCrawler still uses ->attr('outerHTML'), which does not work when the browser is in W3C mode, as explained in #478.
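The workaround above can be wrapped in a small helper so that scraping code does not have to repeat the call chain. This is only a sketch: `readInnerHtml` is a hypothetical name, not part of Panther or php-webdriver, and it assumes the element passed in is the php-webdriver element returned by `getElement(0)`, which exposes `getDomProperty()` and `getAttribute()`.

```php
<?php

/**
 * Hypothetical helper (not a Panther or php-webdriver API).
 *
 * Reads the inner HTML of an element through the DOM property first,
 * and only falls back to the attribute lookup that older, non-W3C
 * drivers answered.
 */
function readInnerHtml(object $element): string
{
    // In W3C mode, innerHTML is exposed as a DOM property,
    // not as an HTML attribute.
    if (method_exists($element, 'getDomProperty')) {
        $html = $element->getDomProperty('innerHTML');
        if (is_string($html)) {
            return $html;
        }
    }

    // Assumed fallback for setups where the attribute lookup still works.
    return (string) $element->getAttribute('innerHTML');
}

// Usage with a Panther crawler (assumes $crawler already exists):
// $html = readInnerHtml($crawler->filter('h1')->getElement(0));
```

Trying the property first keeps the helper usable both on pre-91 setups and on W3C-mode drivers, at the cost of one extra method check.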

AntoineMUSSARD commented 2 years ago

Hi,

If you want to avoid the bug (or the feature), you can uninstall your current Chrome and install a version prior to 91. For example, on Debian, use the google-chrome-stable_90.0.4430.93-1_amd64.deb release. You can find it here: http://mirror.cs.uchicago.edu/google-chrome/pool/main/g/google-chrome-stable/

When you reinstall the package, don't forget to refresh the drivers with bdi:

vendor/bin/bdi detect drivers

If you want to fix the bug/feature with a recent Chrome: since version 91, ChromeDriver is W3C-compliant for "Get Element Attribute", so you must disable W3C compatibility. I didn't succeed in doing that with my config. Does anybody know how to use ChromeOptions / setExperimentalOption with a config like this?

$chromeOptions = new ChromeOptions();
$chromeOptions->setExperimentalOption('w3c', false);

$client = Client::createChromeClient(
    null,
    [
        '--window-size=1200,1100',
        '--headless',
        '--disable-dev-shm-usage',
        '--no-sandbox'
    ],
    ['port' => 9000 + getmypid()]
);

I don't know where to pass the ChromeOptions.

Thank you. Antoine.