rialto-php / puphpeteer

A Puppeteer bridge for PHP, supporting the entire API.
MIT License

Getting text and attribute from a website #150

Open marcpre opened 3 years ago

marcpre commented 3 years ago

I am using "@nesk/puphpeteer": "^2.0.0" and want to get the text and the href attribute from a link.

I tried the following:

<?php

require_once '../vendor/autoload.php';

use Nesk\Puphpeteer\Puppeteer;
use Nesk\Rialto\Data\JsFunction;

$debug = true;

$puppeteer = new Puppeteer([
    'read_timeout' => 100,
    'debug' => $debug,
]);
$browser = $puppeteer->launch([
    'headless' => !$debug,
    'ignoreHTTPSErrors' => true,
]);

$page = $browser->newPage();
$page->goto('http://example.python-scraping.com/');

//get text and link
$links = $page->querySelectorXPath('//*[@id="results"]/table/tbody/tr/td/div/a', JsFunction::createWithParameters(['node'])
    ->body('return node.textContent;'));

// get single text
$singleText = $page->querySelectorXPath('//*[@id="pagination"]/a', JsFunction::createWithParameters(['node'])
    ->body('return node.textContent;'));

$browser->close();

When I run the script above, I get the nodes from the page, but I cannot access their attributes or text.

Any suggestions how to do this?
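
One possible approach, as a sketch rather than a confirmed fix: `querySelectorXPath()` is PuPHPeteer's alias for Puppeteer's `page.$x()`, which as far as I know accepts only the XPath expression, so the `JsFunction` passed as a second argument is likely ignored and only element handles come back. The sketch below assumes each returned handle exposes Puppeteer's `evaluate()` method (it does on `JSHandle` in recent Puppeteer versions) and that Rialto converts the returned JavaScript object into a PHP array. The variable names `$extract`, `$handles` and `$links` are illustrative only.

```php
<?php

require_once '../vendor/autoload.php';

use Nesk\Puphpeteer\Puppeteer;
use Nesk\Rialto\Data\JsFunction;

$puppeteer = new Puppeteer(['read_timeout' => 100]);
$browser = $puppeteer->launch(['ignoreHTTPSErrors' => true]);

$page = $browser->newPage();
$page->goto('http://example.python-scraping.com/');

// Runs in the browser against a single node and returns plain data
// (text and href) that can be serialized back to PHP.
$extract = JsFunction::createWithParameters(['node'])
    ->body('return { text: node.textContent.trim(), href: node.getAttribute("href") };');

// Assumption: querySelectorXPath() takes only the expression and
// returns an array of element handles.
$handles = $page->querySelectorXPath('//*[@id="results"]/table/tbody/tr/td/div/a');

$links = [];
foreach ($handles as $handle) {
    // Assumption: evaluate() passes the handle's node as the first argument.
    $links[] = $handle->evaluate($extract);
}

print_r($links);

$browser->close();
```

If a CSS selector is acceptable instead of XPath, `querySelectorAllEval()` (the alias for `$$eval`) should allow the same extraction in a single call, with the function receiving the whole array of matched nodes.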