jae-jae / QueryList-Puppeteer

QueryList Plugin: Use Puppeteer to crawl pages rendered dynamically by JavaScript (Headless Chrome).

Timeout settings have no effect #2

Closed wyq2214368 closed 4 years ago

wyq2214368 commented 4 years ago

Setting the idle_timeout and timeout options does not seem to have any effect:

$ql->chrome(function ($page,$browser) {...}, [
    'idle_timeout' => 0,
    'timeout' => 0,
])

The error is:

The idle timeout (60.000 seconds) has been exceeded. Maybe you should increase the "idle_timeout" option.

(I also tried non-zero values such as 1000000, with no effect: the process still exits once the time is up.)

wyq2214368 commented 4 years ago

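After some experimenting, the problem is solved. Setting these keys in the options array here simply has no effect; they have to be passed to the constructor when the Puppeteer object is instantiated.

My troubleshooting steps are below, for anyone who runs into the same thing.

Attempt 1

The nesk/puphpeteer README states "This will create a new Node process controlled by PHP." Since PHP talks to a separate Node process, I first suspected a socket communication timeout, so I looked in php.ini and found: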

; Default timeout for socket based streams (seconds)
; http://php.net/default-socket-timeout
default_socket_timeout = 60

I changed the value, but it had no effect. PASS!
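
When repeating attempt 1, it may help to confirm that the edited value was actually loaded, since the CLI and the web server can read different php.ini files. The snippet below is only a minimal sketch using PHP's built-in ini_get and ini_set; the value 120 is an arbitrary example. It verifies the setting from attempt 1 and does nothing about the idle timeout itself, which turned out to be unrelated.

```php
<?php
// Value PHP actually loaded for this process (ini_get returns a string).
$before = ini_get('default_socket_timeout');
echo "default_socket_timeout before: {$before}\n";

// Override it for the current script only; php.ini itself is left untouched.
// 120 is an arbitrary example value, not a recommendation.
ini_set('default_socket_timeout', '120');

$after = ini_get('default_socket_timeout');
echo "default_socket_timeout after: {$after}\n";
```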

Attempt 2

Drop QueryList and use nesk/puphpeteer directly to see whether the idle_timeout option takes effect:

use Nesk\Puphpeteer\Puppeteer;

$puppeteer = new Puppeteer;
$browser = $puppeteer->launch([
    'idle_timeout' => 1000000, // timeout; set large because I was not sure whether the unit is seconds or milliseconds
    'headless' => false
]);
$page = $browser->newPage();
$page->goto('https://example.com');

sleep(100); // wait 100 seconds

$browser->close();

Result: still no effect. PASS!

Attempt 3

Pass the idle_timeout setting to the constructor when instantiating Puppeteer, instead of setting it through launch:

use Nesk\Puphpeteer\Puppeteer;

$puppeteer = new Puppeteer([
    'idle_timeout' => 1000000 // timeout; set large because I was not sure whether the unit is seconds or milliseconds
]);
$browser = $puppeteer->launch([
    'headless' => false
]);
$page = $browser->newPage();
$page->goto('https://example.com');

sleep(100); // wait 100 seconds

$browser->close();

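OK, this works.

Done

So I changed the timeout directly in the dependency file jaeger/querylist-puppeteer/Chrome.php. (Editing a vendored package is of course not recommended, but I wanted a quick fix and this is not a production project, so I don't mind. Could the author check whether my analysis is right? If it is, please consider fixing it, or I can open a PR. If I'm wrong, please correct me.)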

public static function render(QueryList $queryList, $url, $options)
{
    $options = self::mergeOptions($options);
    $puppeteer = new Puppeteer(['idle_timeout' => 3600]);
    ...
}
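
Instead of hard-coding 3600, one possible fix is to split the user-supplied options into those meant for the Puppeteer constructor and those meant for launch. The following is a rough sketch only: splitPuppeteerOptions and CONSTRUCTOR_KEYS are hypothetical names that exist in neither QueryList-Puppeteer nor PuPHPeteer. Only idle_timeout is confirmed by the attempts above; read_timeout is an assumption and the key list would need checking against the library documentation.

```php
<?php
// Hypothetical helper, not part of QueryList-Puppeteer or PuPHPeteer.
// Keys listed here are routed to the Puppeteer constructor; all other keys
// are left for launch(). Only 'idle_timeout' is confirmed by this thread;
// 'read_timeout' is an assumption.
const CONSTRUCTOR_KEYS = ['idle_timeout', 'read_timeout'];

/**
 * Split one options array into [constructorOptions, launchOptions].
 */
function splitPuppeteerOptions(array $options, array $constructorKeys = CONSTRUCTOR_KEYS): array
{
    // Keep only the entries whose key appears in $constructorKeys.
    $constructorOptions = array_intersect_key($options, array_flip($constructorKeys));
    // Everything that was not picked above goes to launch().
    $launchOptions = array_diff_key($options, $constructorOptions);

    return [$constructorOptions, $launchOptions];
}

[$ctor, $launch] = splitPuppeteerOptions([
    'idle_timeout' => 3600,
    'headless' => false,
]);
// $ctor   is ['idle_timeout' => 3600]
// $launch is ['headless' => false]
```

Inside render, the two arrays could then be passed as new Puppeteer($ctor) and $puppeteer->launch($launch), assuming render forwards the remaining options to launch. That would let callers of $ql->chrome(...) set idle_timeout without editing the package.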