jae-jae / QueryList

:spider: The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.
https://querylist.cc
2.65k stars, 441 forks

Some pages cannot be scraped, or no data/content is matched #54

Closed: suhanyujie closed this issue 5 years ago

suhanyujie commented 5 years ago
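// Assumes QueryList v4 installed via Composer
require 'vendor/autoload.php';

use QL\QueryList;

$url = 'http://www.badmintoncn.com/';
$range = '.main .list-1 .list-1-top';
$rules = [
    'title' => ['a', 'text'], // field name => [CSS selector, attribute to extract]
];
$rt = QueryList::get($url)->rules($rules)
    ->range($range)->query()->getData();
var_dump($url, $rt->all());
exit(PHP_EOL . '6:30 PM' . PHP_EOL);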
suhanyujie commented 5 years ago

I took another look: the content I can't match is probably rendered by JavaScript.
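QueryList::get() fetches the page over plain HTTP and parses the response; it does not execute JavaScript, so anything a script inserts after page load cannot be matched. A quick way to confirm this for a given selector is to run an equivalent query against the raw HTML. The sketch below uses only PHP's DOM extension; the `countMatches` helper, the sample markup, and the hand-translated XPath for `.list-1-top` are illustrative assumptions, not part of QueryList.

```php
<?php
// Hypothetical helper (not a QueryList API): runs an XPath query against HTML
// exactly as the server sent it. DOMDocument never executes JavaScript, so
// anything a script adds at runtime is invisible here, just as it is to a plain fetch.
function countMatches(string $html, string $xpathExpr): int
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML rarely validates; silence warnings
    $dom->loadHTML($html);
    libxml_clear_errors();
    return (new DOMXPath($dom))->query($xpathExpr)->length;
}

// Assumed sample: the .list-1-top container is in the markup,
// but its link is only created by a script in the browser.
$html = <<<'HTML'
<html><body>
<div class="main"><div class="list-1"><div class="list-1-top" id="top"></div></div></div>
<script>
var a = document.createElement('a');
a.textContent = 'news';
document.getElementById('top').appendChild(a);
</script>
</body></html>
HTML;

// XPath equivalent of the CSS class selector ".list-1-top".
$container = "//*[contains(concat(' ', normalize-space(@class), ' '), ' list-1-top ')]";

echo countMatches($html, $container), PHP_EOL;         // 1: container is in the raw HTML
echo countMatches($html, $container . '//a'), PHP_EOL; // 0: the link only exists after JS runs
```

If the container matches but its children do not, the missing content is most likely script-generated, and no selector change will help with a plain HTTP fetch.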

suhanyujie commented 5 years ago
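// Assumes QueryList v4 installed via Composer
require 'vendor/autoload.php';

use QL\QueryList;

$url = 'http://www.badmintoncn.com/';
// Note: "list-2" has no leading dot, so it selects <list-2> elements, not class="list-2"
$range = '.left-box-1 list-2';
$rules = [
    'title' => ['a', 'text'],
];
$rt = QueryList::get($url)->rules($rules)
    ->range($range)->query()->getData();
var_dump($url, $rt->all());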
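In CSS, a bare word such as `list-2` is a type (tag name) selector; only `.list-2` matches `class="list-2"`. So `.left-box-1 list-2` looks for `<list-2>` elements, which ordinary HTML does not contain, and that alone may explain an empty result here. The sketch below shows the difference with PHP's DOM extension, translating both selectors to XPath by hand; the helpers `countMatches` and `classStep` and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper (not a QueryList API): counts XPath matches in raw HTML.
function countMatches(string $html, string $xpathExpr): int
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    return (new DOMXPath($dom))->query($xpathExpr)->length;
}

// XPath step matching elements whose class attribute contains the given token,
// i.e. the equivalent of the CSS class selector ".$class".
function classStep(string $class): string
{
    return "*[contains(concat(' ', normalize-space(@class), ' '), ' " . $class . " ')]";
}

// Assumed sample markup, shaped like the selectors in this thread.
$html = '<div class="left-box-1"><ul class="list-2"><li><a href="#">Item</a></li></ul></div>';

// CSS ".left-box-1 list-2": no dot, so list-2 is a tag name (<list-2> elements).
$asTag = '//' . classStep('left-box-1') . '//list-2';

// CSS ".left-box-1 .list-2 a": list-2 is a class, then every <a> inside it.
$asClass = '//' . classStep('left-box-1') . '//' . classStep('list-2') . '//a';

echo countMatches($html, $asTag), PHP_EOL;   // 0
echo countMatches($html, $asClass), PHP_EOL; // 1
```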
lirko commented 5 years ago

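require 'vendor/autoload.php';

use QL\QueryList;

$url = 'http://www.badmintoncn.com/';
$range = ''; // empty range: the rules run against the whole page
$rules = [
    'title' => ['.left-box-1 .list-2 a', 'text'],
];
$rt = QueryList::get($url)->rules($rules)->range($range)
    ->encoding('UTF-8') // convert the output to UTF-8
    ->removeHead()      // drop <head> (and its <meta charset>) before parsing
    ->query()->getData()->all();
var_dump($rt);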
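This reply changes two things: the selector moves into the rule with the dot restored on `.list-2`, and `encoding('UTF-8')` plus `removeHead()` are added, which are intended for pages whose charset is not UTF-8. The thread does not say which change fixed the problem. The sketch below imitates the encoding part with PHP's mbstring and DOM extensions only, assuming (unconfirmed) that the page is GBK-encoded and declares that in a `<meta>` tag; `extractLinkTexts` and the sample page are hypothetical, and this is not how QueryList implements those methods.

```php
<?php
// Hypothetical helper: converts raw page bytes to UTF-8, strips <head> so a stale
// <meta charset> cannot mislead the parser, then extracts ".left-box-1 .list-2 a" texts.
function extractLinkTexts(string $rawHtml, string $sourceCharset): array
{
    // 1. Re-encode the bytes as UTF-8 (the idea behind encoding('UTF-8')).
    $html = mb_convert_encoding($rawHtml, 'UTF-8', $sourceCharset);

    // 2. Remove <head>, including a <meta charset> that no longer matches the bytes
    //    (the idea behind removeHead()).
    $html = preg_replace('#<head\b[^>]*>.*?</head>#is', '', $html);

    // 3. With the meta tag gone, tell libxml explicitly that the input is UTF-8.
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $cls = fn(string $c): string =>
        "*[contains(concat(' ', normalize-space(@class), ' '), ' " . $c . " ')]";
    $texts = [];
    foreach ((new DOMXPath($dom))->query('//' . $cls('left-box-1') . '//' . $cls('list-2') . '//a') as $a) {
        $texts[] = trim($a->textContent);
    }
    return $texts;
}

// Assumed sample: a GBK-encoded page whose <meta> declares GBK.
$utf8Page = '<html><head><meta charset="gbk"><title>t</title></head><body>'
    . '<div class="left-box-1"><ul class="list-2"><li><a href="#">羽毛球</a></li></ul></div>'
    . '</body></html>';
$gbkBytes = mb_convert_encoding($utf8Page, 'GBK', 'UTF-8');

print_r(extractLinkTexts($gbkBytes, 'GBK'));
```

Once the text is decoded into one known encoding and nothing in the markup contradicts it, the parser no longer has to guess, which is the kind of mismatch that typically produces garbled text or missed matches.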