jae-jae / QueryList

:spider: The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.
https://querylist.cc

Scraping fails in version 3 #60

Closed: getwebtools closed this issue 5 years ago

getwebtools commented 5 years ago

Scraping the Baidu top-searches (buzz) page returns no results.

Version: 3.2.1

PHP code:

    $html = file_get_contents('http://top.baidu.com/buzz?b=1');
    $html = iconv('GB2312', 'UTF-8', $html);
    $rules = array(
        // 'text' => array('.list-title', 'text'),
        'text' => array('div', 'text'),
    );
    $data = QueryList::Query($html, $rules)->data;
    print_r($data);
    echo $html;

Version 4 can probably scrape this page; the online test works without problems: https://www.querylist.cc/querylist-test/?data={"type":"simple","url":"http://top.baidu.com/buzz?b=1","data":{"url":"http://top.baidu.com/buzz?b=1","rule":".list-title","attr":"text"}}

jae-jae commented 5 years ago

Version 3 is no longer maintained.