jae-jae / QueryList

:spider: The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.
https://querylist.cc
2.65k stars 441 forks

Recursive scraping has a bug #44

Closed naruone closed 5 years ago

naruone commented 5 years ago

https://doc.querylist.cc/site/index/doc/49

<?php
require 'QueryList/vendor/autoload.php';
use QL\QueryList;
// Get the h3 text inside each li, and the contents of the elements with class "item"
$html =<<<STR
    <div id="demo">
        <ul>
            <li>
              <h3>xxx</h3>
              <div class="list">
                <div class="item">item1</div>
                <div class="item">item2</div>
              </div>
            </li>
             <li>
              <h3>xxx2</h3>
              <div class="list">
                <div class="item">item12</div>
                <div class="item">item22</div>
              </div>
            </li>
        </ul>
    </div>
STR;
$data = QueryList::html($html)->rules(array(
        'title' => array('h3','text'),
        'list' => array('.list','html')
    ))->range('#demo li')->query()->getData(function($item){
        $item['list'] = QueryList::html($item['list'])->rules(array(
                 'item' => array('.item','text')
            ))->query()->getData()->all();
        return $item;
});
print_r($data);
/**
 Result:
 Array
(
    [0] => Array
        (
            [title] => xxx
            [list] => Array
                (
                    [0] => Array
                        (
                            [item] => item1
                        )
                    [1] => Array
                        (
                            [item] => item2
                        )
                )
        )
    [1] => Array
        (
            [title] => xxx2
            [list] => Array
                (
                    [0] => Array
                        (
                            [item] => item12
                        )
                    [1] => Array
                        (
                            [item] => item22
                        )
                )
        )
)
 */

The example on this page does not produce the result it documents. This is the output I actually get:

Array
(
    [0] => Array
        (
            [title] => xxx
            [list] => Array
                (
                )

        )

    [1] => Array
        (
            [title] => xxx2
            [list] => Array
                (
                )

        )

)
naruone commented 5 years ago

Found the problem: the range set in the first call has to be set again inside the recursive call.
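
The comment above does not include the changed code. The following is a minimal sketch of one way to read it, under two assumptions that this thread does not confirm: that the static `QueryList::html()` call in the affected version reuses a shared instance, so the outer `range('#demo li')` carries over into the nested query, and that passing an empty selector to `range()` clears it.

```php
<?php
require 'QueryList/vendor/autoload.php';
use QL\QueryList;

// Same structure as the example in the report, shortened to one <li>.
$html = <<<STR
<div id="demo">
    <ul>
        <li>
          <h3>xxx</h3>
          <div class="list">
            <div class="item">item1</div>
            <div class="item">item2</div>
          </div>
        </li>
    </ul>
</div>
STR;

$data = QueryList::html($html)->rules(array(
        'title' => array('h3', 'text'),
        'list'  => array('.list', 'html')
    ))->range('#demo li')->query()->getData(function ($item) {
        $item['list'] = QueryList::html($item['list'])->rules(array(
                'item' => array('.item', 'text')
            ))
            // Assumption: an empty selector resets the range left over from
            // the outer query, so '.item' is matched against the fragment
            // itself instead of against '#demo li', which the fragment lacks.
            ->range('')
            ->query()->getData()->all();
        return $item;
    });

print_r($data->all());
```

If the diagnosis is right, the nested `list` arrays come back empty in the original example because the inner HTML fragment contains no `#demo li` element for the inherited range to match.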