jae-jae / QueryList

:spider: The progressive PHP crawler framework! An elegant, progressive PHP scraping framework.
https://querylist.cc
2.65k stars 441 forks

Hello, there is a memory leak when running as a memory-resident (long-lived) process #53

Closed simayubo closed 3 years ago

suhanyujie commented 5 years ago

Just release the memory yourself at the appropriate place.

AlexKitC commented 5 years ago

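In CLI mode I release the variables created by my own business logic, but when tasks run in a loop the memory still keeps climbing. I read through the source and could not find where the problem is.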

suhanyujie commented 5 years ago
AlexKitC commented 5 years ago

for ($i = 188; $i < 1800; $i++) {
    $tmpp = $i * 100;
    $str = "$tmpp,100";
    $res = $db->query([""], "", $str, "");
    foreach ($res as $v) {
        $data = [];
        $tmp = QueryList::get($v['url'])->encoding('UTF-8', 'GB2312');
        $data['location'] = $tmp->find("#Label18")->text();
        $data['num'] = $tmp->find("#Label16")->text();
        $data['date'] = $tmp->find("#Label24")->text();
        $data['price'] = $tmp->find("#Label23")->text();
        $data['company_b'] = $tmp->find("#Label19")->text();
        $data['company_a'] = $tmp->find("#Label21")->text();
        $data['type'] = $tmp->find("#Label17")->text();
        $data['area'] = $tmp->find("#Label22")->text();
        // $db->update($data, 'id=' . $v['id']);
        echo $v['id'] . "\r\n";
        $tmp = null;
        $data = null;
        $v = null;
    }
    $tmpp = null;
    $str = null;
    $res = null;
    echo 'page: ' . $i . "\r\n";
}

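As shown above: I commented out all business-related code and kept only two statements inside the loop, $tmp = QueryList::get($v['url'])->encoding('UTF-8','GB2312'); and $tmp = null;. With Task Manager open, I can still see the CLI php process growing by roughly 2 MB per second. On my home machine, GC kicks in at around 350 MB; on my work machine, memory keeps climbing to 1.5 GB and then the CLI process exits. You can test this against any paginated site: with no business code at all, just QueryList::get followed by unset, it makes no difference. For now the only thing that helps is the system's forced GC.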

suhanyujie commented 5 years ago

public function test1()
{
    $gUrl = 'http://www.badmintoncn.com/';
    for ($i = 188; $i < 1800; $i++) {
        $tmpp = ($i - 1) * 100;
        $str = "$tmpp,100";
//            $res = $db -> query([""],"",$str,"");
        // Placeholder for the database result; the test fetches $gUrl instead.
        $res = [
            'id' => 1,
            'url' => ''
        ];
        // Use a separate counter ($j) so the inner loop does not overwrite
        // the outer page counter $i.
        for ($j = 0; $j < 100; $j++) {
            $data = [];
            $tmp = QueryList::get($gUrl)->encoding('UTF-8', 'GB2312');
            $data['location'] = $tmp->find("#Label18")->text();
            $data['num'] = $tmp->find("#Label16")->text();
            $data['date'] = $tmp->find("#Label24")->text();
            $data['price'] = $tmp->find("#Label23")->text();
            $data['company_b'] = $tmp->find("#Label19")->text();
            $data['company_a'] = $tmp->find("#Label21")->text();
            $data['type'] = $tmp->find("#Label17")->text();
            $data['area'] = $tmp->find("#Label22")->text();
            $tmp = null;
            $data = null;
        }
        // Report memory usage once per page of 100 requests.
        $size = memory_get_usage();
        echo $this->convert($size) . PHP_EOL;
        sleep(1);
        $tmpp = null;
        $str = null;
        $res = null;
        echo 'page: ' . $i . "\r\n";
    }
}

// Formats a byte count as a human-readable string such as "1.5 mb".
public function convert($size)
{
    $unit = array('b', 'kb', 'mb', 'gb', 'tb', 'pb');
    // Guard against log(0) and use an integer index into $unit.
    $i = $size > 0 ? (int) floor(log($size, 1024)) : 0;
    return round($size / pow(1024, $i), 2) . ' ' . $unit[$i];
}

AlexKitC commented 5 years ago

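Once this issue is fixed, pairing the framework with other task-dispatch solutions would make unattended operation much more comfortable. I have high hopes for this framework, keep it up. I will keep following this issue.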

simayubo commented 5 years ago

To release the memory: $query->destruct(); If you are using Swoole: restart the worker process once it has handled a certain number of tasks.
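
For the Swoole suggestion, a minimal sketch of what "restart the worker after a certain number of tasks" could look like as server settings. It assumes Swoole's max_request and task_max_request options; the helper name workerRecycleSettings and the numbers 1000 and 200 are made up for illustration and should be tuned to the actual workload.

```php
<?php
/**
 * Builds a settings array that asks Swoole to recycle worker and task-worker
 * processes after a fixed number of handled requests, so that any memory a
 * worker has accumulated is returned to the OS when the process is replaced.
 * (Hypothetical helper, not part of QueryList or Swoole.)
 */
function workerRecycleSettings(int $maxRequests, int $maxTaskRequests): array
{
    if ($maxRequests < 1 || $maxTaskRequests < 1) {
        throw new InvalidArgumentException('Limits must be positive integers.');
    }

    return [
        'max_request'      => $maxRequests,     // applies to worker processes
        'task_max_request' => $maxTaskRequests, // applies to task workers
    ];
}

// Illustrative values only.
$settings = workerRecycleSettings(1000, 200);

// With the Swoole extension loaded, the array would be passed to the server:
// $server->set($settings);
print_r($settings);
```

This only bounds the growth per process; it does not remove the underlying leak, so it works best together with explicit release of each QueryList object.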

jae-jae commented 5 years ago

After you finish using a QueryList object, call its destruct() method promptly to release the memory it holds.
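
One way to make sure destruct() is never skipped is to wrap each fetch in try/finally. The sketch below is an assumption about how that could be organized, not code from the library: scrapeEach and StubDocument are hypothetical names, and StubDocument only stands in for a QueryList object so the example runs without the package installed.

```php
<?php
/**
 * Opens one document per URL, extracts a row from it, and always calls
 * destruct() on the document afterwards, even if the extraction throws.
 * $open must return an object that has a destruct() method.
 */
function scrapeEach(iterable $urls, callable $open, callable $extract): array
{
    $rows = [];
    foreach ($urls as $url) {
        $ql = $open($url);
        try {
            $rows[] = $extract($ql);
        } finally {
            $ql->destruct(); // release the document as soon as we are done
            unset($ql);
        }
    }
    return $rows;
}

// Stand-in for demonstration only; it counts how many documents are "live".
final class StubDocument
{
    public static $live = 0;
    public $url;

    public function __construct(string $url)
    {
        $this->url = $url;
        self::$live++;
    }

    public function destruct(): void
    {
        self::$live--;
    }
}

$rows = scrapeEach(
    ['http://example.com/a', 'http://example.com/b'],
    function (string $url) { return new StubDocument($url); },
    function (StubDocument $doc) { return ['url' => $doc->url]; }
);

// With the real library, $open would presumably look like:
// function (string $url) { return QueryList::get($url)->encoding('UTF-8', 'GB2312'); }
echo count($rows) . ' rows, ' . StubDocument::$live . " live documents\n";
```

Keeping the release in a finally block means a failed request or a selector error does not leave a document behind in a long-running loop.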

slyfalcon commented 5 years ago

It is all caused by the dynamically proxied closures, whose memory is never released.
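
To illustrate why a closure can keep memory alive after $tmp = null, here is a small self-contained sketch. It is not QueryList's code: ProxyHolder and growthAfter are hypothetical, and the class only mimics an object that stores a closure bound to itself. Such an object forms a reference cycle, which plain reference counting cannot free; it stays allocated until PHP's cycle collector runs or the cycle is broken explicitly.

```php
<?php
// Hypothetical stand-in, NOT the library's implementation.
final class ProxyHolder
{
    private $handler;
    private $payload;

    public function __construct()
    {
        $this->payload = str_repeat('x', 1024 * 1024); // about 1 MB per object
        // The closure is bound to $this, so object -> closure -> object
        // is a reference cycle.
        $this->handler = function (): int {
            return strlen($this->payload);
        };
    }

    // Dropping the closure breaks the cycle, so the object can be freed
    // immediately once the last variable pointing at it is cleared.
    public function destruct(): void
    {
        $this->handler = null;
    }
}

// Returns how many bytes of usage remain after creating and dropping $count
// objects while the cycle collector is switched off.
function growthAfter(int $count, bool $breakCycle): int
{
    gc_collect_cycles();
    gc_disable();
    $before = memory_get_usage();
    for ($i = 0; $i < $count; $i++) {
        $obj = new ProxyHolder();
        if ($breakCycle) {
            $obj->destruct();
        }
        $obj = null; // mirrors "$tmp = null" from the report
    }
    $after = memory_get_usage();
    gc_enable();
    gc_collect_cycles(); // clean up whatever the loop left behind
    return $after - $before;
}

$leaked = growthAfter(20, false);
$freed  = growthAfter(20, true);
printf("without destruct(): %d bytes, with destruct(): %d bytes\n", $leaked, $freed);
```

The cycle collector is disabled here only to make the effect visible in a short run. In a normal script it runs on its own schedule, which would be consistent with the reported pattern of memory climbing for a long time before a GC pass brings it down.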

jae-jae commented 3 years ago

Memory usage has been optimized.