hellock / icrawler

A multi-thread crawler framework with many builtin image crawlers provided.
http://icrawler.readthedocs.io/en/latest/
MIT License

Problem with GoogleImageCrawler #117

Open drchristophreuter opened 1 year ago

drchristophreuter commented 1 year ago

When searching for images using GoogleImageCrawler, I always get an error message even though I have limited the search to 100 images. Does anyone have a solution for this problem?

Error message:

2023-10-24 15:36:37,554 - INFO - icrawler.crawler - start crawling...
2023-10-24 15:36:37,557 - INFO - icrawler.crawler - starting 1 feeder threads...
2023-10-24 15:36:37,561 - INFO - feeder - thread feeder-001 exit
2023-10-24 15:36:37,561 - INFO - icrawler.crawler - starting 1 parser threads...
2023-10-24 15:36:37,569 - INFO - icrawler.crawler - starting 4 downloader threads...
2023-10-24 15:36:38,158 - INFO - parser - parsing result page https://www.google.com/search?q=cat&ijn=0&start=0&tbs=isz%3Al%2Cic%3Aspecific%2Cisc%3Aorange%2Csur%3Afmc%2Ccdr%3A1%2Ccd_min%3A01%2F01%2F2017%2Ccd_max%3A11%2F30%2F2017&tbm=isch
Exception in thread parser-001:
Traceback (most recent call last):
  File "/home/reuter/anaconda3/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/home/reuter/anaconda3/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/home/reuter/anaconda3/lib/python3.11/site-packages/icrawler/parser.py", line 94, in worker_exec
    for task in self.parse(response, **kwargs):
TypeError: 'NoneType' object is not iterable
2023-10-24 15:36:42,574 - INFO - downloader - no more download task for thread downloader-001
2023-10-24 15:36:42,575 - INFO - downloader - no more download task for thread downloader-004
2023-10-24 15:36:42,577 - INFO - downloader - thread downloader-004 exit
2023-10-24 15:36:42,576 - INFO - downloader - thread downloader-001 exit
2023-10-24 15:36:42,576 - INFO - downloader - no more download task for thread downloader-003
2023-10-24 15:36:42,582 - INFO - downloader - thread downloader-003 exit
2023-10-24 15:36:42,575 - INFO - downloader - no more download task for thread downloader-002
2023-10-24 15:36:42,584 - INFO - downloader - thread downloader-002 exit
2023-10-24 15:36:43,576 - INFO - icrawler.crawler - Crawling task done!

runfile('/home/reuter/untitled0.py', wdir='/home/reuter')
2023-10-24 15:37:25,544 - INFO - icrawler.crawler - start crawling...
2023-10-24 15:37:25,545 - INFO - icrawler.crawler - starting 1 feeder threads...
2023-10-24 15:37:25,546 - INFO - feeder - thread feeder-001 exit
2023-10-24 15:37:25,546 - INFO - icrawler.crawler - starting 1 parser threads...
2023-10-24 15:37:25,554 - INFO - icrawler.crawler - starting 4 downloader threads...
2023-10-24 15:37:26,032 - INFO - parser - parsing result page https://www.google.com/search?q=cat&ijn=0&start=0&tbs=isz%3Al%2Cic%3Aspecific%2Cisc%3Aorange%2Csur%3Afmc%2Ccdr%3A1%2Ccd_min%3A01%2F01%2F2022%2Ccd_max%3A11%2F30%2F2022&tbm=isch
Exception in thread parser-001:
Traceback (most recent call last):
  File "/home/reuter/anaconda3/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/home/reuter/anaconda3/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/home/reuter/anaconda3/lib/python3.11/site-packages/icrawler/parser.py", line 94, in worker_exec
    for task in self.parse(response, **kwargs):
TypeError: 'NoneType' object is not iterable
2023-10-24 15:37:30,562 - INFO - downloader - no more download task for thread downloader-001
2023-10-24 15:37:30,563 - INFO - downloader - thread downloader-001 exit
2023-10-24 15:37:30,568 - INFO - downloader - no more download task for thread downloader-002
2023-10-24 15:37:30,571 - INFO - downloader - thread downloader-002 exit
2023-10-24 15:37:30,575 - INFO - downloader - no more download task for thread downloader-003
2023-10-24 15:37:30,576 - INFO - downloader - thread downloader-003 exit
2023-10-24 15:37:30,576 - INFO - downloader - no more download task for thread downloader-004
2023-10-24 15:37:30,579 - INFO - downloader - thread downloader-004 exit
2023-10-24 15:37:31,576 - INFO - icrawler.crawler - Crawling task done!
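
For anyone reading the traceback: the `TypeError` is raised because the `for task in self.parse(...)` loop in `parser.py` received `None` instead of something iterable. One plausible cause, not confirmed in this thread, is that the result page no longer contains what the parser looks for. The snippet below is only a minimal sketch of that failure mode and of one defensive pattern; it is not icrawler's code, and `parse_results`, `iter_tasks`, and the `"file_url"` key are hypothetical names chosen for illustration.

```python
import re


def parse_results(html):
    """Hypothetical parser: returns a list of tasks, or None when nothing matches."""
    urls = re.findall(r'"(https?://[^"]+\.jpg)"', html)
    if not urls:
        return None  # returning None here is what makes the caller's loop fail
    return [{"file_url": url} for url in urls]


def iter_tasks(html):
    """Defensive wrapper: treat a None result as 'no tasks' instead of crashing."""
    return parse_results(html) or []


# A page that contains none of the expected image URLs (assumed scenario).
page_without_matches = "<html><body>layout changed</body></html>"

error_message = None
try:
    for task in parse_results(page_without_matches):
        pass
except TypeError as exc:
    error_message = str(exc)  # same message as in the log above

# With the wrapper the loop simply runs zero times.
tasks = list(iter_tasks(page_without_matches))
```

With the wrapper the crawl would finish without the exception, but it would still download nothing, which matches the "no more download task" lines in the log. The parser itself has to be fixed for images to come back.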

ran5omware commented 1 year ago

I have the same problem for any number of images. How can I fix it?

Neptune-Trojans commented 11 months ago

Same problem here.

bretdavi commented 11 months ago

There is some more info in this issue: https://github.com/hellock/icrawler/issues/107

ZhiyuanChen commented 5 months ago

Please let me know if 0.6.8 fixes this issue~
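
To test against 0.6.8, it helps to confirm first which version is actually installed in the environment that runs the script. A small sketch using only the standard library follows; `version_tuple` and `has_fix` are hypothetical helper names, and the comparison assumes plain dotted numeric versions such as `0.6.8`.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn '0.6.8' into (0, 6, 8); stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_fix(installed, minimum="0.6.8"):
    """True if the installed version is at least the version named in this thread."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    installed = version("icrawler")
    print(installed, "is 0.6.8 or later:", has_fix(installed))
except PackageNotFoundError:
    print("icrawler is not installed in this environment")
```

If the reported version is older, upgrading with `pip install -U icrawler` in the same environment (here apparently the Anaconda one) and re-running the original script should show whether the error is gone.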