hellock / icrawler

A multi-thread crawler framework with many builtin image crawlers provided.
http://icrawler.readthedocs.io/en/latest/
MIT License

google crawler stuck #24

Closed: sizhangyu closed this issue 7 years ago

sizhangyu commented 7 years ago

I tried to use the built-in crawler to crawl images from Google. I have a list of keywords to crawl, and the crawler works well with most of them. I only change the search keyword between runs, yet the program always gets stuck at the same keyword. The last several messages I got:

2017-07-20 12:03:15,237 - INFO - downloader - no more download task for thread downloader-003
2017-07-20 12:03:15,237 - INFO - downloader - thread downloader-003 exit
2017-07-20 12:03:15,480 - INFO - downloader - no more download task for thread downloader-004
2017-07-20 12:03:15,480 - INFO - downloader - thread downloader-004 exit
2017-07-20 12:03:16,180 - INFO - downloader - no more download task for thread downloader-002
2017-07-20 12:03:16,180 - INFO - downloader - thread downloader-002 exit
2017-07-20 12:03:16,832 - INFO - downloader - no more download task for thread downloader-001
2017-07-20 12:03:16,832 - INFO - downloader - thread downloader-001 exit

For a successful crawl, by contrast, I got:

2017-07-20 13:13:06,827 - INFO - downloader - no more download task for thread downloader-001
2017-07-20 13:13:06,827 - INFO - downloader - thread downloader-001 exit
2017-07-20 13:13:06,966 - INFO - downloader - no more download task for thread downloader-003
2017-07-20 13:13:06,966 - INFO - downloader - thread downloader-003 exit
2017-07-20 13:13:07,104 - INFO - downloader - no more download task for thread downloader-002
2017-07-20 13:13:07,104 - INFO - downloader - thread downloader-002 exit
2017-07-20 13:13:11,862 - INFO - downloader - no more download task for thread downloader-004
2017-07-20 13:13:11,862 - INFO - downloader - thread downloader-004 exit
2017-07-20 13:13:12,439 - INFO - icrawler.crawler - Crawling task done!
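
The visible difference between the two logs is the final "Crawling task done!" line, which appears only in the successful run. A minimal sketch for checking a saved log for that difference is shown here. It uses only the Python standard library and does not call icrawler; the function name summarize_log and the regular expression are illustrative assumptions based on the log format quoted in this report, not part of the library.

```python
import re

# Assumed line layout, taken from the messages quoted in this report:
#   "<date> <time>,<ms> - LEVEL - logger - message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"(?P<level>\w+) - (?P<logger>[\w.]+) - (?P<msg>.*)$"
)


def summarize_log(text):
    """Report which threads logged an exit and whether the crawl finished.

    Hypothetical helper: it only inspects log text and knows nothing about
    icrawler internals.
    """
    exited = []
    finished = False
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not in the assumed format
        msg = match.group("msg")
        exit_match = re.fullmatch(r"thread (\S+) exit", msg)
        if exit_match:
            exited.append(exit_match.group(1))
        elif msg == "Crawling task done!":
            finished = True
    return {"exited_threads": sorted(exited), "finished": finished}


# Two lines copied from the stuck run above.
stuck_tail = (
    "2017-07-20 12:03:16,832 - INFO - downloader - "
    "no more download task for thread downloader-001\n"
    "2017-07-20 12:03:16,832 - INFO - downloader - thread downloader-001 exit"
)
print(summarize_log(stuck_tail))
```

If every downloader thread is listed as exited but "finished" is false, the hang happens after the downloaders stop, which narrows down where to look for the stuck keyword.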

hellock commented 7 years ago

Sorry for the late reply; I've been at a conference for the past few days. Could you share the keyword that reproduces the problem?