hellock / icrawler

A multi-thread crawler framework with many builtin image crawlers provided.
http://icrawler.readthedocs.io/en/latest/
MIT License

Parser Error On Running Google Image Crawler #35

Closed. ZYZhang2016 closed this issue 7 years ago

ZYZhang2016 commented 7 years ago

Hi, thanks for the great project. So far I have crawled Baidu and Bing images smoothly, but I have two issues:

1. I am using Python 3.6 on macOS Sierra. The Google image crawler passed the test, but it didn't create a folder or download any images. When I run the Google image crawler example, the log reads:

   " ERROR - parser - Exception caught when fetching page https://www.google.com/search?q=sunny&ijn=1&start=100&tbs=cdr%3A1%2Ccd_min%3A%2Ccd_max%3A%2Csur%3A&tbm=isch&lr=, error: HTTPSConnectionPool(host='www.google.com', port=443): Max retries exceeded with url: /search?q=sunny&ijn=1&start=100&tbs=cdr%3A1%2Ccd_min%3A%2Ccd_max%3A%2Csur%3A&tbm=isch&lr= (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x104856278>: Failed to establish a new connection: [Errno 65] No route to host',)), remaining retry times: 6 "

   The Baidu and Bing image crawlers work well in my environment.

2. Is it possible to download more than 1000 images?

Any ideas would be helpful. Thanks!

hellock commented 7 years ago

Hi @ZYZhang2016, regarding the error report, I will test in the same environment, but it is more likely a random network issue. For question 2, you can specify different time ranges to get more than 1000 images; see this answer on Stack Overflow for details.
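
As a rough illustration of the time-range idea, the sketch below splits a period into consecutive date chunks and runs one crawl per chunk. The helper names `split_date_range` and `crawl_by_date_ranges` are hypothetical, not part of icrawler. The `filters={"date": ...}` and `file_idx_offset="auto"` arguments follow the current icrawler documentation as I understand it; older releases may expect different keyword arguments for the date range, so check the version you have installed.

```python
from datetime import date, timedelta


def split_date_range(start, end, days_per_chunk):
    """Split [start, end] into consecutive, non-overlapping inclusive chunks."""
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        # Each chunk covers at most days_per_chunk days and never passes `end`.
        chunk_end = min(chunk_start + timedelta(days=days_per_chunk - 1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks


def crawl_by_date_ranges(keyword, chunks, root_dir="images"):
    """Run one Google crawl per date chunk (hypothetical helper)."""
    # Imported lazily so split_date_range works without icrawler installed.
    from icrawler.builtin import GoogleImageCrawler

    crawler = GoogleImageCrawler(storage={"root_dir": root_dir})
    for chunk_start, chunk_end in chunks:
        crawler.crawl(
            keyword=keyword,
            # Assumed filter format: ((year, month, day), (year, month, day)).
            filters={
                "date": (
                    (chunk_start.year, chunk_start.month, chunk_start.day),
                    (chunk_end.year, chunk_end.month, chunk_end.day),
                )
            },
            max_num=1000,
            # Continue file numbering so later chunks do not overwrite earlier ones.
            file_idx_offset="auto",
        )


if __name__ == "__main__":
    # Example: crawl "sunny" in roughly monthly chunks over the first half of 2017.
    ranges = split_date_range(date(2017, 1, 1), date(2017, 6, 30), 30)
    crawl_by_date_ranges("sunny", ranges)
```

Each chunk is still subject to the roughly 1000-image cap per query, so shorter chunks give more total results at the cost of more requests.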

ZYZhang2016 commented 7 years ago

Thank you very much for your reply. It was indeed a network issue: after switching to a new Shadowsocks server (I am in mainland China), the code worked well.
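
For anyone hitting the same "No route to host" error, a quick way to tell a network problem from a crawler problem is to try opening a TCP connection to the host named in the log. This is only a minimal sketch using the standard library; `can_reach` is a hypothetical helper name, and a successful connection shows only that the host is reachable on that port, not that the crawl itself will work.

```python
import socket


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, timeouts, and no route to host.
        return False


if __name__ == "__main__":
    # The host and port come from the error message in the log.
    print("www.google.com reachable:", can_reach("www.google.com", 443))
```

If this prints False while other sites are reachable, the failure is in the network path (firewall, proxy, or routing) rather than in icrawler.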

Bluearrow commented 5 years ago

Thank you very much for your reply. It was indeed a network issue: after switching to a new Shadowsocks server (I am in mainland China), the code worked well.

@ZYZhang2016 Hi, how did you use a Shadowsocks server to make GoogleImageCrawler work in mainland China?