hellock / icrawler

A multi-thread crawler framework with many builtin image crawlers provided.
http://icrawler.readthedocs.io/en/latest/
MIT License

hello, the program crashed. #4

Closed unluckydan closed 7 years ago

unluckydan commented 8 years ago

Traceback (most recent call last):
  File "G:/imdbfull/bingcrawler.py", line 27, in
    content = urllib2.urlopen(url).read()
  File "F:\Anaconda2\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "F:\Anaconda2\lib\urllib2.py", line 431, in open
    response = self._open(req, data)
  File "F:\Anaconda2\lib\urllib2.py", line 449, in _open
    '_open', req)
  File "F:\Anaconda2\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "F:\Anaconda2\lib\urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "F:\Anaconda2\lib\urllib2.py", line 1197, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 10060] >

hellock commented 8 years ago

It seems that the code you posted is not from icrawler; I didn't use the urllib2 package at all. I suggest that you use the built-in Bing crawler and extend it to satisfy your special requirements. P.S. The requests package is easier to use than urllib2.
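For reference, Errno 10060 is the Windows socket error for a connection attempt that timed out, so the traceback above points at an unreachable or slow host rather than at a bug in the call itself. The sketch below shows one way to keep such a failure from crashing a script: an explicit timeout plus a few retries. It uses urllib.request (the Python 3 successor to urllib2) so that it runs without extra packages; the function name fetch and its parameters are hypothetical and are not part of icrawler. With requests, the same pattern would apply around requests.get(url, timeout=...).

```python
import socket
import time
import urllib.error
import urllib.request


def fetch(url, timeout=10.0, retries=3, backoff=1.0,
          opener=urllib.request.urlopen):
    """Return the response body as bytes, or None if every attempt fails.

    `opener` is injectable (hypothetical parameter) so the retry logic
    can be exercised without network access.
    """
    for attempt in range(1, retries + 1):
        try:
            response = opener(url, timeout=timeout)
            try:
                return response.read()
            finally:
                response.close()
        except (urllib.error.URLError, socket.timeout) as err:
            # A connection timeout such as Errno 10060 is reported here.
            print("attempt %d/%d failed for %s: %s"
                  % (attempt, retries, url, err))
            if attempt < retries:
                # Wait a little longer after each failed attempt.
                time.sleep(backoff * attempt)
    return None
```

A caller would then check for None instead of letting the exception propagate, for example: content = fetch(url); if content is None, skip that URL and continue.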

unluckydan commented 8 years ago

Thanks, the problem was on my side ==!

But when I ran it on an Aliyun ECS image, it did not exit after finishing its tasks as it should. The last line printed is: crawling task done! After that I have to terminate the program and restart it.

hellock commented 8 years ago

I tested BingImageCrawler on Aliyun ECS just now, using the code in test.py. It terminated normally, although many connection errors occurred (it seems that many sites are blocked). I cannot figure out the problem for the moment; more tests may be needed.
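One generic way to gather more information when a multi-threaded Python program prints its last message but does not exit is to list the threads that are still alive, since the interpreter waits for every non-daemon thread before shutting down. The sketch below is a general diagnostic only and assumes nothing about icrawler's internals; the helper name live_non_daemon_threads is hypothetical.

```python
import threading


def live_non_daemon_threads():
    """Return the names of non-daemon threads, other than the main
    thread, that are still alive and would keep the process running."""
    main = threading.main_thread()
    return [t.name for t in threading.enumerate()
            if t is not main and t.is_alive() and not t.daemon]
```

Printing the result of this function right after the "crawling task done!" message would show whether any worker threads are still running at that point.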