hellock / icrawler

A multi-thread crawler framework with many builtin image crawlers provided.
http://icrawler.readthedocs.io/en/latest/
MIT License

unresolved import 'six.moves' #90

Closed: Ninjanaut closed this issue 4 years ago

Ninjanaut commented 4 years ago

I'm not able to run the following code:

from icrawler.builtin import GoogleImageCrawler
google_crawler = GoogleImageCrawler(storage={'C:\images': 'cat'})
google_crawler.crawl(keyword='cat', max_num=100)

It throws a 'backend' error exception. I don't know what that means, but when I look into the GoogleImageCrawler class, my IDE underlines the line from six.moves.urllib.parse import urlencode with an unresolved import 'six.moves' error.

I think the 'backend' error and the 'six.moves' error may be related.

I have Python 3.7 (64-bit) with the icrawler and six packages installed.

Any idea how to solve this issue?

ZhiyuanChen commented 4 years ago

The issue is that you did not specify the storage path correctly.

The correct path should be something like:

from icrawler.builtin import GoogleImageCrawler
google_crawler = GoogleImageCrawler(storage={'root_dir': 'C:/images/cat'})
google_crawler.crawl(keyword='cat', max_num=100)

However, I do think we should catch the exception you mentioned and raise a more informative one.
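
A minimal sketch of the kind of check meant here (the function name and error message are hypothetical, not icrawler's actual API), assuming the storage argument must be a dict with a 'root_dir' key:

# Hypothetical sketch: validate the storage argument before building the
# storage backend, so a mistake like {'C:\images': 'cat'} fails with a
# clear message instead of an opaque 'backend' error.
def validate_storage(storage):
    if not isinstance(storage, dict) or 'root_dir' not in storage:
        raise ValueError(
            "storage must be a dict with a 'root_dir' key, "
            "e.g. {'root_dir': 'C:/images/cat'}; got %r" % (storage,))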

Ninjanaut commented 4 years ago

Thank you! Silly mistake on my part :-) GoogleImageCrawler still returns some parsing error, but BingImageCrawler is working fine.

ZhiyuanChen commented 4 years ago

> Thank you! Silly mistake on my part :-) GoogleImageCrawler still returns some parsing error

Have you checked out #84?

Ninjanaut commented 4 years ago

I replaced line 154 in google.py

txt = re.sub(r"^AF_initDataCallback\({.*key: 'ds:(\d)'.+data:function\(\){return (.+)}}\);?$",
              "\\2", txt, 0, re.DOTALL)

with

txt = re.sub(r"^AF_initDataCallback\({.*key: 'ds:(\d)'.+data:(.+), sideChannel: {.*}}\);?$",
                  "\\2", txt, 0, re.DOTALL)

and now GoogleImageCrawler works fine too :-)
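
To illustrate the change (the input below is a made-up stand-in for the real script block on the Google Images results page, not actual page data), the updated pattern drops the AF_initDataCallback wrapper and keeps only the data payload, which now sits in front of a sideChannel field:

import re

# Simplified stand-in for the page's script block; the real payload is
# a large nested array rather than [1, 2, 3].
txt = "AF_initDataCallback({key: 'ds:1', hash: '2', data:[1, 2, 3], sideChannel: {}});"
payload = re.sub(
    r"^AF_initDataCallback\({.*key: 'ds:(\d)'.+data:(.+), sideChannel: {.*}}\);?$",
    "\\2", txt, flags=re.DOTALL)
print(payload)  # prints [1, 2, 3]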

Thank you