hellock / icrawler

A multithreaded crawler framework with many built-in image crawlers.
http://icrawler.readthedocs.io/en/latest/
MIT License

use proxy to download #1

Closed: yysijie closed this issue 8 years ago

yysijie commented 8 years ago

I hope icrawler will soon support proxies with flexible selection strategies. This would be useful in many applications.

By the way, icrawler is an excellent image crawler framework: it is multithreaded and highly extensible, and the code is well commented.

This framework greatly reduces my workload. Thanks a lot.

hellock commented 8 years ago

Of course this is useful. Support for custom proxies and a proxy pool is in progress; as soon as the proxy-pool branch is ready, I will merge it into the master branch. Glad that icrawler helps you, and PRs are welcome.

hellock commented 8 years ago

You can now use proxies to crawl pages. Two classes have been added: ProxyPool and ProxyScanner.
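To illustrate what a proxy pool with a "flexible strategy" can look like, here is a minimal standalone Python sketch. It is not icrawler's ProxyPool or ProxyScanner API: the class SimpleProxyPool, its weighting rule (halve the weight on failure, add 0.1 on success, drop a proxy once its weight falls below min_weight) and the proxy addresses are all hypothetical, and the list of proxies is assumed to be known in advance. The only outside convention it relies on is the dict shape that the requests library accepts for its `proxies` argument.

```python
import random
from dataclasses import dataclass


@dataclass
class ProxyEntry:
    """One proxy address plus a weight reflecting its recent reliability."""
    addr: str              # hypothetical "host:port" string
    protocol: str = "http"
    weight: float = 1.0


class SimpleProxyPool:
    """Hypothetical weighted proxy pool; NOT icrawler's actual ProxyPool class."""

    def __init__(self, min_weight=0.2, seed=None):
        self.min_weight = min_weight   # proxies weaker than this are dropped
        self.proxies = {}              # addr -> ProxyEntry
        self._rng = random.Random(seed)

    def add(self, addr, protocol="http"):
        self.proxies[addr] = ProxyEntry(addr, protocol)

    def get_next(self):
        """Pick a proxy at random, favoring proxies with higher weights."""
        if not self.proxies:
            raise LookupError("proxy pool is empty")
        entries = list(self.proxies.values())
        weights = [e.weight for e in entries]
        return self._rng.choices(entries, weights=weights, k=1)[0]

    def report(self, addr, success):
        """Reward a working proxy; penalize a failing one and drop it if too weak."""
        entry = self.proxies.get(addr)
        if entry is None:
            return
        if success:
            entry.weight = min(entry.weight + 0.1, 1.0)
        else:
            entry.weight /= 2
            if entry.weight < self.min_weight:
                del self.proxies[addr]

    @staticmethod
    def as_requests_proxies(entry):
        """Format an entry as the dict used by the `proxies` argument of requests."""
        url = f"{entry.protocol}://{entry.addr}"
        return {"http": url, "https": url}


if __name__ == "__main__":
    pool = SimpleProxyPool(seed=0)
    pool.add("10.0.0.1:8080")   # hypothetical proxy addresses
    pool.add("10.0.0.2:3128")
    for _ in range(3):
        # weight goes 1.0 -> 0.5 -> 0.25 -> 0.125, then the proxy is dropped
        pool.report("10.0.0.2:3128", success=False)
    print(SimpleProxyPool.as_requests_proxies(pool.get_next()))
    # prints {'http': 'http://10.0.0.1:8080', 'https': 'http://10.0.0.1:8080'}
```

Weighted random choice is just one possible strategy; round-robin or "fastest proxy first" would slot into get_next() without changing the rest of the interface.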

yysijie commented 8 years ago

Thank you. Now I can play some dirty tricks.