yasserg / crawler4j

Open Source Web Crawler for Java
Apache License 2.0
4.56k stars 1.93k forks

Not able to download images from ecommerce websites #234

Open manzajohn opened 7 years ago

manzajohn commented 7 years ago

The program works fine for other sites, but it is not able to download images from ecommerce sites.

rzo1 commented 7 years ago

Check the site's robots.txt.

manzajohn commented 7 years ago

Where can we find robots.txt?

Chaiavi commented 7 years ago

Google it, my friend.

Every site should have that file at its root (example.com/robots.txt).

It defines which URLs should not be crawled.
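As a rough illustration of the point above, here is a minimal sketch that derives the robots.txt location from any page URL and lists the `Disallow` prefixes that apply to all crawlers. The class `RobotsTxtSketch` and its methods are hypothetical helpers, not part of crawler4j, and the parsing is deliberately simplified (it ignores `Allow` rules and wildcard patterns), so treat it as a way to inspect a file by hand rather than a complete implementation.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for inspecting robots.txt; not part of crawler4j. */
final class RobotsTxtSketch {

    /** Builds the robots.txt URL for the site that hosts the given page. */
    static String robotsUrl(String pageUrl) {
        URI uri = URI.create(pageUrl);
        return uri.getScheme() + "://" + uri.getAuthority() + "/robots.txt";
    }

    /** Collects the Disallow path prefixes in groups that include "User-agent: *". */
    static List<String> disallowedForAll(String robotsTxt) {
        List<String> rules = new ArrayList<>();
        boolean inWildcardGroup = false;
        boolean lastWasUserAgent = false;
        for (String raw : robotsTxt.split("\\R")) {
            String line = raw.replaceFirst("#.*$", "").trim(); // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Consecutive User-agent lines share one group of rules.
                boolean isWildcard = value.equals("*");
                inWildcardGroup = lastWasUserAgent ? (inWildcardGroup || isWildcard) : isWildcard;
                lastWasUserAgent = true;
            } else {
                lastWasUserAgent = false;
                if (field.equals("disallow") && inWildcardGroup && !value.isEmpty()) {
                    rules.add(value);
                }
            }
        }
        return rules;
    }

    /** Simplified check: a path is blocked if it starts with any Disallow prefix. */
    static boolean isBlocked(String path, List<String> rules) {
        for (String rule : rules) {
            if (path.startsWith(rule)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up robots.txt content for illustration; a real file must be fetched from the site.
        String sample = "User-agent: *\n"
                + "Disallow: /images/\n"
                + "Disallow: /checkout/ # comment\n";
        List<String> rules = disallowedForAll(sample);
        System.out.println(robotsUrl("https://example.com/shop/item?id=1"));
        System.out.println(rules);
        System.out.println(isBlocked("/images/shoe.jpg", rules));
    }
}
```

If the image paths of a site fall under one of the listed prefixes, a crawler that honors robots.txt would be expected to skip them, which could explain the behavior reported in this issue.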

s17t commented 7 years ago

@manzajohn, if it is not a robots.txt issue, could you link to one of the ecommerce sites you are trying to crawl?

manzajohn commented 7 years ago

I tried https://www.myntra.com/, https://www.amazon.fr/, https://www.flipkart.com/, etc. If some websites disallow crawling images, how can we enable it or download images from those websites?

manzajohn commented 7 years ago

@s17t, hi Federico Tolomi, can you help me with the above?