Open manzajohn opened 7 years ago
Check robots.txt.
Where can we find robots.txt?
Google it, my friend.
Every site should have that file at its root (example.com/robots.txt).
It defines which URLs should not be crawled.
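To illustrate where that file lives and what it lists, here is a minimal sketch in plain Java. The class and method names (`RobotsTxtSketch`, `robotsUrl`, `disallowedForAll`) and the sample rules are made up for this example; none of this is crawler4j's own API. The parsing is deliberately simplified: it only reads the `Disallow` lines that apply to `User-agent: *`.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class RobotsTxtSketch {

    // Builds the robots.txt URL for the site that hosts the given page.
    public static String robotsUrl(String pageUrl) {
        URI uri = URI.create(pageUrl);
        return uri.getScheme() + "://" + uri.getAuthority() + "/robots.txt";
    }

    // Collects the Disallow paths that apply to all crawlers ("User-agent: *").
    // Simplified on purpose: Allow rules and path wildcards are ignored.
    public static List<String> disallowedForAll(String robotsTxt) {
        List<String> paths = new ArrayList<>();
        boolean inWildcardGroup = false;
        boolean prevWasAgent = false;
        for (String raw : robotsTxt.split("\\R")) {
            // Strip trailing comments and surrounding whitespace.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) {
                continue;
            }
            String lower = line.toLowerCase();
            if (lower.startsWith("user-agent:")) {
                boolean isStar = line.substring("user-agent:".length()).trim().equals("*");
                // Consecutive User-agent lines belong to the same group.
                inWildcardGroup = prevWasAgent ? (inWildcardGroup || isStar) : isStar;
                prevWasAgent = true;
            } else {
                prevWasAgent = false;
                if (inWildcardGroup && lower.startsWith("disallow:")) {
                    String path = line.substring("disallow:".length()).trim();
                    if (!path.isEmpty()) {
                        paths.add(path);
                    }
                }
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // Hypothetical page URL and rules, for illustration only.
        System.out.println(robotsUrl("https://example.com/shop/item?id=1"));
        String sample = "User-agent: *\nDisallow: /images/\nDisallow: /cart\n";
        System.out.println(disallowedForAll(sample));
    }
}
```

If a site's real robots.txt lists a path such as `/images/` for all crawlers, a crawler that honors robots.txt will skip those URLs, which may be why the images are not downloaded.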
On Tue, Aug 1, 2017 at 8:31 AM, manzajohn wrote:
> where we can find robots.txt?
@manzajohn if it is not a robots.txt issue, would you link to one e-commerce site you are trying to crawl?
I tried https://www.myntra.com/, https://www.amazon.fr/, https://www.flipkart.com/, etc. If some websites disallow crawling of images, how can we enable crawling or download images from those websites?
@s17t, hi Federico Tolomi, can you help me with the above?
The program works fine for other sites but is not able to download from e-commerce sites.