yasserg / crawler4j

Open Source Web Crawler for Java
Apache License 2.0

Failed to respond messages #188

Open pekerayhan opened 7 years ago

pekerayhan commented 7 years ago

The URL is http://people.com, and the site itself seems to be working fine.

2017-01-18 14:18:21,136 WARN [Crawler 1] e.u.i.c.c.WebCrawler [:412] Unhandled exception while fetching http://people.com/: people.com:80 failed to respond
2017-01-18 14:18:21,140 INFO [Crawler 1] e.u.i.c.c.WebCrawler [:357] Stacktrace: org.apache.http.NoHttpResponseException: people.com:80 failed to respond

manojchandar commented 7 years ago

Team, may I know which is the latest version, 4.1 or 4.2? The website mentions 4.1 as the latest, but 4.2 is listed above the 4.1 hyperlink. Link for reference: [release page]. In the Maven listing, 4.2 is listed at the top on Maven Central. Please help me choose a stable version.

s17t commented 7 years ago

The latest version is 4.2. I suspect @yasserg uploaded 4.2 without marking a release on GitHub.

manojchandar commented 7 years ago

@s17t Thank you !

JCotton1123 commented 7 years ago

@pekerayhan there are a number of sites that will block requests from "unknown" user agents.

pekerayhan commented 7 years ago

@JCotton1123 Thanks for the response. I have been setting the user agent as "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/1.0.154.53 Safari/525.19"
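
One way to narrow this down is to request the same URL outside the crawler with the same user agent and see how the server reacts. The following is only a minimal sketch under stated assumptions: it uses the JDK's own java.net.http client (Java 11+), not crawler4j's fetcher (which, per the stack trace above, goes through Apache HttpClient), and the class name UserAgentProbe and its methods are hypothetical names made up for illustration. In crawler4j itself the user agent is normally set on CrawlConfig via setUserAgentString, so it is also worth confirming that the value actually reaches the crawler.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, not part of crawler4j: probes a URL with a chosen User-Agent.
public class UserAgentProbe {

    // Builds a GET request that carries the given User-Agent header.
    static HttpRequest buildRequest(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .timeout(Duration.ofSeconds(20))
                .GET()
                .build();
    }

    // Sends the request, discards the body, and returns the HTTP status code.
    static int probe(String url, String userAgent) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        HttpResponse<Void> response =
                client.send(buildRequest(url, userAgent), HttpResponse.BodyHandlers.discarding());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        String url = args.length > 0 ? args[0] : "http://people.com/";
        // The user agent string quoted in the comment above.
        String userAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) "
                + "AppleWebKit/525.19 (KHTML, like Gecko) Chrome/1.0.154.53 Safari/525.19";
        System.out.println(url + " -> HTTP " + probe(url, userAgent));
    }
}
```

If this probe gets a normal response while the crawler still fails, the user agent is probably not the cause and the difference lies elsewhere in how the crawler's request is sent. If the probe fails as well, trying a different user agent string is a quick way to check the blocking theory.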