xrma / crawler4j

Automatically exported from code.google.com/p/crawler4j

port of robots.txt #127

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Crawl http://localhost:9000/ with crawler4j.
2. Check the output for errors like: "ERROR [Thread-2] Fatal transport error: Connection to http://localhost refused while fetching http://localhost/robots.txt (link found in doc #0)".

What is the expected output? What do you see instead?
Expected: crawler4j fetches "http://localhost:9000/robots.txt", preserving the port. Instead, it drops the port and requests "http://localhost/robots.txt", which fails.

What version of the product are you using?
3.3

Please provide any additional information below.
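
The fix the reporter is after is to keep a non-default port when deriving the robots.txt URL from a page URL. A minimal sketch of that derivation (illustrative only; class and method names are hypothetical and not crawler4j's actual code):

```java
import java.net.MalformedURLException;
import java.net.URL;

public class RobotsTxtUrl {
    // Builds "scheme://host[:port]/robots.txt" for the given page URL,
    // keeping an explicit port (e.g. :9000) when one is present.
    static String robotsTxtUrl(String pageUrl) throws MalformedURLException {
        URL u = new URL(pageUrl);
        int port = u.getPort(); // -1 when the URL has no explicit port
        String authority = (port == -1) ? u.getHost() : u.getHost() + ":" + port;
        return u.getProtocol() + "://" + authority + "/robots.txt";
    }

    public static void main(String[] args) throws MalformedURLException {
        // prints http://localhost:9000/robots.txt
        System.out.println(robotsTxtUrl("http://localhost:9000/some/page"));
        // prints http://example.com/robots.txt
        System.out.println(robotsTxtUrl("http://example.com/index.html"));
    }
}
```

The key detail is `URL.getPort()`, which returns -1 when no port is given, so the default port case falls back to the bare host exactly as before.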

Original issue reported on code.google.com by pikote...@gmail.com on 25 Feb 2012 at 8:00

GoogleCodeExporter commented 9 years ago
Please use this patch if it helps you.

Original comment by pikote...@gmail.com on 26 Feb 2012 at 4:15


GoogleCodeExporter commented 9 years ago
I'm having the same problem, but I'm crawling the public web.

Original comment by sheepdf3 on 2 Apr 2012 at 7:55

GoogleCodeExporter commented 9 years ago
Thanks for your patch. I integrated it into the http://code.google.com/r/acrocrawler-crawler4j/ clone and added test cases, which led to the discovery of issue 195.

Original comment by acrocraw...@gmail.com on 22 Feb 2013 at 9:04

GoogleCodeExporter commented 9 years ago
This issue was closed by revision 6d413924d56c.

Original comment by ganjisaffar@gmail.com on 2 Mar 2013 at 7:30

GoogleCodeExporter commented 9 years ago
This is now fixed in this changelist:
https://code.google.com/p/crawler4j/source/detail?r=6d413924d56c57fdd62a61ea1c7e0ecce1cc5219

Thanks for reporting.

-Yasser

Original comment by ganjisaffar@gmail.com on 2 Mar 2013 at 7:31