What steps will reproduce the problem?
1. Run the crawler on a domain whose robots.txt file contains an 'Allow:' directive
(for example http://www.explido-webmarketing.de/; a sample robots.txt is sketched below)
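For illustration, a robots.txt along the following lines would exercise the 'Allow:' branch of the parser (hypothetical content; the file actually served by the site above may differ):

User-agent: *
Allow:
Disallow: /admin/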
What is the expected output? What do you see instead?
The crawl should proceed normally; instead, the following exception appears:
java.lang.StringIndexOutOfBoundsException: String index out of range: -3
at java.lang.String.substring(String.java:1937)
at java.lang.String.substring(String.java:1904)
at edu.uci.ics.crawler4j.robotstxt.RobotstxtParser.parse(RobotstxtParser.java:86)
at edu.uci.ics.crawler4j.robotstxt.RobotstxtServer.fetchDirectives(RobotstxtServer.java:77)
at edu.uci.ics.crawler4j.robotstxt.RobotstxtServer.allows(RobotstxtServer.java:57)
at edu.uci.ics.crawler4j.crawler.WebCrawler.preProcessPage(WebCrawler.java:187)
at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:105)
...
What version of the product are you using? On what operating system?
version - 2.6
operating system - Windows 7
Please provide any additional information below.
It seems that the value of the constant
edu.uci.ics.crawler4j.robotstxt.RobotstxtServer.PATTERNS_ALLOW_LENGTH is
incorrect.
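The exception message ("String index out of range: -3") is consistent with substring() being called with a start offset three characters past the end of a bare "allow:" line, e.g. if the length constant holds the length of "disallow:" (9) instead of "allow:" (6). Below is a minimal sketch of that failure mode, with assumed names and values rather than the actual crawler4j source:

// Minimal sketch of the suspected failure mode -- not the actual crawler4j
// source; constant names follow the report above, values are assumptions.
public class RobotstxtParseSketch {
    private static final String PATTERN_ALLOW = "allow:";
    // Suspected bug: the length constant does not match the "allow:" prefix,
    // e.g. if it was taken from "disallow:" (9 chars) instead of "allow:" (6).
    private static final int PATTERNS_ALLOW_LENGTH = 9; // should be "allow:".length() == 6

    public static void main(String[] args) {
        String line = "allow:"; // a bare allow rule, 6 characters long
        if (line.startsWith(PATTERN_ALLOW)) {
            // substring(9) on a 6-character string throws
            // StringIndexOutOfBoundsException: String index out of range: -3
            String path = line.substring(PATTERNS_ALLOW_LENGTH).trim();
            System.out.println("allowed path: '" + path + "'");
        }
    }
}

With the length constant set to 6, the same code prints an empty allowed path instead of throwing.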
Original issue reported on code.google.com by aleksa...@gmail.com on 16 Mar 2011 at 5:03