ljhsecret / crawler4j

Automatically exported from code.google.com/p/crawler4j

Processing of robots.txt causes java.lang.StringIndexOutOfBoundsException: String index out of range: -3 #27

Closed by GoogleCodeExporter 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Run the crawler on a domain whose robots.txt file contains an 'allow:' directive
(for example http://www.explido-webmarketing.de/)

What is the expected output? What do you see instead?
The following exception is thrown:

java.lang.StringIndexOutOfBoundsException: String index out of range: -3
    at java.lang.String.substring(String.java:1937)
    at java.lang.String.substring(String.java:1904)
    at edu.uci.ics.crawler4j.robotstxt.RobotstxtParser.parse(RobotstxtParser.java:86)
    at edu.uci.ics.crawler4j.robotstxt.RobotstxtServer.fetchDirectives(RobotstxtServer.java:77)
    at edu.uci.ics.crawler4j.robotstxt.RobotstxtServer.allows(RobotstxtServer.java:57)
    at edu.uci.ics.crawler4j.crawler.WebCrawler.preProcessPage(WebCrawler.java:187)
    at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:105)
...

What version of the product are you using? On what operating system?
Version: 2.6
Operating system: Windows 7

Please provide any additional information below.
It seems that the value of the constant
edu.uci.ics.crawler4j.robotstxt.RobotstxtServer.PATTERNS_ALLOW_LENGTH is
incorrect.
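
To illustrate how a wrong prefix-length constant could produce this kind of failure, here is a minimal, self-contained sketch. It is not the crawler4j source: the class name, the method, and both constant values are hypothetical and chosen only for illustration. The assumption is that the parser strips the directive prefix with `String.substring(int)`, and that the offset used for 'allow:' lines is larger than the prefix itself (9, the length of "disallow:", instead of 6). For a line consisting only of "allow:", 6 - 9 = -3, which would match the index in the report, but that is a guess.

```java
// Hypothetical sketch; names and values are assumptions, not crawler4j code.
final class AllowPrefixSketch {
    static final String ALLOW_PREFIX = "allow:";
    // Assumed wrong offset: the length of "disallow:" (9 characters).
    static final int WRONG_LENGTH = 9;
    // Offset derived from the prefix itself (6 characters).
    static final int RIGHT_LENGTH = ALLOW_PREFIX.length();

    // Returns the part of the line that follows the directive prefix.
    // Throws StringIndexOutOfBoundsException if prefixLength exceeds line.length().
    static String pathAfterPrefix(String line, int prefixLength) {
        return line.substring(prefixLength).trim();
    }

    public static void main(String[] args) {
        // With the matching offset, the path is extracted as expected.
        System.out.println(pathAfterPrefix("allow: /public/", RIGHT_LENGTH));

        // With the oversized offset, a short line cannot be sliced.
        try {
            pathAfterPrefix("allow:", WRONG_LENGTH);
        } catch (StringIndexOutOfBoundsException e) {
            // The exact message text depends on the Java version.
            System.out.println("Parsing failed: " + e.getMessage());
        }
    }
}
```

Deriving the offset from the prefix string (as `RIGHT_LENGTH` does here) rather than hard-coding a number keeps the two from drifting apart.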

Original issue reported on code.google.com by aleksa...@gmail.com on 16 Mar 2011 at 5:03

GoogleCodeExporter commented 8 years ago
Thanks for reporting this. I have fixed the bug and uploaded the new version.

-Yasser

Original comment by ganjisaffar@gmail.com on 16 Mar 2011 at 5:50