<b>What steps will reproduce the problem?</b>
1. Crawl a website that contains "mailto:" links.
2. For example, use http://www.heise.de/index.html as the seed.
<b>What is the expected output? What do you see instead?</b>
Expected: a successful crawl.
Instead: a StringIndexOutOfBoundsException in WebURL.java.
<b>What version of the product are you using?</b>
crawler4j 3.3.1
<b>Please provide any additional information below.</b>
The exception is thrown in WebURL.java at line 87, following a call from
line 133 of Parser.java.
<b>After changing the code at line 118 in Parser.java from:</b>
if (!hrefWithoutProtocol.contains("javascript:") &&
!hrefWithoutProtocol.contains("@")) {
<b>To:</b>
if (!hrefWithoutProtocol.contains("mailto:") &&
!hrefWithoutProtocol.contains("javascript:") &&
!hrefWithoutProtocol.contains("@")) {
it works for me.
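
The patched condition can be tried in isolation with a small sketch like the one below. `HrefFilter` and `shouldFollow` are hypothetical names that are not part of crawler4j, and the lowercasing step is an added assumption rather than something taken from Parser.java. It may help show one case the original check would let through: a "mailto:" link that contains no "@". Whether that is the exact link that triggers the exception on the seed page is not confirmed by this report.

```java
import java.util.Locale;

// Hypothetical helper, not part of crawler4j: it only mirrors the
// patched condition from Parser.java so it can be tested on its own.
final class HrefFilter {
    private HrefFilter() {}

    // Returns true if the link looks like something the crawler should follow.
    static boolean shouldFollow(String hrefWithoutProtocol) {
        // Assumption: compare case-insensitively so "MAILTO:" is also skipped.
        String h = hrefWithoutProtocol.toLowerCase(Locale.ROOT);
        return !h.contains("mailto:")
                && !h.contains("javascript:")
                && !h.contains("@");
    }

    public static void main(String[] args) {
        String[] samples = {
            "www.heise.de/index.html",
            "mailto:?subject=hello",   // no "@", so only the new check catches it
            "javascript:void(0)",
            "user@example.com"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + shouldFollow(s));
        }
    }
}
```

With this condition, only the first sample is followed; the other three are skipped before they reach URL handling.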
Original issue reported on code.google.com by nuex...@googlemail.com on 23 Feb 2012 at 10:36