What steps will reproduce the problem?
1. Crawl a page containing a link whose URL has a trailing %.
2. For example, a page with the following link:
<a href="http://www.example.com/search?width=100%&height=100%">
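The failure can probably be reproduced without crawler4j at all, since the exception originates in `java.net.URLDecoder`. The following is a minimal sketch, assuming the canonicalizer ends up decoding a parameter value such as `100%` on its own; the class and method names are hypothetical and are not part of crawler4j.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class TrailingPercentRepro {
    // Returns true if URLDecoder rejects the input with an
    // IllegalArgumentException, false if it decodes cleanly.
    public static boolean failsToDecode(String s) {
        try {
            URLDecoder.decode(s, "UTF-8");
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new AssertionError("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        // A bare trailing % is an incomplete escape sequence.
        System.out.println(failsToDecode("100%"));    // prints "true"
        // The same value with the % properly escaped as %25.
        System.out.println(failsToDecode("100%25"));  // prints "false"
    }
}
```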
What is the expected output? What do you see instead?
The page should be parsed without an exception. Instead, an IllegalArgumentException is thrown in the Parser code.
Here is the stack trace:
ERROR [Crawler 25] URLDecoder: Incomplete trailing escape (%) pattern, while processing: http://www.xxxxxxx.com/41274/PD/xxxxx.htm
java.lang.IllegalArgumentException: URLDecoder: Incomplete trailing escape (%) pattern
    at java.net.URLDecoder.decode(URLDecoder.java:187)
    at edu.uci.ics.crawler4j.url.URLCanonicalizer.percentEncodeRfc3986(URLCanonicalizer.java:209)
    at edu.uci.ics.crawler4j.url.URLCanonicalizer.canonicalize(URLCanonicalizer.java:191)
    at edu.uci.ics.crawler4j.url.URLCanonicalizer.getCanonicalURL(URLCanonicalizer.java:99)
    at edu.uci.ics.crawler4j.parser.Parser.parse(Parser.java:119)
    at edu.uci.ics.crawler4j.crawler.WebCrawler.processPage(WebCrawler.java:262)
    at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:179)
    at java.lang.Thread.run(Thread.java:679)
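One possible way to avoid the exception, sketched here as an assumption rather than as the fix crawler4j actually adopted, is to escape any % that is not followed by two hex digits before handing the string to `URLDecoder.decode`. The class `LenientDecoder` and the method `decodeLeniently` are hypothetical names used only for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Pattern;

class LenientDecoder {
    // Matches a '%' that does not start a valid %XX escape.
    private static final Pattern STRAY_PERCENT =
            Pattern.compile("%(?![0-9a-fA-F]{2})");

    // Escapes stray '%' characters as "%25", then decodes as usual.
    public static String decodeLeniently(String s) {
        String repaired = STRAY_PERCENT.matcher(s).replaceAll("%25");
        try {
            return URLDecoder.decode(repaired, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new AssertionError("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeLeniently("100%"));   // prints "100%"
        System.out.println(decodeLeniently("a%20b"));  // prints "a b"
    }
}
```

With this approach a literal % in a link survives decoding unchanged, while valid escapes are still decoded. An alternative would be to catch the IllegalArgumentException in the caller and skip the offending link, which loses the link but keeps the crawler thread running.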
What version of the product are you using?
3.1
Please provide any additional information below.
Original issue reported on code.google.com by raj...@indix.com on 25 Jan 2012 at 9:12