mohankreddy / crawler4j

Automatically exported from code.google.com/p/crawler4j

setURL can crash and burn in the case of malformed URLs or weird protocols #164

Open · GoogleCodeExporter opened this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create a web page containing a malformed URL (or a link with a protocol like mailto:).
2. Run the crawler on said website.
3. The crawler crashes and burns at line 89 of WebURL.java: this IndexOutOfBounds
exception completely breaks the crawl. setURL should probably handle the problem
quietly, catching and logging it, instead of letting the exception escape. I would
strongly suggest using the java.net.URL parser instead of your custom solution.
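A minimal sketch of the reporter's suggestion, assuming the goal is to reject bad links before any string slicing happens. It uses `java.net.URI`, a stricter relative of the `java.net.URL` parser the reporter mentions. The class `SafeUrlParser` and its method `tryParse` are hypothetical and not part of crawler4j. Input that fails to parse, has no host, or uses a scheme other than http/https (such as `mailto:`) yields `null` and can simply be skipped by the caller.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Hypothetical helper (not part of crawler4j): parse with the JDK, never slice by hand. */
final class SafeUrlParser {

    private SafeUrlParser() {}

    /**
     * Returns the parsed URI, or null when the input is malformed, relative,
     * has no host, or uses a scheme other than http/https (e.g. mailto:).
     */
    static URI tryParse(String raw) {
        if (raw == null) {
            return null;
        }
        try {
            URI uri = new URI(raw.trim());
            String scheme = uri.getScheme();
            if (scheme == null) {
                return null; // relative link: resolve against the page URL first
            }
            String s = scheme.toLowerCase(Locale.ROOT);
            if (!s.equals("http") && !s.equals("https")) {
                return null; // mailto:, javascript:, ftp:, ... are not crawled
            }
            if (uri.getHost() == null) {
                return null; // e.g. "http:///path" parses but names no host
            }
            return uri;
        } catch (URISyntaxException e) {
            // Malformed input: log and skip this link instead of aborting the crawl.
            System.err.println("Skipping malformed URL '" + raw + "': " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://example.com/page",
            "mailto:someone@example.com",
            "http://exa mple.com/",
            "http:///path",
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + tryParse(sample));
        }
    }
}
```

Returning `null` keeps the sketch short; an `Optional<URI>` would serve equally well if the surrounding code prefers it.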

What is the expected output? What do you see instead?
I would expect an exception to be thrown (and contained) somewhere inside
public void setURL(String url), and the crawl as a whole should not fail.
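The expected behaviour could look roughly like the hedged sketch below. `SketchWebURL` is a hypothetical stand-in, not crawler4j's actual `WebURL`. Its `setURL` rejects bad input with an `IllegalArgumentException` (via `URI.create`, which wraps parse errors in that exception), and the crawl loop catches that exception per link, so only the offending link is dropped.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class IsolatedCrawlSketch {

    /** Hypothetical stand-in for crawler4j's WebURL; not its real implementation. */
    static final class SketchWebURL {
        private URI uri;

        /** Rejects bad input with IllegalArgumentException instead of failing
         *  later with an IndexOutOfBoundsException during string slicing. */
        void setURL(String url) {
            URI parsed = URI.create(url.trim()); // throws IllegalArgumentException if malformed
            String scheme = parsed.getScheme();
            if (scheme == null) {
                throw new IllegalArgumentException("No scheme: " + url);
            }
            String s = scheme.toLowerCase(Locale.ROOT);
            if (!s.equals("http") && !s.equals("https")) {
                throw new IllegalArgumentException("Unsupported protocol: " + url);
            }
            if (parsed.getHost() == null) {
                throw new IllegalArgumentException("No host: " + url);
            }
            this.uri = parsed;
        }

        String getHost() {
            return uri.getHost();
        }
    }

    /** Visits every link; a bad link is logged and skipped, never fatal. */
    static List<String> crawl(List<String> links) {
        List<String> hosts = new ArrayList<>();
        for (String link : links) {
            SketchWebURL webUrl = new SketchWebURL();
            try {
                webUrl.setURL(link);
            } catch (IllegalArgumentException e) {
                System.err.println("Skipping link: " + e.getMessage());
                continue; // the crawl goes on with the next link
            }
            hosts.add(webUrl.getHost());
        }
        return hosts;
    }

    public static void main(String[] args) {
        List<String> links = List.of(
            "http://example.com/a",
            "mailto:x@example.com",
            "http://exa mple.com/",
            "https://example.org");
        System.out.println(crawl(links));
    }
}
```

Running `main` prints `[example.com, example.org]` to standard output, with the two rejected links reported on standard error.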

What version of the product are you using?
3.3

Please provide any additional information below.

Original issue reported on code.google.com by david.titarenco on 5 Jul 2012 at 10:51

GoogleCodeExporter commented 9 years ago

Original comment by avrah...@gmail.com on 18 Aug 2014 at 3:23