Erroneous link URL extraction if the HTML contains <base href="...">

What steps will reproduce the problem?
1. Crawl a page http://a.b/c/d/e.html whose HTML contains
<base href="http://a.b/c/">
2. Any relative links in the page will be wrongly extracted, e.g. "../x.html"
will be extracted as "http://a.b/c/x.html" instead of "http://a.b/x.html"

What is the expected output? What do you see instead?
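Expected: relative links are resolved against the <base href> value, so
"../x.html" is extracted as "http://a.b/x.html".
Actual: "../x.html" is extracted as "http://a.b/c/x.html", which is what
resolving it against the page URL (ignoring <base href>) would produce.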
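
For illustration, here is a minimal sketch of the expected resolution using only java.net.URI. It is not the attached patch and not crawler4j code; the class name BaseHrefResolution and the helper resolveLink are hypothetical names made up for this example, and it assumes the parser has already pulled the href value out of the <base> tag.

```java
import java.net.URI;

public class BaseHrefResolution {

    // Hypothetical helper, not part of crawler4j: resolves a link found in a page,
    // preferring the <base href> value (if present) over the URL the page came from.
    static String resolveLink(String pageUrl, String baseHref, String link) {
        URI base = URI.create(pageUrl);
        if (baseHref != null && !baseHref.trim().isEmpty()) {
            // The base href may itself be relative, so resolve it against the page URL first.
            base = base.resolve(baseHref.trim());
        }
        return base.resolve(link).toString();
    }

    public static void main(String[] args) {
        String page = "http://a.b/c/d/e.html";
        // Ignoring <base href>: prints http://a.b/c/x.html (the behaviour reported above)
        System.out.println(resolveLink(page, null, "../x.html"));
        // Honouring <base href="http://a.b/c/">: prints http://a.b/x.html (the expected result)
        System.out.println(resolveLink(page, "http://a.b/c/", "../x.html"));
    }
}
```

The two calls in main reproduce the wrong and the expected URL from the steps above, which suggests the parser is using the page URL as the base even when a <base href> is present.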

What version of the product are you using? On what operating system?
Version 2.2 and the latest build from SVN, on Windows 7.

Please provide any additional information below.
The attached patch to /src/edu/uci/ics/crawler4j/crawler/HTMLParser.java may
help.

Original issue reported on code.google.com by hoiwai1...@gmail.com on 31 Dec 2010 at 2:42

Attachments:

HTMLParser.diff

mohankreddy / crawler4j

Erroneous link URL extraction if the HTML contains <base href="..."> #24