Error while analyzing links from a page with query string and no file extension

mohankreddy / crawler4j

Automatically exported from code.google.com/p/crawler4j


What steps will reproduce the problem?
1. Crawl a page with a URL like http://foo.bar/mydir/myfile (myfile has no 
extension).
2. 'myfile' is an HTML page: it has no extension, but its headers are set 
correctly. It contains a link like: <a href='?page=2'>

What is the expected output? What do you see instead?
I should see: http://foo.bar/mydir/myfile?page=2 
but I get: http://foo.bar/mydir?page=2 

What version of the product are you using?
3.1

Please provide any additional information below.
Everything is described above. Thanks for taking a look.

Original issue reported on code.google.com by milkdata...@gmail.com on 24 Jan 2012 at 10:39

Thanks for reporting. Apparently there are two different standards for resolving relative URLs. I just committed a change which follows the same standard as major browsers: http://code.google.com/p/crawler4j/source/detail?r=d362515f7d300dcdb1fe7ca2b08c92c5c363f9c4 This will be included in the next release. -Yasser
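
For anyone hitting this on 3.1, here is a minimal sketch of the resolution the report expects for a query-only link, which matches the rule in RFC 3986 section 5.2.2: when the reference has an empty path, the base path is kept in full and only the query is replaced. This is not the code from the commit above; the class name QueryOnlyLinkDemo and the method resolveQueryOnly are hypothetical names chosen for illustration, and the sketch handles only hrefs that start with '?'.

```java
public class QueryOnlyLinkDemo {

    /**
     * Resolves an href that starts with '?' against the URL of the page it
     * appears on. The base path is kept whole (including the last segment);
     * any query or fragment on the base is dropped and replaced by the href.
     * Hypothetical helper, not part of crawler4j.
     */
    static String resolveQueryOnly(String baseUrl, String href) {
        if (!href.startsWith("?")) {
            throw new IllegalArgumentException("not a query-only reference: " + href);
        }
        String base = baseUrl;
        // Drop the fragment of the base URL, if any.
        int hash = base.indexOf('#');
        if (hash >= 0) {
            base = base.substring(0, hash);
        }
        // Drop the existing query of the base URL, if any.
        int query = base.indexOf('?');
        if (query >= 0) {
            base = base.substring(0, query);
        }
        // Keep scheme, host and the full path; append the new query.
        return base + href;
    }

    public static void main(String[] args) {
        // The case from this report.
        System.out.println(resolveQueryOnly("http://foo.bar/mydir/myfile", "?page=2"));
    }
}
```

The important detail is that the last path segment ('myfile') is not stripped before the query is appended, which is the step that goes wrong in the output reported above.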

Error while analyzing links from a page with query string and no file extension #114