mohankreddy / crawler4j

Automatically exported from code.google.com/p/crawler4j

Error while analyzing links from a page with query string and no file extension #114

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Crawl a page with a URL like http://foo.bar/mydir/myfile (myfile has no
file extension).
2. 'myfile' is an HTML page served with the correct headers even though it has
no extension, and it contains a link like: <a href='?page=2'>

What is the expected output? What do you see instead?
I should see: http://foo.bar/mydir/myfile?page=2
but instead I get: http://foo.bar/mydir?page=2

What version of the product are you using?
3.1

Please provide any additional information below.
Everything is described above. Thanks for taking a look.

Original issue reported on code.google.com by milkdata...@gmail.com on 24 Jan 2012 at 10:39

GoogleCodeExporter commented 9 years ago
Thanks for reporting. Apparently there are two different standards for
resolving relative URLs. I just committed a change which follows the same
standard as major browsers:
http://code.google.com/p/crawler4j/source/detail?r=d362515f7d300dcdb1fe7ca2b08c92c5c363f9c4

This will be included in the next release.

-Yasser

Original comment by ganjisaffar@gmail.com on 5 Feb 2012 at 12:53
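The commit linked above is not reproduced here, so the sketch below is not crawler4j's actual fix. It is a minimal Java implementation of the reference-resolution algorithm in RFC 3986, section 5.2, offered as one plausible reading of "the same standard as major browsers": browsers (following the WHATWG URL Standard) and RFC 3986 both keep the full base path for a query-only reference such as `?page=2`. The older RFC 2396 algorithm instead drops the last segment of the base path in that case (its own example resolves `?y` against `http://a/b/c/d;p?q` to `http://a/b/c/?y`), which is likely what "two different standards" refers to. The class name `Rfc3986Resolver` and its methods are hypothetical, and the code does no percent-encoding or case normalization.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of RFC 3986 section 5.2 (hypothetical class, not crawler4j's code).
public final class Rfc3986Resolver {
    // Component-splitting regex from RFC 3986, Appendix B.
    private static final Pattern URI_PARTS = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // The five components of a URI reference; null means "undefined".
    private static final class Parts {
        String scheme, authority, path, query, fragment;
    }

    private static Parts parse(String ref) {
        Matcher m = URI_PARTS.matcher(ref);
        if (!m.matches()) throw new IllegalArgumentException("Bad URI reference: " + ref);
        Parts p = new Parts();
        p.scheme = m.group(2);
        p.authority = m.group(4);
        p.path = m.group(5);      // always defined, possibly empty
        p.query = m.group(7);     // null when there is no '?'
        p.fragment = m.group(9);  // null when there is no '#'
        return p;
    }

    // Resolves a (possibly relative) reference against an absolute base URI (section 5.2.2).
    public static String resolve(String base, String reference) {
        Parts b = parse(base), r = parse(reference), t = new Parts();
        if (r.scheme != null) {
            t.scheme = r.scheme;
            t.authority = r.authority;
            t.path = removeDotSegments(r.path);
            t.query = r.query;
        } else {
            t.scheme = b.scheme;
            if (r.authority != null) {
                t.authority = r.authority;
                t.path = removeDotSegments(r.path);
                t.query = r.query;
            } else {
                t.authority = b.authority;
                if (r.path.isEmpty()) {
                    // The case from this issue: "?page=2" keeps the full base path.
                    t.path = b.path;
                    t.query = (r.query != null) ? r.query : b.query;
                } else {
                    t.path = removeDotSegments(
                            r.path.startsWith("/") ? r.path : merge(b, r.path));
                    t.query = r.query;
                }
            }
        }
        t.fragment = r.fragment;
        return recompose(t);
    }

    // Section 5.2.3: replace the last segment of the base path with the reference path.
    private static String merge(Parts base, String refPath) {
        if (base.authority != null && base.path.isEmpty()) return "/" + refPath;
        return base.path.substring(0, base.path.lastIndexOf('/') + 1) + refPath;
    }

    // Section 5.2.4: remove "." and ".." segments.
    private static String removeDotSegments(String in) {
        StringBuilder out = new StringBuilder();
        while (!in.isEmpty()) {
            if (in.startsWith("../")) {
                in = in.substring(3);
            } else if (in.startsWith("./") || in.startsWith("/./")) {
                in = in.substring(2);
            } else if (in.equals("/.")) {
                in = "/";
            } else if (in.startsWith("/../") || in.equals("/..")) {
                in = in.equals("/..") ? "/" : in.substring(3);
                out.setLength(Math.max(out.lastIndexOf("/"), 0)); // drop last output segment
            } else if (in.equals(".") || in.equals("..")) {
                in = "";
            } else {
                // Move the first segment (with its leading '/', if any) to the output.
                int next = in.indexOf('/', 1);
                int end = (next == -1) ? in.length() : next;
                out.append(in, 0, end);
                in = in.substring(end);
            }
        }
        return out.toString();
    }

    // Section 5.3: reassemble the components into a URI string.
    private static String recompose(Parts t) {
        StringBuilder sb = new StringBuilder();
        if (t.scheme != null) sb.append(t.scheme).append(':');
        if (t.authority != null) sb.append("//").append(t.authority);
        sb.append(t.path);
        if (t.query != null) sb.append('?').append(t.query);
        if (t.fragment != null) sb.append('#').append(t.fragment);
        return sb.toString();
    }
}
```

Applied to this issue's example, `Rfc3986Resolver.resolve("http://foo.bar/mydir/myfile", "?page=2")` returns `http://foo.bar/mydir/myfile?page=2`, the URL the reporter expected. The components are kept as raw strings so each branch can be checked line by line against the pseudocode in the RFC.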