Closed GoogleCodeExporter closed 9 years ago
Original comment by avrah...@gmail.com
on 13 Aug 2014 at 6:14
OK, I tried your scenario, but the bug doesn't reproduce.
1. Your example isn't usable, because your URLs (on example.com) return 404.
2. I used the following scenario to try to reproduce the bug:
controller.addSeed("http://www.w3.org/TR/WD-html40-970708/htmlweb.html");
controller.addSeed("http://www.w3.org/Tr/Wd-html40-970708/hTmlweb.html");
As you can see, the second URL differs from the first only in the case of
several letters.
The URL is crawled only once; the second time it is skipped (it never reaches
the "visit" method).
Please send me a specific scenario (with URLs that fail) and I will run and
test them; so far I have tried and the bug doesn't reproduce.
As a side note, we should reconsider the crawler's functionality: the DNS
name is case-insensitive, BUT the path should actually be case-sensitive.
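To illustrate the side note above, here is a minimal sketch of a normalization that lowercases only the scheme and host and leaves the path untouched. It uses only java.net.URI from the JDK. The class UrlCaseNormalizer and its normalize method are hypothetical names made up for this example; they are not part of the crawler's API and do not describe how the crawler currently deduplicates URLs.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of the crawler.
class UrlCaseNormalizer {

    // Lowercases the scheme and host (both case-insensitive) and keeps the
    // path, query, and fragment exactly as given (potentially case-sensitive).
    static String normalize(String url) {
        try {
            URI uri = new URI(url);
            String scheme = uri.getScheme() == null
                    ? null : uri.getScheme().toLowerCase(Locale.ROOT);
            String host = uri.getHost() == null
                    ? null : uri.getHost().toLowerCase(Locale.ROOT);
            return new URI(scheme, uri.getUserInfo(), host, uri.getPort(),
                    uri.getPath(), uri.getQuery(), uri.getFragment()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid URL: " + url, e);
        }
    }

    public static void main(String[] args) {
        String a = normalize("http://WWW.Example.com/WW/Sample.html");
        String b = normalize("http://www.example.com/ww/sample.html");
        System.out.println(a);
        System.out.println(b);
        // The hosts now match, but the paths still differ in case,
        // so the two URLs remain distinct.
        System.out.println(a.equals(b));
    }
}
```

With a key like this, two URLs that differ only in the case of the host would be treated as one, while URLs whose paths differ in case would each be visited. Whether that is desirable depends on the server: a case-insensitive server would then serve the same page under several URLs.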
Original comment by avrah...@gmail.com
on 13 Aug 2014 at 7:45
Hi,
In your example both URLs
http://www.w3.org/TR/WD-html40-970708/htmlweb.html
http://www.w3.org/Tr/Wd-html40-970708/hTmlweb.html
redirect to
http://www.w3.org/TR/WD-html40-970708/htmlweb.html
But in my case,
http://www.example.com/WW/Sample.html
http://www.example.com/ww/sample.html
don't redirect to a single correct one :( Each stays at the same URL; the
response URL is the same as the requested one.
Original comment by edgar.ri...@gmail.com
on 19 Aug 2014 at 5:22
The URLs that I'm working on are located in SharePoint.
Original comment by edgar.ri...@gmail.com
on 19 Aug 2014 at 5:24
After some investigation, it seems the problem is that URL Rewrite
functionality is not available on the underlying IIS servers. I don't know
what the best way to fix this issue would be :(
Original comment by edgar.ri...@gmail.com
on 19 Aug 2014 at 7:22
Sorry mate, I don't think I can help you in this scenario...
Original comment by avrah...@gmail.com
on 20 Aug 2014 at 8:31
Thanks, it sounds like that is expected behavior.
Original comment by edgar.ri...@gmail.com
on 21 Aug 2014 at 3:31
Original comment by avrah...@gmail.com
on 21 Aug 2014 at 8:39
Original issue reported on code.google.com by
edgar.ri...@gmail.com
on 13 Aug 2014 at 2:35