mohankreddy / crawler4j

Automatically exported from code.google.com/p/crawler4j

All URLs are lowercased when using WebCrawler.java #130

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Use a WebCrawler.
2. You will find that all URLs are changed to lowercase.
3. Some websites (e.g. http://www.amazon.cn/) redirect when the URL is
lowercased, so the crawler ends up in an endless 301 redirect loop.

What is the expected output? What do you see instead?
Expected: the case of URLs is left unchanged. Instead: an endless 301 redirect loop.

What version of the product are you using?
3.3

Please provide any additional information below.

Do not change the case of URLs.
RobotstxtServer.allows()
Parser.parse();

Original issue reported on code.google.com by zwj0...@gmail.com on 2 Mar 2012 at 8:16

GoogleCodeExporter commented 9 years ago
Hi,
Would you please provide an example of a URL before and after being lowercased
by crawler4j? The current implementation only lowercases the domain name,
which is expected.

-Yasser

Original comment by ganjisaffar@gmail.com on 5 Mar 2012 at 6:58
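
To illustrate the behaviour described in the comment above, here is a minimal sketch of lowercasing only the scheme and domain name of a URL while leaving the path, query and fragment untouched. It is not crawler4j's actual code: the class name `HostCaseNormalizer` and the method `lowercaseHostOnly` are hypothetical, and only the standard `java.net.URI` class is assumed.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper, not part of crawler4j.
final class HostCaseNormalizer {

    // Lowercases the scheme and host; every other component keeps its case.
    static String lowercaseHostOnly(String url) {
        URI uri = URI.create(url); // throws IllegalArgumentException on malformed input
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            // Relative or opaque URI (e.g. mailto:): nothing safe to normalize.
            return url;
        }
        StringBuilder sb = new StringBuilder();
        sb.append(scheme.toLowerCase(Locale.ROOT)).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        sb.append(host.toLowerCase(Locale.ROOT));
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort());
        }
        // Raw components are used so percent-encoding and case are preserved.
        sb.append(uri.getRawPath());
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }
}
```

With this sketch, `lowercaseHostOnly("http://WWW.Example.COM/Some/Path?Q=Abc")` yields `http://www.example.com/Some/Path?Q=Abc`: host names are case-insensitive, so lowercasing them is safe, whereas the path and query may be case-sensitive on the server and are left as they are.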

GoogleCodeExporter commented 9 years ago
Sorry, I think I may have made a mistake! This is not an issue.

Original comment by zwj0...@gmail.com on 6 Mar 2012 at 12:31

GoogleCodeExporter commented 9 years ago

Original comment by ganjisaffar@gmail.com on 7 Mar 2012 at 4:56