udayinfy / crawler4j

Automatically exported from code.google.com/p/crawler4j

url contains '\' #139

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Crawl a page whose URLs contain a backslash ('\').

Example:
http://www.lngs.gov.cn/newFormsFolders\LNGS_FORMS_633800715869843750XQJ.doc

A browser recognizes this URL.

What is the expected output? What do you see instead?

It should download the doc file. Instead, crawler4j does not seem to recognize the
'\' and does not convert it into a valid URL.
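
For illustration, here is a minimal sketch of the kind of normalization that would make the example URL usable. Browsers that follow the WHATWG URL Standard treat '\' like '/' in http URLs, and the sketch imitates that for the part of the URL before any query or fragment. The class `BackslashUrlSketch` and the method `normalizeBackslashes` are hypothetical names chosen for this example. They are not part of crawler4j, and this is not necessarily how the project handles the case.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class BackslashUrlSketch {

    // Hypothetical helper (not a crawler4j API): replaces every backslash that
    // appears before the first '?' or '#' with a forward slash. The query and
    // fragment are left untouched.
    static String normalizeBackslashes(String url) {
        int cut = url.length();
        int query = url.indexOf('?');
        int fragment = url.indexOf('#');
        if (query >= 0) {
            cut = Math.min(cut, query);
        }
        if (fragment >= 0) {
            cut = Math.min(cut, fragment);
        }
        return url.substring(0, cut).replace('\\', '/') + url.substring(cut);
    }

    public static void main(String[] args) throws URISyntaxException {
        String raw =
            "http://www.lngs.gov.cn/newFormsFolders\\LNGS_FORMS_633800715869843750XQJ.doc";
        String fixed = normalizeBackslashes(raw);
        System.out.println(fixed);
        // The normalized string now parses as a regular URI with a '/'-separated path.
        System.out.println(new URI(fixed).getPath());
    }
}
```

With the URL from this report, the helper returns http://www.lngs.gov.cn/newFormsFolders/LNGS_FORMS_633800715869843750XQJ.doc.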

What version of the product are you using?

3.3
Please provide any additional information below.

Heritrix can recognize the URL.

Original issue reported on code.google.com by gavincha...@gmail.com on 26 Mar 2012 at 3:21

GoogleCodeExporter commented 9 years ago
You are right.

It is a bug and will be fixed.

Original comment by avrah...@gmail.com on 11 Aug 2014 at 1:27

GoogleCodeExporter commented 9 years ago
This issue was closed by revision 091f5043337f.

Original comment by avrah...@gmail.com on 11 Aug 2014 at 1:30

GoogleCodeExporter commented 9 years ago
Fixed in revision hash: 091f5043337f

Original comment by avrah...@gmail.com on 11 Aug 2014 at 1:32