url contains '\'

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Crawl a page that links to a URL containing a backslash ('\').

Example:
http://www.lngs.gov.cn/newFormsFolders\LNGS_FORMS_633800715869843750XQJ.doc

Browsers recognize this URL and open it correctly.

What is the expected output? What do you see instead?

It should download the .doc file. Instead, crawler4j does not recognize the
'\' and fails to convert the link into a valid URL.
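For illustration, here is a minimal Java sketch of the problem and of the normalization browsers apply. Per the WHATWG URL Standard, a '\' in the path of an http(s) URL is treated as '/'. `java.net.URI`, by contrast, rejects the raw URL because '\' is not a legal URI character. The class and method names (`BackslashUrlDemo`, `normalizeBackslashes`, `isValidUri`) are hypothetical and are not part of crawler4j's API. This is not necessarily what revision 091f5043337f changed.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class BackslashUrlDemo {

    // Hypothetical helper: treats '\' as '/' before any query ('?') or
    // fragment ('#'), similar to what browsers do for http(s) URLs.
    // Backslashes inside the query string or fragment are left untouched.
    static String normalizeBackslashes(String url) {
        int end = url.length();
        int q = url.indexOf('?');
        int h = url.indexOf('#');
        if (q >= 0) end = Math.min(end, q);
        if (h >= 0) end = Math.min(end, h);
        return url.substring(0, end).replace('\\', '/') + url.substring(end);
    }

    // Returns true if java.net.URI accepts the string as a URI.
    static boolean isValidUri(String url) {
        try {
            new URI(url);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String raw = "http://www.lngs.gov.cn/newFormsFolders\\LNGS_FORMS_633800715869843750XQJ.doc";
        System.out.println(isValidUri(raw));   // the raw link is rejected
        String fixed = normalizeBackslashes(raw);
        System.out.println(fixed);             // '\' replaced by '/' in the path
        System.out.println(isValidUri(fixed)); // the normalized link parses
    }
}
```

Restricting the replacement to the part before '?' or '#' mirrors the browser behavior, which only rewrites backslashes in the path and authority. A blanket `replace('\\', '/')` would also alter query parameters that legitimately contain backslashes.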

What version of the product are you using?

3.3
Please provide any additional information below.

Heritrix recognizes this URL correctly.

Original issue reported on code.google.com by gavincha...@gmail.com on 26 Mar 2012 at 3:21

GoogleCodeExporter commented 9 years ago

You are right.

It is a bug and will be fixed.

Original comment by avrah...@gmail.com on 11 Aug 2014 at 1:27

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

This issue was closed by revision 091f5043337f.

Original comment by avrah...@gmail.com on 11 Aug 2014 at 1:30

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

Fixed in revision hash: 091f5043337f

Original comment by avrah...@gmail.com on 11 Aug 2014 at 1:32

udayinfy / crawler4j

url contains '\' #139