jungjonghun / crawler4j

Automatically exported from code.google.com/p/crawler4j

crawler for gbk html page #96

Closed. GoogleCodeExporter closed this issue 9 years ago.

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. controller.addSeed("http://www.tudou.com/"); Tudou is a video site in China 
whose HTML pages use the GBK charset.
2. Set "crawler.default_encoding=gbk" in the file crawler4j.properties.

What is the expected output? What do you see instead?
Expected output:
The crawler runs well.

What I got:
Error messages:
ERROR [main] Error while fetching http://www.tudou.com/robots.txt
ERROR [Crawler 1] Error while fetching http://www.tudou.com/

What version of the product are you using? On what operating system?
Version 2.6.1
OS: Linux X 2.6.38-8-server #42-Ubuntu SMP Mon Apr 11 03:49:04 UTC 2011 x86_64 
x86_64 x86_64 GNU/Linux

Please provide any additional information below.

How can I solve this problem?

Original issue reported on code.google.com by wenlei.z...@gmail.com on 21 Nov 2011 at 6:02

GoogleCodeExporter commented 9 years ago
As of version 3.0, this feature is supported. The character encoding of pages 
is now detected automatically.

-Yasser

Original comment by ganjisaffar@gmail.com on 2 Jan 2012 at 7:20
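
For readers curious what automatic charset detection can look like, here is a minimal sketch in plain Java. It is not crawler4j's actual implementation, and the class CharsetSniffer and its methods find, detect and decode are hypothetical names invented for this illustration. It assumes the charset is declared either in the Content-Type header or in a meta tag near the top of the page, and it falls back to a caller-supplied default otherwise. It also assumes the JDK in use ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of crawler4j. */
final class CharsetSniffer {

    // Matches "charset=..." in a Content-Type header or an HTML <meta> tag.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([\\w.:-]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the first supported charset named in the text, or null if there is none. */
    static Charset find(String text) {
        if (text == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(text);
        while (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: keep looking.
            }
        }
        return null;
    }

    /** Picks a charset: header first, then a meta tag in the first 2048 bytes, then the fallback. */
    static Charset detect(byte[] body, String contentType, Charset fallback) {
        Charset fromHeader = find(contentType);
        if (fromHeader != null) {
            return fromHeader;
        }
        // ISO-8859-1 maps every byte to one char, so the ASCII markup stays readable
        // even when the real encoding of the page is still unknown.
        int n = Math.min(body.length, 2048);
        String head = new String(body, 0, n, StandardCharsets.ISO_8859_1);
        Charset fromMeta = find(head);
        return fromMeta != null ? fromMeta : fallback;
    }

    /** Decodes the page body with the detected charset. */
    static String decode(byte[] body, String contentType, Charset fallback) {
        return new String(body, detect(body, contentType, fallback));
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        // \u571f\u8c46\u7f51 is the Chinese name of Tudou, written with Unicode escapes.
        String html = "<html><head><meta charset=\"gbk\"></head>"
                + "<body>\u571f\u8c46\u7f51</body></html>";
        byte[] page = html.getBytes(gbk);

        // The header carries no charset here, so the meta tag decides.
        System.out.println(detect(page, "text/html", StandardCharsets.UTF_8));
        System.out.println(decode(page, "text/html", StandardCharsets.UTF_8).equals(html));
    }
}
```

A sketch like this only handles pages that declare their encoding. Pages with no declaration at all would need statistical detection from the byte content, which is out of scope here.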