Closed shawnye closed 10 years ago
You are right, there is a problem with this page. The problem is that the server does not send the charset in the response headers, only in the HTML. The Http tool does not parse the returned HTML content, so it can't know what content type is set in the HTML. (Maybe if we used Lagarto we could figure this out ;)
Anyway, you are right, adding a method to force the charset is a great idea, but it has to be set on the HttpResponse. Thank you!
Except... this already exists ;)
HttpRequest request = HttpRequest.get("http://blog.sina.com.cn/s/blog_6f2171a10100unux.html");
HttpResponse response = request.send();
response.charset("utf-8");
String text = response.bodyText();
Just use the charset() method after you have received the response. That's all :)
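When the charset is declared only in the HTML, as on the page discussed here, the name to pass to response.charset() could in principle be sniffed from the raw body first. Below is a minimal sketch of that idea in plain Java. MetaCharsetSniffer is a hypothetical helper, not part of Jodd or Lagarto, and the regex is only a rough stand-in for real HTML parsing; it also assumes you can get at the raw body bytes somehow.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Jodd class): guesses the charset declared in an HTML <meta> tag.
final class MetaCharsetSniffer {

    // Covers both <meta charset="utf-8"> and
    // <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />.
    private static final Pattern META_CHARSET = Pattern.compile(
        "<meta[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9_.:-]+)",
        Pattern.CASE_INSENSITIVE);

    // Returns the declared charset name, or null if none is found or it is unsupported.
    static String sniff(byte[] body) {
        // ISO-8859-1 maps every byte to exactly one char, so the ASCII markup
        // is readable no matter what the real encoding of the page is.
        int len = Math.min(body.length, 4096);
        String head = new String(body, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (!m.find()) {
            return null;
        }
        String name = m.group(1);
        try {
            return Charset.isSupported(name) ? name : null;
        } catch (IllegalArgumentException e) {
            // Thrown when the name is not a legal charset name.
            return null;
        }
    }

    // Decodes the body with the sniffed charset, or with the fallback if nothing was declared.
    static String decode(byte[] body, String fallback) {
        String name = sniff(body);
        return new String(body, Charset.forName(name != null ? name : fallback));
    }
}
```

The name returned by sniff() is what you would then hand to response.charset(...) before calling bodyText(), as in the snippet above.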
Yes, I should use response.charset(charset); instead of request.charset(charset);. Thank you!
jodd.http.HttpBrowser.getPage() may not pick up the correct charset, and HttpRequest.charset(charset) seems to be of no avail. I have to convert the charset myself, as the following code shows:
You can try the following page URL without a custom charset; the Chinese text displays as garbled characters: http://blog.sina.com.cn/s/blog_6f2171a10100unux.html (the correct title displays like this: chrome CSS广告过滤进阶设置). But I found that the page source contains a meta tag declaring the correct charset: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Maybe the web server does not send the correct charset hint. Can we set a custom charset using a method like HttpRequest.charset(charset)? Thank you.
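For readers wondering what "converting the charset myself" can look like: the snippet from the original report is not preserved here, so the following is only an illustrative sketch of one common workaround, not the reporter's code. It assumes the text was wrongly decoded as ISO-8859-1 while the bytes were really UTF-8. MojibakeDemo and repair() are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: undoes a wrong ISO-8859-1 decode of UTF-8 bytes.
final class MojibakeDemo {

    // ISO-8859-1 maps bytes to chars one-to-one, so encoding the garbled text
    // back to ISO-8859-1 recovers the original bytes, which are then decoded as UTF-8.
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }
}
```

This only works when the wrong decode was lossless, which holds for ISO-8859-1 but not for every charset. Setting the charset on the response before reading the text avoids the round trip entirely.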