Hi, I think there is an error in how you call decode. decode expects its first parameter to be the encoding of the input data, and the site you are trying to download is encoded in euc-jp; this can be seen in the Content-Type header or in the meta tag in the HTML.
The output of decode is a String, which in Dart is UTF-16.
Using it like this gives the expected output:
var response = await http.Client()
    .get(Uri.parse("http://news4vip.livedoor.biz/archives/52385788.html"));
print("Status: ${response.statusCode}");
// Content-Type: text/html; charset=euc-jp
print("Content-Type: ${response.headers['content-type']}");
String decoded_body_byte =
    await CharsetConverter.decode("euc-jp", response.bodyBytes);
print("Decoded: ${decoded_body_byte}");
Output:
flutter: Status: 200
flutter: Content-Type: text/html; charset=euc-jp
flutter: Decoded: <?xml version="1.0" encoding="EUC-JP"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" id="ldblog-standard">
<head>
<meta name="google-site-verification" content="0yUUhZOYqVOrcYBJ-Tw1lYw-D7kiorCn-4kDbnhK-ac" />
<meta http-equiv="Content-Type" content="text/html; charset=euc-jp" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta http-equiv="Content-Script-Type" content="text/javascript" /><link rel="shortcut icon" type="image/vnd.microsoft.icon" href="https://livedoor.blogimg.jp/news4vip2/imgs/2/b/favicon.ico" /><link rel="icon" href="https://livedoor.blogimg.jp/news4vip2/imgs/2/b/2b6a2183.png" />
<link rel="stylesheet" href="https://parts.blog.livedoor.jp/css/template.css?v=20190826" type="text/css" />
<link rel="stylesheet" href="https://parts.blog.livedoor.jp/css/comment2/heart.css?v=20180704" type="text/css" />
<link rel="stylesheet" <…>
I see that Android (with 'utf-8' passed) decodes most of the HTML but gives up on the Japanese characters. This is somewhat expected, as these decoders usually do their best even when given bad input.
<div class="sidebody"><a href="http://5chmm.jp/">5ch�ޤȤ�ΤޤȤ�</a></div>
Passing euc-jp fixes it too.
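Rather than hard-coding the encoding, the charset could be read from the Content-Type header before decoding. The following is only a rough sketch: charsetFromContentType is a hypothetical helper (not part of any package mentioned here), and falling back to utf-8 is an assumption. As the rest of this thread shows, the header is not always a reliable description of the actual bytes.

```dart
/// Hypothetical helper: extracts the charset parameter from a Content-Type
/// header value, e.g. "text/html; charset=euc-jp" -> "euc-jp".
/// Returns [fallback] (assumed to be utf-8 here) when no charset is present.
String charsetFromContentType(String? contentType,
    {String fallback = 'utf-8'}) {
  if (contentType == null) return fallback;
  final match = RegExp(r'charset\s*=\s*"?([^\s;"]+)', caseSensitive: false)
      .firstMatch(contentType);
  return match?.group(1)?.toLowerCase() ?? fallback;
}

void main() {
  print(charsetFromContentType('text/html; charset=euc-jp')); // euc-jp
  print(charsetFromContentType('text/html; charset=UTF-8')); // utf-8
  print(charsetFromContentType('text/html')); // utf-8 (fallback)
}
```

The returned name could then be passed as the first argument of CharsetConverter.decode in place of the literal "euc-jp".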
Sorry, my explanation was lacking. This web page changes its response depending on the User-Agent. Could you please re-check this issue after changing the User-Agent to CFNetwork/1209 Darwin/20.2.0 (iPhone iOS/14.3)? You should then get response.headers['content-type'] = text/html; charset=utf-8 and see my error.
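For anyone trying to reproduce this, a minimal sketch of sending that User-Agent with the http package might look like the following. The constant name iosUserAgent is made up for illustration, and whether the server still answers with charset=utf-8 for this User-Agent is an assumption based on the report above.

```dart
import 'package:http/http.dart' as http;

// User-Agent string quoted in this thread; the constant name is hypothetical.
const iosUserAgent = 'CFNetwork/1209 Darwin/20.2.0 (iPhone iOS/14.3)';

Future<void> main() async {
  final response = await http.get(
    Uri.parse('http://news4vip.livedoor.biz/archives/52385788.html'),
    // Override the default User-Agent so the server returns its iOS variant.
    headers: {'User-Agent': iosUserAgent},
  );
  print('Status: ${response.statusCode}');
  print('Content-Type: ${response.headers['content-type']}');
}
```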
Yeah, I see the problem. The site contains some malformed characters and, as we can see, the iOS decoder does not like them. I'll try to fix that later by adding an option to ignore such characters.
However, if you are only decoding from UTF-8, you may not need this package at all. Dart has Utf8Decoder, which also fails on this input (it throws a FormatException), but not if you pass true for allowMalformed, like this:
final decoded = Utf8Decoder(allowMalformed: true).convert(response.bodyBytes);
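To illustrate the difference between the two modes, here is a small self-contained sketch using only dart:convert. The helper names decodeStrict and decodeLenient are hypothetical, and the sample bytes are an assumption chosen for illustration: "5ch" followed by three Japanese characters encoded as EUC-JP, which is not valid UTF-8.

```dart
import 'dart:convert';

/// Hypothetical helper: strict UTF-8 decode that returns null instead of
/// throwing when the input is malformed.
String? decodeStrict(List<int> bytes) {
  try {
    return const Utf8Decoder().convert(bytes);
  } on FormatException {
    return null;
  }
}

/// Hypothetical helper: lenient UTF-8 decode; malformed sequences are
/// replaced with U+FFFD instead of raising an error.
String decodeLenient(List<int> bytes) =>
    const Utf8Decoder(allowMalformed: true).convert(bytes);

void main() {
  // "5ch" in ASCII followed by EUC-JP bytes (illustrative sample data).
  const bytes = [0x35, 0x63, 0x68, 0xA4, 0xDE, 0xA4, 0xC8, 0xA4, 0xE1];
  print(decodeStrict(bytes)); // null, because strict decoding fails
  print(decodeLenient(bytes)); // starts with "5ch", contains U+FFFD
}
```

Note that the lenient mode only avoids the error; text that is really EUC-JP still comes out garbled unless it is decoded with the correct encoding.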
I also tracked down the malformed characters:
<div id="ad" style="display:block !important;">
<!-- �칭�� -->
<script type="text/javascript">
if (window['header_cd'] && window['showed_header_ad'] != 1) {
show_ad(header_cd);
showed_header_ad = 1;
}
</script>
I can decode the target web page using the code you suggested:
final decoded = Utf8Decoder(allowMalformed: true).convert(response.bodyBytes);
I would also be glad to see your update to this plugin.
Thank you for your help!
I am trying to decode the bodyBytes of an HTTP response as UTF-8 on an iOS emulator (iPhone 12 Pro Max, iOS Deployment Target 9.0), but the result of decode is null. The same code succeeds on an Android emulator, so I believe this issue affects only the iOS side.
Please check this code and, if possible, share a solution.
The following is the output of the above code.
Below is the output of $ flutter doctor. I would appreciate it if you could answer.