joshdick / miniProxy

🚨⚠️ UNMAINTAINED! ⚠️🚨 A simple PHP web proxy.
http://joshdick.github.io/miniProxy
GNU General Public License v3.0

encoding #91

Closed: xuchaoji closed this issue 4 years ago

xuchaoji commented 7 years ago

miniProxy does not support the gb2312 encoding; proxied pages show up as garbled text.

joshdick commented 7 years ago

Do you have an example page? Does the server for the page in question send a correct Content-Type header with the encoding specified?

zxyqwe commented 6 years ago

Some Chinese web servers return HTML pages that declare their charset in a meta tag, such as:

<meta http-equiv="content-type" content="text/html; charset=gb2312" />
or
<meta http-equiv="content-type" content="text/html; charset=gbk" />

These charsets are key official character sets of the People's Republic of China, used for simplified Chinese characters.
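
To make the meta-tag case concrete, here is a minimal sketch of how a proxy could read the declared charset from the raw response bytes. It is written in Python only for illustration (miniProxy itself is PHP), and `sniff_meta_charset` and `META_CHARSET_RE` are hypothetical names, not part of miniProxy. It assumes the declaration sits in the first 1024 bytes and falls back to ISO-8859-1 when nothing is found.

```python
import re

# Hypothetical helper, not miniProxy code: matches a charset declared inside
# a <meta> tag, in either the http-equiv form or the short <meta charset=...> form.
META_CHARSET_RE = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)',
    re.IGNORECASE,
)


def sniff_meta_charset(body: bytes, default: str = "ISO-8859-1") -> str:
    """Return the charset named in a <meta> tag, or `default` if none is found."""
    # Only the start of the document is scanned; the charset name is plain
    # ASCII, so it can be matched before the body is decoded.
    match = META_CHARSET_RE.search(body[:1024])
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()


page = b'<meta http-equiv="content-type" content="text/html; charset=gb2312" />'
print(sniff_meta_charset(page))
```

A charset given in the HTTP `Content-Type` header would normally take priority over the meta tag; this sketch only covers the case where the header omits it.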

zxyqwe commented 6 years ago

@joshdick

The original server returns:

HTTP response
Content-Type:text/html

HTML document
<meta http-equiv="Content-Type" content="text/html; charset=gbk" />

miniProxy converts it from "ISO-8859-1" to "HTML-ENTITIES" and returns:

HTTP response
Content-Type:text/html;charset=UTF-8

HTML document
<meta http-equiv="Content-Type" content="text/html; charset=gbk" />
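
The garbling can be reproduced outside miniProxy. The sketch below is a Python approximation, not the PHP code path: `to_entities` is a hypothetical helper, and Python's `xmlcharrefreplace` error handler stands in for the "HTML-ENTITIES" conversion (it always emits numeric references, so the exact entity spelling may differ). It shows that treating GBK bytes as ISO-8859-1 turns each byte into its own unrelated character.

```python
def to_entities(raw: bytes, source_charset: str) -> bytes:
    """Decode `raw` with the assumed charset, then escape all non-ASCII characters."""
    return raw.decode(source_charset).encode("ascii", "xmlcharrefreplace")


# Two Chinese characters, encoded the way a GBK server would send them.
raw = "中文".encode("gbk")

# Wrong assumption: every GBK byte is read as a separate Latin-1 character,
# so two characters turn into four unrelated entities.
print(to_entities(raw, "iso-8859-1"))

# Correct assumption: the two original characters survive.
print(to_entities(raw, "gbk"))  # b'&#20013;&#25991;'
```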

My modification converts it from "gbk" to "HTML-ENTITIES" and returns:

HTTP response
Content-Type:text/html;charset=UTF-8

HTML document
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
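
The same idea can be sketched in Python (this is not the actual PHP patch, and `reencode_as_utf8` is a hypothetical name). It differs from the modification described above in one respect: it emits UTF-8 bytes directly instead of HTML entities. It assumes the charset comes from the HTTP header when present, otherwise from the meta tag, otherwise ISO-8859-1.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: the text up to the charset value is captured so the
# value alone can be replaced later.
META_PATTERN = r'(<meta[^>]+charset\s*=\s*["\']?\s*)[A-Za-z0-9_\-]+'
META_BYTES_RE = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)', re.IGNORECASE
)


def reencode_as_utf8(body: bytes, header_charset: Optional[str] = None) -> Tuple[str, bytes]:
    """Return (Content-Type header value, UTF-8 body) for a proxied HTML page."""
    match = META_BYTES_RE.search(body[:1024])
    if header_charset:
        source = header_charset
    elif match:
        source = match.group(1).decode("ascii")
    else:
        source = "ISO-8859-1"

    try:
        text = body.decode(source, errors="replace")
    except LookupError:
        # Unknown charset name: fall back to a decoding that cannot fail.
        text = body.decode("ISO-8859-1")

    # Point the first meta declaration at the new encoding so that the
    # document and the HTTP header agree.
    text = re.sub(META_PATTERN, r"\g<1>UTF-8", text, count=1, flags=re.IGNORECASE)
    return "text/html;charset=UTF-8", text.encode("utf-8")


page = '<meta http-equiv="Content-Type" content="text/html; charset=gbk" /><p>中文</p>'.encode("gbk")
content_type, utf8_body = reencode_as_utf8(page)
print(content_type)
print(utf8_body.decode("utf-8"))
```

Rewriting the meta tag matters because a browser that finds `charset=gbk` in a body that is actually UTF-8 has two conflicting declarations to choose from.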

viczhang commented 5 years ago

Here is an example page with this problem: http://www.creaders.net/