Open oTnTh opened 2 years ago
Unable to reproduce (I copied the code from here, so it may have been saved as Unicode)
AutoIt-VSCode v1.0.9
Visual Studio Code v1.71.2
As far as I can tell AutoIt-VSCode doesn't use SciTE configurations.
The default code page depends on the Windows language settings; for Chinese it's cp936.
So I have to put these in my SciTEUser.properties to get correct output in the Output Panel of SciTE.
code.page=65001
output.code.page=936
VSCode doesn't have anything similar, and that causes my problems.
Please take a look at this script:
; Convert a UTF-16 AutoIt string to the given code page via WideCharToMultiByte.
Func _StringToCodepage($sStr, $iCodepage)
    ; First call: ask for the required buffer size in bytes.
    Local $aResult = DllCall("kernel32.dll", "int", "WideCharToMultiByte", "uint", $iCodepage, "dword", 0, "wstr", $sStr, _
            "int", StringLen($sStr), "ptr", 0, "int", 0, "ptr", 0, "ptr", 0)
    Local $tCP = DllStructCreate("char[" & $aResult[0] & "]")
    ; Second call: perform the conversion into the buffer.
    $aResult = DllCall("kernel32.dll", "int", "WideCharToMultiByte", "uint", $iCodepage, "dword", 0, "wstr", $sStr, _
            "int", StringLen($sStr), "struct*", $tCP, "int", $aResult[0], "ptr", 0, "ptr", 0)
    Return DllStructGetData($tCP, 1)
EndFunc   ;==>_StringToCodepage
$cp = DllCall("kernel32.dll", "int", "GetACP")
ConsoleWrite("Default Codepage: " & $cp[0] & @CRLF)
ConsoleWrite('----------------' & @CRLF)
; Unicode: U+4E2D U+6587
$strA = "中文"
ConsoleWrite("$strA: " & $strA & @CRLF)
ConsoleWrite(String(StringToBinary($strA)) & @CRLF)
ConsoleWrite('----------------' & @CRLF)
$strB = _StringToCodepage($strA, 65001)
ConsoleWrite("$strB: " & $strB & @CRLF)
ConsoleWrite(String(StringToBinary($strB)) & @CRLF)
ConsoleWrite('----------------' & @CRLF)
In SciTE, with output.code.page=936, everything worked as expected.
VSCode assumes the output is encoded as UTF-8, which it is not.
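To make the symptom concrete, here is a minimal Node.js sketch of what happens when cp936 output is read as UTF-8. It uses the built-in TextDecoder rather than anything from the extension, and the variable names are only illustrative.

```javascript
// "中文" as a cp936 (GBK) byte sequence: D6 D0 CE C4.
const gbkBytes = new Uint8Array([0xd6, 0xd0, 0xce, 0xc4]);

// Decoding those bytes as UTF-8 is the wrong interpretation:
// none of them form a valid UTF-8 sequence, so the decoder
// substitutes U+FFFD (the replacement character).
const asUtf8 = new TextDecoder('utf-8').decode(gbkBytes);

console.log(JSON.stringify(asUtf8));
console.log(asUtf8.includes('\uFFFD')); // true
```

The bytes themselves are fine; only the decoding step picks the wrong encoding.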
VSCode doesn't seem to have cp936. Just copying and pasting the example code works fine... can you attach a sample file?
cp936 is GBK, a superset of GB2312.
GB18030 is a superset of GBK, but it adds 4-byte sequences, so it has its own identifier, cp54936.
I don't know anything about the VSCode Extension API. If there's nothing like GetACP() available, an autoit.outputCodePage setting is good enough for me.
Before writing to the Output Panel of VSCode, convert the output of AutoIt from autoit.outputCodePage to UTF-8, and the problem should be solved.
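The conversion step could look roughly like the sketch below. It is only an illustration: decodeOutput is a hypothetical helper, and it uses Node's built-in TextDecoder instead of any particular conversion library. It also assumes the runtime ships full ICU data, which the gbk label needs.

```javascript
// Hypothetical helper: decode raw bytes from the AutoIt process using the
// configured output code page, falling back to UTF-8 if the label is not
// supported by this runtime.
function decodeOutput(bytes, outputCodePage) {
  let decoder;
  try {
    decoder = new TextDecoder(outputCodePage);
  } catch (err) {
    // Unknown label, or a runtime built without full ICU data.
    decoder = new TextDecoder('utf-8');
  }
  return decoder.decode(bytes);
}

// "中文" encoded as GBK is D6 D0 CE C4.
const rawOutput = new Uint8Array([0xd6, 0xd0, 0xce, 0xc4]);

// Expected to print 中文 when the runtime supports the gbk label.
console.log(decodeOutput(rawOutput, 'gbk'));
```

JavaScript strings are already Unicode, so once the bytes are decoded with the right code page nothing further is needed before the text is handed to the Output panel.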
The encoding of script file is not relevant to this problem.
Can you show me the output of my script in VSCode, please?
I guess we are out of luck on this one. Almost 7 years since it was requested...
WOW, a text editor (sort of) cannot handle text encoding. I didn't expect that.
Seems there's nothing we can do now.
Thanks for your time.
Well, technically, if you can see text of your code properly - it handles encoding properly...it's the output of another application that it's having issues with...
Even now (Win11 22H2), PowerShell and CMD use ANSI (i.e. cp936 for Chinese) as the default code page.
If I compile my script as a CUI EXE, here is the output:
Same as the output in SciTE.
ConsoleWrite is meant to write to STDOUT, and the default code page of STDOUT is ANSI.
As a user, I would love to have a solution, but I can't say that AutoIt is wrong.
Also, I think it's not fair to you. You did a great job, but CJK users have to choose.
Maybe as a work around you could use this for now: https://www.autoitscript.com/forum/topic/208189--
The proposed #123 adds a new option, Output Code Page.
In this particular case I had to set it to gbk in order to get the proper result:
With cp936 I get a different $strB result:
WOW, thank you for continuing to work on this.
$strB is not a valid GBK string, so when we try to convert $strB from GBK to UTF-8, the result is meaningless.
I think you can ignore the difference in AutoIt-VSCode.
However, there are some differences between GBK and CP936.
You could think of GBK as the standard (like ECMAScript 7) and CP936 as one implementation of it (like Chrome's V8).
If a code point is undefined in the standard, the author of a charmap can decide how to handle the conversion.
Take a look at this:
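// encoding.convert(text, toCharset, fromCharset)
const encoding = require('encoding');
// UTF-8 bytes of "中文"
const buf = Buffer.from([0xe4, 0xb8, 0xad, 0xe6, 0x96, 0x87]);
// Misread the UTF-8 bytes as GBK / CP936 and convert them to UTF-8
const resultB1 = encoding.convert(buf, 'utf-8', 'gbk');
const resultB2 = encoding.convert(buf, 'utf-8', 'cp936');
console.log(resultB1);
console.log(resultB2);
console.log('-----------------------------------');
// Convert back from UTF-8 to GBK / CP936
const resultC1 = encoding.convert(resultB1, 'gbk', 'utf-8');
const resultC2 = encoding.convert(resultB2, 'cp936', 'utf-8');
console.log(resultC1);
console.log(resultC2);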
Output:
<Buffer e6 b6 93 ee 85 9f e6 9e 83>
<Buffer e6 b6 93 ef bf bd e9 8f 82 ef bf bd>
-----------------------------------
<Buffer e4 b8 ad e6 96 87>
<Buffer e4 b8 3f e6 96 3f>
Even though $strB is not a valid GBK string, after two conversions with the GBK argument, we didn't lose any data.
I'm not sure, but I guess that's why the GBK charmap of iconv-lite is not compatible with CP936.
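In the output above, ef bf bd is U+FFFD (the replacement character) and ee 85 9f is U+E15F, a private-use code point. Once a byte has been replaced by U+FFFD its original value cannot be recovered, which matches the 3f bytes in the CP936 round trip. The sketch below shows the same kind of loss using only Node's built-in TextDecoder and TextEncoder with UTF-8; it illustrates the principle and is not a reproduction of iconv-lite's behaviour.

```javascript
// Four bytes that are not valid UTF-8 (they are "中文" in GBK).
const original = new Uint8Array([0xd6, 0xd0, 0xce, 0xc4]);

// Lenient decode: invalid bytes are replaced with U+FFFD, so the
// original byte values are gone once the string has been built.
const text = new TextDecoder('utf-8').decode(original);
const roundTripped = new TextEncoder().encode(text);

const sameBytes =
  roundTripped.length === original.length &&
  roundTripped.every((b, i) => b === original[i]);
console.log(sameBytes); // false

// Strict decode: malformed input is reported instead of silently replaced.
let rejected = false;
try {
  new TextDecoder('utf-8', { fatal: true }).decode(original);
} catch (err) {
  rejected = true; // decode() throws a TypeError for malformed input
}
console.log(rejected); // true
```

A strict decode is one way to notice that the chosen code page does not match the data, instead of showing garbled text.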
It's all Chinese to me (pun intended)
Maybe it would be more suitable to report it at iconv-lite.
If the PR goes forward, it will use the iconv-lite library instead of encoding.
It's not a bug in iconv-lite. Chinese people recognize the differences between GBK and CP936; they have to.
Text encodings are a real pain in the ass, really. Problems can pop up anywhere.
But native English speakers don't use them, and it's hard to explain to them. Like you said, it's all Chinese.
So I'm very grateful to you, I really am.
So, the question is, does SciTE have the same issue? (Because in your screenshot it looks exactly like it does in VSCode after conversion.) Or should I suspend the PR until we find a 100% working solution?
Generally, when we see garbled text, it just means "Something is wrong here".
As long as iconv-lite handles normal text correctly, I think we can ignore the details.
SciTEUser.properties:
t.au3:
I think it's an encoding problem. Before updating the Output panel of VSCode, AutoIt-VSCode should deal with the encoding of the strings.
Thanks.
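One detail to watch when handling the encoding before updating the Output panel: if the process output arrives in several chunks (an assumption about how the extension reads it), a multi-byte character can be split across two chunks. A rough sketch using the built-in TextDecoder in streaming mode follows; createChunkDecoder is a hypothetical name, and UTF-8 is used here only so the example runs on any Node build.

```javascript
// Hypothetical helper: returns a function that decodes successive chunks
// while keeping incomplete multi-byte sequences buffered between calls.
function createChunkDecoder(encodingLabel) {
  const decoder = new TextDecoder(encodingLabel);
  return (chunk) => decoder.decode(chunk, { stream: true });
}

// "中文" in UTF-8 is E4 B8 AD E6 96 87; split it in the middle of "中".
const decodeChunk = createChunkDecoder('utf-8');
const pieces = [
  decodeChunk(new Uint8Array([0xe4, 0xb8])),
  decodeChunk(new Uint8Array([0xad, 0xe6, 0x96, 0x87])),
];

console.log(pieces.join('')); // 中文
```

Decoding each chunk independently would instead turn the split character into replacement characters.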