Writing to console
The String must be converted to UTF-16, with each non-BMP character split into a surrogate pair, before it is passed to WriteConsoleW. Otherwise, the program crashes at toEnum.
Note that my patch does not take WriteConsoleW's buffer limit into account. I tried running

    import System.Console.Haskeline
    main = runInputT defaultSettings $ outputStrLn $ replicate 20000 '\x1F986'

on my Windows machine (Win10 Pro 1903), and WriteConsoleW seems to have succeeded. If a problem arises on older versions of Windows, the patch may need to be reconsidered.
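To illustrate the encoding step, here is a minimal sketch of how a String can be turned into UTF-16 code units. The names encodeChar and encodeUtf16 are hypothetical and are not taken from the patch; the actual implementation in haskeline may look different.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- | Encode one Char as one or two UTF-16 code units.
-- (Hypothetical helper for illustration; not the patch's actual code.)
encodeChar :: Char -> [Word16]
encodeChar c
  | n < 0x10000 = [fromIntegral n]   -- BMP: a single code unit
  | otherwise   = [lead, trail]      -- non-BMP: a surrogate pair
  where
    n     = ord c
    m     = n - 0x10000              -- 20-bit offset above the BMP
    lead  = fromIntegral (0xD800 + (m `shiftR` 10))  -- high 10 bits
    trail = fromIntegral (0xDC00 + (m .&. 0x3FF))    -- low 10 bits

-- | Encode a whole String as a list of UTF-16 code units.
encodeUtf16 :: String -> [Word16]
encodeUtf16 = concatMap encodeChar
```

With this sketch, encodeUtf16 "\x1F986" yields the two code units 0xD83E and 0xDD86, each of which fits in a 16-bit wchar_t.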
Reading from console
Windows sends two input events for a non-BMP character: a lead surrogate followed by a trail surrogate. We therefore need to combine each pair into a single character.
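The decoding can be sketched as follows, assuming the code units from consecutive input events have already been collected into a list. The function decodeUtf16 and its helpers are hypothetical names, and passing unpaired surrogates through unchanged is an assumption of this sketch rather than the behaviour of the patch.

```haskell
import Data.Bits (shiftL)
import Data.Char (chr)
import Data.Word (Word16)

-- | Is this code unit a lead (high) surrogate?
isLead :: Word16 -> Bool
isLead u = u >= 0xD800 && u <= 0xDBFF

-- | Is this code unit a trail (low) surrogate?
isTrail :: Word16 -> Bool
isTrail u = u >= 0xDC00 && u <= 0xDFFF

-- | Combine a lead/trail pair into the code point it represents.
combine :: Word16 -> Word16 -> Char
combine hi lo =
  chr (0x10000 + ((fromIntegral hi - 0xD800) `shiftL` 10)
               + (fromIntegral lo - 0xDC00))

-- | Decode UTF-16 code units into a String.
-- Unpaired surrogates are passed through unchanged (an assumption).
decodeUtf16 :: [Word16] -> String
decodeUtf16 (hi : lo : rest)
  | isLead hi && isTrail lo = combine hi lo : decodeUtf16 rest
decodeUtf16 (u : rest) = chr (fromIntegral u) : decodeUtf16 rest
decodeUtf16 []         = []
```

For example, decodeUtf16 [0xD83E, 0xDD86] gives the single character '\x1F986'.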
wcwidth
Since haskeline calls wcwidth on the prompt string, wcwidth must also be able to handle non-BMP characters. Otherwise, a program like

    import System.Console.Haskeline

    main = runInputT defaultSettings $ do
      _ <- getInputLine "\x1F986"
      return ()

crashes at toEnum. The fix is just changing wchar_t/CWchar to int/CInt when interfacing with the C counterpart (haskeline_mk_wcwidth), since wchar_t is only 16 bits wide on Windows and cannot hold a code point above U+FFFF.
The other C functions in h_wcwidth.c (haskeline_mk_wcswidth, haskeline_mk_wcwidth_cjk, haskeline_mk_wcswidth_cjk) are left unmodified, because they seem to be unused.
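The following sketch illustrates why the narrowing conversion fails. It uses Word16 as a stand-in for the 16-bit wchar_t on Windows (an assumption made so the example behaves the same on any platform) and compares it with CInt, which is 32 bits wide on common platforms. The function names are hypothetical.

```haskell
import Control.Exception (SomeException, evaluate, try)
import Data.Word (Word16)
import Foreign.C.Types (CInt)

-- | Try to squeeze a Char into a 16-bit unit via toEnum, as a
-- stand-in for converting to a 16-bit wchar_t. Out-of-range code
-- points make toEnum throw, which is caught and returned as Left.
narrowToWord16 :: Char -> IO (Either SomeException Word16)
narrowToWord16 c = try (evaluate (toEnum (fromEnum c)))

-- | Convert a Char to a CInt; every Unicode code point fits in 32 bits.
widenToCInt :: Char -> CInt
widenToCInt = fromIntegral . fromEnum
```

Calling narrowToWord16 '\x1F986' returns a Left value because the code point lies outside the 16-bit range, while widenToCInt '\x1F986' returns 129414 without error.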
Currently, haskeline does not properly handle surrogate pairs on Windows. This leads to issues.
This PR consists of the three parts described above: writing to the console, reading from the console, and wcwidth.