mawww / kakoune

mawww's experiment for a better code editor
http://kakoune.org
The Unlicense

[BUG/QUESTION] UTF-8 characters render as question marks in kak under WezTerm on MacOS #5195

Closed WojciechP closed 4 months ago

WojciechP commented 5 months ago

Version of Kakoune

Kakoune 2024.05.18

Reproducer

Editing or viewing non-ASCII UTF-8 characters in kak under WezTerm seems broken. I understand this is most likely not a bug in kakoune but the result of a quirky interaction; still, I'm out of ideas on how to debug it.

Reproduced with wezterm 20240203-110809-5046fc22: the characters display correctly directly in bash under WezTerm, but incorrectly inside kak:

# Environment:
bash-3.2$ locale
LANG="en_CH.UTF-8"
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
bash-3.2$ echo $TERM
xterm-256color

# Control case, without kak:
bash-3.2$ cat ~/Coding/utf8.txt
polish: ąęśćł
german: üöä
pound: £
icons: 󰕾 󰖀 󰕿 󰖁

# The reproducer:
bash-3.2$ kak ~/Coding/utf8.txt

[Screenshots: WezTerm with cat; WezTerm with kak]

Funnily enough, the same works fine under Terminal.app (apologies for a dark-on-dark colorscheme, I don't use Terminal.app at all):

[Screenshots: Terminal.app with cat; Terminal.app with kak]

I also tried running kak -n to skip loading my kakrc, but that didn't help. Any tips on how to troubleshoot would be appreciated.

Outcome

The non-ASCII characters show up as question marks on a red diamond background.

Expectations

I would like kak under WezTerm to render UTF-8 correctly.

Additional information

bash-3.2$ kak -version
Kakoune 2024.05.18
bash-3.2$ uname -a
Darwin Wojciechs-MacBook-Pro.local 23.5.0 Darwin Kernel Version 23.5.0: Wed May  1 20:16:51 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T8103 arm64
bash-3.2$ wezterm -V
wezterm 20240203-110809-5046fc22
bash-3.2$

Screwtapello commented 5 months ago

The difference between WezTerm and Terminal.app seems to be that Terminal.app has LC_CTYPE="UTF-8" while WezTerm has LC_CTYPE="C".

In the output of the locale command, anything that says "C" means "no Unicode support". For example, here's what I get on Debian Linux:

LANG=en_AU.UTF-8
LANGUAGE=en_AU:en
LC_CTYPE="en_AU.UTF-8"
LC_NUMERIC="en_AU.UTF-8"
LC_TIME="en_AU.UTF-8"
LC_COLLATE="en_AU.UTF-8"
LC_MONETARY="en_AU.UTF-8"
LC_MESSAGES="en_AU.UTF-8"
LC_PAPER="en_AU.UTF-8"
LC_NAME="en_AU.UTF-8"
LC_ADDRESS="en_AU.UTF-8"
LC_TELEPHONE="en_AU.UTF-8"
LC_MEASUREMENT="en_AU.UTF-8"
LC_IDENTIFICATION="en_AU.UTF-8"
LC_ALL=
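
To check this from inside the affected terminal, something like the following might help. It is only a minimal sketch: it relies on the POSIX `locale charmap` command to report the character encoding of the current locale, and `charmap_is_utf8` is a hypothetical helper name, not part of Kakoune or WezTerm. Note that LANG alone can be misleading: in the report above LANG is en_CH.UTF-8 while LC_CTYPE still resolves to "C".

```shell
# Hypothetical helper: succeeds if the given charmap name denotes UTF-8.
charmap_is_utf8() {
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# `locale charmap` prints the codeset of the locale currently in effect.
# If the command is missing or fails, the result is empty and treated as non-UTF-8.
current_charmap=$(locale charmap 2>/dev/null)

if charmap_is_utf8 "$current_charmap"; then
    echo "character encoding is UTF-8"
else
    echo "character encoding is '${current_charmap}', not UTF-8"
fi
```

Running this once in WezTerm and once in Terminal.app should show whether the two terminals hand programs a different locale.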

According to https://github.com/mawww/kakoune/issues/3768 your situation seems to be what happens when the "system language" and "region" are set to different things in the macOS System Preferences app.
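
If changing the system settings is not an option, one possible workaround is to override the locale just for kak. The sketch below assumes that `locale -a` lists the locales installed on the system and that setting LC_CTYPE to one of the UTF-8 entries is enough; `first_utf8_locale` is a hypothetical helper name, and the file path is the one from the report.

```shell
# Hypothetical helper: read locale names on stdin, print the first one ending in .UTF-8 / .utf8.
first_utf8_locale() {
    grep -i -E '\.utf-?8$' | head -n 1
}

# `locale -a` prints all locales available on this system, one per line.
candidate=$(locale -a 2>/dev/null | first_utf8_locale)

if [ -n "$candidate" ]; then
    # Assumption: overriding LC_CTYPE for a single command is sufficient here.
    echo "try: LC_CTYPE=$candidate kak ~/Coding/utf8.txt"
else
    echo "no UTF-8 locale found in 'locale -a' output"
fi
```

This only prints a suggested command rather than launching kak, so it is safe to run anywhere.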

WojciechP commented 4 months ago

Indeed, I missed the LC_CTYPE difference - thank you! Having restarted everything, I now get UTF-8-enabled locale output, and kakoune under WezTerm works just fine. I will chalk it up to "something went wrong somewhere when I was messing around with my setup", as I cannot reproduce the issue any more. And yes, my language and region settings don't match, but since that is no longer causing any trouble, I'll leave them as they are.

Thank you for such a quick response in any case!