mobile-shell / mosh

Mobile Shell
https://mosh.org
GNU General Public License v3.0
Display errors with certain characters #234

Open whiteplastic opened 12 years ago

whiteplastic commented 12 years ago

I use a custom irssi theme that contains the UTF-8 "Fleur de Lys" symbol (U+269C - ⚜). While this character is displayed just fine when I use ssh, it just disappears in mosh. There are also display errors in irssi: random characters disappear or are replaced by other characters. This only occurs when I use my custom theme, so there might be a connection.

keithw commented 12 years ago

On Linux, this works fine for me, but on Mac OS X 10.7.3, the system does not know about this character and wcwidth() returns -1 (unprintable), so mosh does not know how many columns the character will occupy.

Assuming you are using a Mac, that unfortunately is the answer. We will report this to Apple.

whiteplastic commented 12 years ago

Yes, I'm on OSX 10.7.3. It seems like the system does know about this character. ssh and any other application I use knows and displays it, the only application that seems not to know it is mosh.

kmcallister commented 12 years ago

SSH doesn't need to know about characters; it just conveys a stream of bytes from one end to the other. Mosh has a terminal state object which is synchronized between server and client, so it needs the character metadata on both machines.

What outer terminal emulator are you using; is it OS X's standard Terminal.app? And do you have any other terminal emulators in the mix, e.g. screen or tmux?

You can compile and run this C program on both machines to check if wcwidth knows about U+269C.

#define _XOPEN_SOURCE
#include <wchar.h>
#include <locale.h>
#include <stdio.h>

int main() {
    setlocale(LC_ALL, "");
    printf("%d\n", wcwidth(0x269C));
    return 0;
}

(I didn't test this on OS X, so it's possible it will fail to compile for some reason.)

It will print a positive number iff the character is known. Make sure to run it in a Unicode locale. If you don't have one by default, you can do something like

gcc -o foo foo.c;  LANG=en_US.UTF-8 ./foo

If you get a positive number on both server and client, and yet Mosh does not work correctly, then there's a bug in Mosh and we can investigate further.

(In the long run I would like to use a dedicated Unicode library, and drop our dependence on the system locale libraries, which have caused no end of trouble. See discussion on #74.)

keithw commented 12 years ago

I think officially speaking, a Unicode app is supposed to use the "default" properties of the code point range (including width) if it doesn't know about the particular character. Unfortunately there doesn't seem to be a way to get these default properties in POSIX. A dedicated Unicode library would help with this.

EdSchouten commented 12 years ago

Hi Keith,

Just checking. I think you can't assume wchar_t is ISO 10646. It is just an implementation defined `wide character'. If you are working with ISO 10646 inside Mosh explicitly (not wide characters), then you shouldn't use wcwidth(). In the past I once needed a compact implementation of wcwidth(), explicitly for use with ISO 10646. Markus Kuhn has an implementation that seems to work quite nicely:

http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

Maybe it is of any use to you? Otherwise, I'm pretty sure IBM's ICU should be of use:

http://site.icu-project.org/

Ed

keithw commented 12 years ago

Hi Ed,

At configure time we check for __STDC_ISO_10646__, which the C library is supposed to define if wchar_t is ISO/IEC 10646 / UTF-32. (We used to assert it, but in practice only GNU libc seems to define it, even though OS X and FreeBSD do also obey it in practice. We print a warning on configure on these systems.)

We may have to ship our own Unicode library eventually. ICU is kind of a monstrous beast though.

-Keith
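
For readers unfamiliar with the macro, here is a minimal sketch of a check for __STDC_ISO_10646__. It is not Mosh's actual configure test; the helper names wchar_is_iso10646 and iso10646_version are made up for illustration.

```c
#include <wchar.h>

/* Hypothetical helper: 1 if the C library defines __STDC_ISO_10646__,
 * i.e. promises that wchar_t values are ISO/IEC 10646 code points. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: the macro's value (a yyyymmL date naming the
 * supported revision of the standard), or 0 when it is not defined. */
long iso10646_version(void)
{
#ifdef __STDC_ISO_10646__
    return (long)__STDC_ISO_10646__;
#else
    return 0L;
#endif
}
```

A result of 0 does not prove that wchar_t is something other than ISO 10646; as noted above, some systems obey the convention without defining the macro.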

EdSchouten commented 12 years ago

Hi Keith,

Thanks for the explanation!

lilyball commented 10 years ago

I just commented this on #361, but since it's about OS X it seems a bit more relevant to this ticket (although both tickets appear to be virtually the same thing):


I keep periodically hitting situations where various characters don't render in Mosh, because wcwidth() doesn't support them (OS X client, Ubuntu server). As documented, some characters are because OS X's wcwidth() returns -1, but I also see a bunch of characters (notably, emoji like U+1F4A9 PILE OF POO) that OS X supports but Ubuntu's doesn't (curiously, __STDC_ISO_10646__ on Ubuntu claims that Unicode 6.0 is supported, and the code chart for Unicode 6.0 does list this character, so I don't know why wcwidth() is returning -1).

At this point I'm thinking the only real solution to this problem is for Mosh to calculate character widths itself. Perhaps it could fall back to its own calculation if the platform-provided wcwidth() returns -1, thus allowing the platform's idea of width to take precedence for all characters it knows about. The only real issue with this that comes to mind is if the calculated width disagrees with how the rendering terminal thinks the character should display, but I did some research earlier today and it seems that all characters (including reserved ones) outside of the already-defined East_Asian_Width blocks are assumed to be "Neutral", which basically means they'll never have a width of 2. Assuming a width of 1 for any reserved characters seems reasonable, because if the OS disagrees it will provide an explicit 0 instead of -1 (and I'm suggesting you use this calculation only when the OS version returns -1).


Or as suggested in this ticket you could just ship your own unicode library entirely. My concern is that if Mosh thinks a character has a width of 1 but the terminal emulator thinks it has a width of 2, that will presumably render incorrectly. I'm assuming that the terminal emulator agrees with wcwidth() (for all characters where wcwidth() returns a non-negative value; Terminal.app on OS X renders e.g. U+26A1 HIGH VOLTAGE SIGN as one cell but wcwidth() on OS X returns -1). That assumption is why I suggested above to use the return value of wcwidth() whenever it's non-negative and fall back to a custom implementation otherwise.
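
The fallback described above could look roughly like the sketch below. It is only an illustration of the idea, not code from Mosh: the function name width_with_fallback is hypothetical, the default of 1 for unknown characters is the assumption argued for above, and the caller is assumed to pass in whatever the platform's wcwidth() returned.

```c
#include <stdint.h>

/* Hypothetical helper. system_width is the value the platform's
 * wcwidth() returned for code point cp. */
int width_with_fallback(uint32_t cp, int system_width)
{
    if (system_width >= 0)
        return system_width;   /* the platform knows this character */
    if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0))
        return -1;             /* C0/C1 control or DEL: still unprintable */
    return 1;                  /* assumed default for unknown characters */
}
```

On a system where wchar_t holds Unicode code points, it would be called as width_with_fallback((uint32_t)ch, wcwidth(ch)) after setlocale(LC_ALL, "").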

lilyball commented 10 years ago

Addendum: Apparently glibc uses Unicode 6.0 but its LC_CTYPE support is still stuck at Unicode 5.0 (and wcwidth() uses LC_CTYPE).

jhrmnn commented 10 years ago

⚡, U+26A1 seems to be problematic, for example. Mosh under Terminal.app displays it as a zero-width character in Vim, leading to very strange behaviour in a shell...

The left terminal is mosh/tmux/fish, right ssh/tmux/fish in the same tmux session. When the mosh terminal is smaller than ssh, mosh is off by one character on the command-line. But if the ssh terminal is bigger, mosh is by some miracle right even though skipping ⚡.

This is probably not worth any work, I guess, but it might be useful to mention this problem in the documentation, so one can find it when searching for unicode or utf-8. I spent a good two hours on this :)

cgull commented 9 years ago

My current thinking on Unicode issues:

Mosh is a virtual terminal, split across client and server, and it uses normal terminal datastreams between client and server. Therefore, it must be consistent between client and server, and should be as advanced with its Unicode version as it can be. If we are up to date on Unicode, there's no need to match the server application's notion of Unicode: if a server application outputs a Unicode character that it doesn't know about, then it has already lost: if it's doing any formatting of the output, it doesn't know how wide the character is and may be feeding us corrupt line or full screen formatting to begin with.

This argument dictates that Mosh must have its own internal wcwidth implementation for its virtual terminal, because client & server may have different host wcwidth implementations. If mosh receives a character known by the server's wcwidth but not the client's, then its placement of subsequent characters on the line will be wrong in our virtual terminal, and we will lose badly, because Mosh quite efficiently avoids redisplaying characters it doesn't think have changed.

Mosh then sends the character off to the client's terminal, where it can be correctly formatted and displayed. Now we have the problem that the display terminal may have a lower version of Unicode than Mosh does, and may therefore corrupt output if its notion of character width differs from ours.

This is in general a hard problem: Most current terminal emulators either depend on a system's GUI environment (GNOME, KDE) for i18n, or have their own implementation to escape the vagaries of host OS implementation. So most terminal emulators actually do something better than the host OS's wcwidth implementation, which also means that the host wcwidth does not usefully tell us what the terminal will actually do. Mosh cannot know what version of Unicode the display terminal is using; the only thing it can even begin to do is output characters and check the cursor position after output. There is one heuristic that we can check for: most terminal emulators set environment variables to indicate their presence and sometimes even their version. Using this heuristic means maintaining tables of programs/versions against the Unicode versions they support, though.

But the user can legitimately ssh into a remote host and run mosh-client there, in which case these variables have been discarded and we have no clue. We can't handle that. At all.
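
The "output characters and check the cursor position" approach mentioned above can be sketched with the standard Device Status Report query: writing ESC [ 6 n asks the terminal for a cursor position report of the form ESC [ row ; col R. The sketch below only parses two such reports and takes the column difference. Terminal I/O, raw mode, and timeouts are left out, and the function names are hypothetical, not part of Mosh.

```c
#include <stdio.h>

/* Parse a cursor position report "ESC [ row ; col R".
 * Returns 1 on success, 0 if the reply is malformed. */
int parse_cursor_report(const char *reply, int *row, int *col)
{
    char final = 0;
    if (sscanf(reply, "\033[%d;%d%c", row, col, &final) != 3)
        return 0;
    return final == 'R' && *row >= 1 && *col >= 1;
}

/* Columns the cursor advanced between two reports, or -1 if unknown. */
int measured_width(const char *before, const char *after)
{
    int r0, c0, r1, c1;
    if (!parse_cursor_report(before, &r0, &c0) ||
        !parse_cursor_report(after, &r1, &c1))
        return -1;
    if (r0 != r1 || c1 < c0)
        return -1;   /* the line wrapped or the cursor moved back */
    return c1 - c0;
}
```

A caller would request one report, print the character under test, request a second report, and pass both replies to measured_width. The measurement is unreliable near the right margin, which is why a row change is treated as unknown here.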

My current best idea for handling this is to offer the user two options:

One unfortunate thing here is that Unicode will continue to grow with new versions. When that happens, if we upgrade our internal wcwidth, we are back to the current situation of differing client and server Unicode versions-- but if we have an up-to-date wcwidth implementation, we are doing better than using the system implementation.

Perhaps we need to design a scheme where the client gets a character width table from the server. I think this idea has been mentioned before.

About the Markus Kuhn wcwidth implementation: It's been brought up several times in Mosh discussion. It's an excellent, easy-to-understand sample implementation, but it has a number of unpredictable branches, and then an expensive binary search through its tables. The copies of it commonly available around the net are now out of date, and it is slow. It has significant performance impact when coupled with my performance code; I have benchmarked it against the FreeBSD wcwidth and the musl wcwidth, and both are much better (but a lot less readable). Also, I offer you this tidbit:

http://osdir.com/ml/internationalization.linux/2001-01/msg00191.html

Mosh is an application that uses wcwidth heavily, and can spend significant time in slower wcwidth implementations, slowing down character handling noticeably.

Separately, Google shows me a discussion on GNU libc that its wcwidth calls an expensive linear search to determine which locale it's in. That will no doubt get fixed, but.

I have not looked at it as closely, or in a while, but if I remember right ICU does not directly offer a wcwidth function, and in general it's a heavyweight featureful implementation not suited to be called for individual characters as often as we do.
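
For context, the table-plus-binary-search style discussed above looks roughly like the sketch below. This is not Markus Kuhn's code, nor the FreeBSD or musl implementation. The four ranges are a small illustrative sample rather than a complete or current width table, and the names width_range, sample_table, and table_width are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct width_range {
    uint32_t first, last;   /* inclusive code point range */
    int width;              /* columns occupied */
};

/* Illustrative sample only; must stay sorted by code point. */
static const struct width_range sample_table[] = {
    { 0x0300,  0x036F,  0 },   /* combining diacritical marks */
    { 0x1100,  0x115F,  2 },   /* Hangul Jamo initial consonants */
    { 0x4E00,  0x9FFF,  2 },   /* CJK unified ideographs */
    { 0x1F600, 0x1F64F, 2 },   /* emoticons */
};

/* Binary search over the sorted ranges; anything not listed gets width 1. */
int table_width(uint32_t cp)
{
    size_t lo = 0;
    size_t hi = sizeof sample_table / sizeof sample_table[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < sample_table[mid].first)
            hi = mid;
        else if (cp > sample_table[mid].last)
            lo = mid + 1;
        else
            return sample_table[mid].width;
    }
    return 1;
}
```

The search costs one data-dependent branch per halving step for every character, which is the kind of per-character overhead the comment above is concerned about.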

zuzak commented 9 years ago

This doesn't appear to be a mac-specific issue: I have this problem in gnome-terminal on Ubuntu. Emoji don't render in an irssi screen session over mosh 1.2.4a, but do on the same screen session over SSH.

rapha8l commented 9 years ago

Hi,

Also on Linux, ⮂ and ⮀ do not display at all with mosh 1.2.4a, with any terminal and UTF-8 set on both sides.

Thanks

chenkaie commented 9 years ago

Yep, I think for a heavy terminal user, powerline is a well-known package. However, certain symbols/patched fonts are used to make it look fancy, like all these symbols: ⭠ ⭡ ⭢⭣ ⭤ ⮀ ⮁ ⮂ ⮃ ⋅ ⋮ ❐ If this issue can be handled, that would be awesome :+1:

raine commented 9 years ago

I have the same problem where emojis are not rendered when connecting with mosh but they do when using just ssh.

andrey-str commented 8 years ago

I have the same issue as @raine: mosh does not display Unicode emoji symbols (🏠 in my case), but ssh does. I tried with iTerm2 and iTerm3 Beta on OS X.

NHDaly commented 8 years ago

Bump to resurrect this thread. I'm having the same issue as above, also for emojis (🏠, 🖥, 🚀, 👾 in my case, coming from the hostnames file in my dotfiles).

Is there a plan to move forward with @cgull's proposal?

bhamiltoncx commented 8 years ago

If you don't want to bring in the beast that is ICU, you can just ship the EastAsianWidth.txt file:

http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt

It's pretty easy to parse this and transform it into whatever form you want.
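
As a rough illustration of how little parsing is involved, the sketch below reads one data line of EastAsianWidth.txt (for example "1F300..1F320;W  # ...") into a range and a width class. The names eaw_entry, parse_eaw_line, and eaw_class_width are hypothetical, and mapping the Ambiguous class to one column is an assumption. The file does not identify zero-width combining characters, so a real table would need additional data for those.

```c
#include <stdio.h>
#include <string.h>

struct eaw_entry {
    unsigned long first, last;  /* inclusive code point range */
    char cls[4];                /* East_Asian_Width class: N, A, H, W, F, Na */
};

/* Parse one line of EastAsianWidth.txt. Returns 1 for a data line,
 * 0 for comments, blank lines, or anything unrecognised. */
int parse_eaw_line(const char *line, struct eaw_entry *out)
{
    unsigned long first, last;
    char cls[4];

    if (sscanf(line, "%lx..%lx ; %3[A-Za-z]", &first, &last, cls) == 3) {
        /* range form: XXXX..YYYY;Class */
    } else if (sscanf(line, "%lx ; %3[A-Za-z]", &first, cls) == 2) {
        last = first;           /* single code point: XXXX;Class */
    } else {
        return 0;
    }
    out->first = first;
    out->last = last;
    strcpy(out->cls, cls);
    return 1;
}

/* Wide and Fullwidth take two columns; everything else is assumed to take one. */
int eaw_class_width(const char *cls)
{
    return (strcmp(cls, "W") == 0 || strcmp(cls, "F") == 0) ? 2 : 1;
}
```

The spaces in the format strings let the parser accept lines both with and without padding around the semicolon.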

daviddias commented 8 years ago

Is there any update on a solution for this? Especially for the chars mentioned here: https://github.com/mobile-shell/mosh/issues/234#issuecomment-123790023 ? Thank you!

tombh commented 7 years ago

I've just been down the rabbit hole of this problem. There are so many places that could take responsibility for it;

In summary it just seems like a subtle problem, that can't be easily fixed in one place. So for now I'm just going to remove any special characters from my setup.

rwuwon commented 6 years ago

Edit: After writing all this, I've gone over the earlier comments again and they make more sense to me now. Please disregard if all of the following is already well understood and has no fix.

I've been trying to troubleshoot this over the past few days and believe I've started to make some progress in narrowing this down as far as the 🤔 emoji/utf-8 display goes (it's UTF/unicode, but I'm testing with the thinking face emoji so I'll refer to it as that here). By the way, don't try to copy the emoji from here on GitHub because they turn it into an image - instead, head to emojipedia to copy & paste into your own terminals.

I don't think there's significant relevance to what (modern) terminal program is being used (gnome-terminal, macOS Terminal, iTerm2, JuiceSSH, etc - they all default quite well these days). I also don't think tmux or even irssi has anything to do with it - but to be clear, I've been testing with only plain bash and fish; no tmux, no powerline - no other user-complications to the best of my knowledge.

What's working in CentOS 7, Ubuntu Server 14.04.5 LTS, Ubuntu Server 18.04 LTS, Fedora 28:

What's not working in CentOS 7.5.1804 (including one non-test install; mosh 1.3.0), Ubuntu Server 14.04.5 LTS (mosh 1.3.2):

The two cases where emojis through a mosh connection do work:

Suggestion for all in this thread: Please note these aren't intended as workarounds and are only to help eliminate what I believe are some red herrings (tmux, irssi, terminal emulators, etc).

  1. See if you can all reproduce this issue by installing a basic server/minimal install of CentOS 7.5, Ubuntu 14.04.5 in VirtualBox (or qemu-kvm if you prefer, but make sure you understand how to SSH/Mosh to it from the host) - I think it's likely you will, should you set up CentOS 7.5 or Ubuntu 14.04 (and maybe 16.04??).
  2. Set up port forwarding so you can SSH into it (I've written up some quick VirtualBox/network port forwarding tips in a gist here - let me know if you need more help).
  3. Also try Ubuntu Server 18.04 - that should work. I haven't tried 16.04 or other distros yet. With the set-ups that work, emojis will also display inside tmux (both ssh and mosh) but again, I don't think we're dealing with a tmux issue here when bash under Mosh isn't displaying the emoji types of unicode either.

What I haven't tried:

Please let me know if this gets us any closer to where the problem might be.

Edit 20180711: As per some of the other closed issues above, I only have glibc 2.17 on the server. I'm now considering a migration away from CentOS 7.5.1804 to sort this.

Edit 20180808: I've just completed a migration from CentOS 7.5 (glibc 2.17) to Debian 9.5 Stable (glibc 2.24) and am satisfied with the results. Also expecting to have something like glibc 2.27 with Debian 10 next year. Those who need or wish to remain with CentOS, hopefully version 8 isn't too far away.

mpolden commented 6 years ago

I'm having a similar issue with zero width spaces (U+200B). Printing U+200B typically causes some kind of display corruption.

The likely cause seems to be that my client and server disagree about the width of this particular character (locale en_US.UTF-8):

Server: wcwidth(0x200B) == 0
Client: wcwidth(0x200B) == 1

Server is Debian stable (stretch) and mosh 1.2.6, client is macOS 10.13.6 and mosh 1.3.2.

jshort commented 5 years ago

Same issue with an OMZ theme that displays a 'gear' character if you have background processes in your shell. Works fine with a raw ssh session but not with mosh.

jquast commented 4 years ago

@tombh regarding your "rabbit hole", I think you may be pleased to find my article, "Offering a solution for Terminal Wide Character issues" https://jeffquast.com/post/terminal_wcwidth_solution/

I have authored a demonstration CLI utility that is able to automatically detect the version of Unicode supported by the Terminal emulator, https://github.com/jquast/ucs-detect/ and a new release of python wcwidth library https://github.com/jquast/wcwidth that supports all versions of unicode by selection using the exported environment variable.

nferch commented 3 years ago

Apologies in advance for bumping this thread, have been affected by this issue and finally was able to identify mosh as the culprit. I have annoying text alignment issues similar to @mikaabra in https://github.com/mobile-shell/mosh/issues/361 (in a TUI email client, nonetheless!).

Am not a Unicode expert by any means, so cannot begin to fathom the complexity of a fix of the root cause, but curious if there's been any other workarounds?

I'm using mosh 1.3.2 from Homebrew on OS X 11.4 "Big Sur", connecting to an Ubuntu 18.04 "Bionic Beaver" box using mosh 1.3.2 and libc6 2.27-3ubuntu1.4. I'm having trouble displaying the ⚾ character.

I wonder if upgrading to 20.04 "Focal Fossa" would help? That seems to be using glibc 2.31-0ubuntu9.2. Although it seems like the simple passing of time hasn't done much to fix this issue :/

Casandro commented 2 years ago

The bug also seems to exist with the "Symbols for Legacy Computing" block: https://en.wikipedia.org/wiki/Symbols_for_Legacy_Computing

Test setup:
Server: Debian 10 (mosh 1.3.2, tmux 2.8)
Client: Debian 11 (mosh 1.3.2) and Xubuntu 21.10 (mosh 1.3.2)

When loading https://github.com/Casandro/teletext_ng/blob/main/tools/dump_tta_text_colour.c in vim, the special mosaic characters are there via ssh, but just missing via mosh.

lifei commented 1 year ago

I did some debugging of mosh on MSYS2. I think something is wrong with the conversion in the Cell class.

I found some code that may be related. [screenshot not preserved]

I also did some research on their code, and I found something wrong. [two screenshots not preserved]

lifei commented 1 year ago

[two screenshots not preserved]

lifei commented 1 year ago

Well, I spent ten hours trying to find what's wrong. Then I found that the following code does not work correctly in MSYS2.

#define _XOPEN_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <string>

int main() {
    setlocale(LC_ALL, "zh_CN.UTF-8");
    wprintf(L"wcwidth of 0x269C is %d\n", wcwidth(0x269C));
    std::wstring in = L"📁💕😘😒🤦";
    // size() counts wchar_t units, not characters; cast to match %d
    wprintf(L"length of string in is %d\n", (int)in.size());
    // print the width reported for each wchar_t unit
    for (std::wstring::const_iterator i = in.begin(); i != in.end(); i++)
    {
        wprintf(L"wcwidth = %d\n", wcwidth(*i));
    }
    wprintf(L"%ls", in.c_str());
    return 0;
}

Result in msys2

wcwidth of 0x269C is 1
length of string in is 10
wcwidth = -1
wcwidth = -1
wcwidth = -1
wcwidth = -1
wcwidth = -1
wcwidth = -1
wcwidth = -1
wcwidth = -1
wcwidth = -1
wcwidth = -1
📁💕😘😒🤦

Result in debian or WSL

wcwidth of 0x269C is 1
length of string in is 5
wcwidth = 2
wcwidth = 2
wcwidth = 2
wcwidth = 2
wcwidth = 2
📁💕😘😒🤦

But there is no libc in MSYS2:

$ ldd a.exe
        ntdll.dll => /c/WINDOWS/SYSTEM32/ntdll.dll (0x7fff75af0000)
        KERNEL32.DLL => /c/WINDOWS/System32/KERNEL32.DLL (0x7fff743a0000)
        KERNELBASE.dll => /c/WINDOWS/System32/KERNELBASE.dll (0x7fff72f10000)
        ADVAPI32.DLL => /c/WINDOWS/System32/ADVAPI32.DLL (0x7fff75980000)
        msvcrt.dll => /c/WINDOWS/System32/msvcrt.dll (0x7fff75350000)
        sechost.dll => /c/WINDOWS/System32/sechost.dll (0x7fff74470000)
        RPCRT4.dll => /c/WINDOWS/System32/RPCRT4.dll (0x7fff74f60000)
        msys-stdc++-6.dll => /usr/bin/msys-stdc++-6.dll (0x526840000)
        msys-gcc_s-seh-1.dll => /usr/bin/msys-gcc_s-seh-1.dll (0x5e8160000)
        msys-2.0.dll => /usr/bin/msys-2.0.dll (0x180040000)

I suggest that using another way to split the string into cells would give better compatibility.
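
The length of 10 for five emoji in the MSYS2 output suggests that wchar_t is 16 bits there, so each emoji is stored as a UTF-16 surrogate pair and wcwidth() is asked about each half separately. One possible way to split such a string into characters is to join the pairs into full code points first, as in the sketch below. The function name utf16_to_code_points is hypothetical and this is not the code from the pull request.

```c
#include <stddef.h>
#include <stdint.h>

/* Combine UTF-16 code units into Unicode code points.
 * out must have room for n entries. Returns the number written.
 * Unpaired surrogates are copied through unchanged. */
size_t utf16_to_code_points(const uint16_t *in, size_t n, uint32_t *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t u = in[i];
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < n &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            /* high surrogate followed by low surrogate */
            u = 0x10000 + ((u - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            i++;
        }
        out[count++] = u;
    }
    return count;
}
```

The resulting code points can then be passed to a width function that works on full 32-bit values.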

lifei commented 1 year ago

OK, everyone. I have spent more than 100 hours figuring out how to render emoji in mosh on MSYS2, and I finally found a way. Here are the screenshots: [two screenshots not preserved]

lifei commented 1 year ago

here is the pr: https://github.com/mobile-shell/mosh/pull/1271

JRGonz commented 7 months ago

Ok so it isn't just me. I am noticing this as well and just spent forever trying to figure out what it was in the chain. I have artifacts all over the place when using mosh+tmux+iamb. I guess I can just write this off as a mosh issue?

Edit: Forgot to add that I'm seeing this same behavior in blackbox (terminal) on my Fedora desktop using Gnome. ssh on its own renders just fine but when mosh connects then I get artifacts all over the place where there are emoji (tend to see this when moving around in iamb, gomuks, weechat)

cbean commented 4 months ago

Same issue on Gentoo with UTF-8: when using the oh-my-zsh agnoster theme as root, the thunder symbol is somehow not visible. ⚡

mosh-1.4.0