huashengdun / webssh

:seedling: Web based ssh client
https://webssh.huashengdun.org/
MIT License

Terminal problems ($TERM, utf-8) #84

Open xnoreq opened 5 years ago

xnoreq commented 5 years ago

On the server (connected directly through ssh using xterm as terminal):

$ locale
LANG=en_US.UTF-8

$ echo $TERM
xterm-256color

$ echo -e '\xe2\x82\xac'
€

Running webssh-1.4.5 like so:

$ wssh --port=7681

Now after connecting through the browser to the same server:

$ echo $TERM
xterm

$ echo -e '\xe2\x82\xac' 
€

Why not xterm-256color and why is the encoding broken?
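
To illustrate what a wrong decoder does to the euro sign, here is a minimal Python sketch. It is not webssh's code; it only decodes the three bytes from the `echo` command above with a few codecs. The exact garbled characters the reporter saw are not preserved in this thread, so the results are illustrative only.

```python
# The three bytes printed by: echo -e '\xe2\x82\xac'
euro_bytes = b"\xe2\x82\xac"

# Decoded with the right codec, the bytes form a single character.
as_utf8 = euro_bytes.decode("utf-8")

# Decoded as Latin-1, every byte becomes its own (wrong) character.
as_latin1 = euro_bytes.decode("latin-1")

# Decoded as ASCII, none of the bytes are valid, so each one is
# replaced by U+FFFD when errors="replace" is used.
as_ascii = euro_bytes.decode("ascii", errors="replace")

print(repr(as_utf8), repr(as_latin1), repr(as_ascii))
```

If the client ends up decoding with anything other than UTF-8, a UTF-8 server's output for non-ASCII characters will look broken in one of these ways.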

huashengdun commented 5 years ago
  1. About the terminal type: webssh creates a pseudo-tty with the hardcoded terminal type xterm for every ssh connection.

  2. About the encoding problem: your browser probably doesn't use UTF-8 as the decoding type. You can check the browser console to see which encoding it uses.

xnoreq commented 5 years ago
  1. Why?
  2. Yes, Chromium supports UTF-8. It's the most used browser in the world and the majority of websites use UTF-8.
huashengdun commented 5 years ago
  1. Because xterm-256color is less commonly supported than xterm.
  2. You may take a look at https://github.com/huashengdun/webssh#browser-console section which describes how to deal with encoding.
xnoreq commented 5 years ago
  1. Why not make it configurable? xterm.js supports xterm-256color.

  2. In my web browser I can see this in the log: The deault encoding of your server is ANSI_X3.4-1968

This makes no sense since on the server the default locale is configured correctly in /etc/locale.conf: LANG=en_US.UTF-8 which is loaded by /etc/profile.d/locale.sh.

With every other client this works correctly and after login I get:

$ locale
LANG=en_US.UTF-8

So it looks like the default encoding detection does not work or does something non-standard that is not compatible.

huashengdun commented 5 years ago

What kind of server are you using? What is the output of the command locale charmap?

xnoreq commented 5 years ago

GNU/Linux, kernel version 5.2

$ locale charmap
UTF-8
huashengdun commented 5 years ago

That is weird. webssh uses the command locale charmap to detect the default encoding of the server being connected. If the output of this command is UTF-8, then the log in your browser console should look like

The deault encoding of your server is UTF-8
huashengdun commented 5 years ago

Terminal type is configurable now. You can pass a terminal type via url.

http://localhost:8888/?term=xterm-256color
huashengdun commented 5 years ago
Here is my locale:

$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

My guess is that your locale is not configured correctly.

xnoreq commented 5 years ago

No, it's the same on my system, but above I just pasted the first line.

The full output:

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

$ locale -a
C
en_US.utf8
POSIX

$ locale -m
ANSI_X3.110-1983
ANSI_X3.4-1968
ARMSCII-8
ASMO_449
BIG5
BIG5-HKSCS
BRF
BS_4730
BS_VIEWDATA
CP10007
CP1125
<snip>
T.61-8BIT
TCVN5712-1
TIS-620
TSCII
UTF-8
VIDEOTEX-SUPPL
VISCII
WIN-SAMI-2
WINDOWS-31J

$ locale -c charmap
LC_CTYPE
UTF-8

$ locale charmap
UTF-8
huashengdun commented 5 years ago

If the log in your browser console is The deault encoding of your server is ANSI_X3.4-1968, then the output of the command locale charmap on your server should be ANSI_X3.4-1968.

xnoreq commented 5 years ago

But it is UTF-8. I have even started webssh as the same user as in all the commands above.

huashengdun commented 5 years ago

Can you show me the whole log in your browser console when you connect to this server?

xnoreq commented 5 years ago

I've added some debug messages to handler.py and it looks like the environment is not loaded correctly. At the time locale charmap is executed env returns very few environment variables and LANG is missing.

xnoreq commented 5 years ago

The problem is that locale charmap is sent as a direct command through SSH, and on my system this means the executing shell is non-interactive and doesn't read /etc/profile.
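
The effect described here can be reproduced locally without SSH. The sketch below is an illustration, not webssh's code: it starts a child shell with an explicit environment and reports which `LANG` the command sees. The helper name `lang_seen_by_command` and the two environments are made up for the example, and it assumes a POSIX system with `/bin/sh`.

```python
import subprocess


def lang_seen_by_command(env):
    """Run a child shell with exactly `env` and return the LANG it sees."""
    result = subprocess.run(
        ["/bin/sh", "-c", 'printf %s "${LANG:-unset}"'],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Roughly what a direct command gets when no profile script has run.
minimal_env = {"PATH": "/usr/bin:/bin"}

# Roughly what a login shell has after /etc/profile.d/locale.sh was sourced.
profile_env = {"PATH": "/usr/bin:/bin", "LANG": "en_US.UTF-8"}

print(lang_seen_by_command(minimal_env))
print(lang_seen_by_command(profile_env))
```

A command such as `locale charmap` that runs in the first kind of environment has no locale to report, so it answers for the default C/POSIX locale.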

Even if LANG was set, this way of detecting the encoding is wrong anyway, as it requires knowing the charset to decode the answer... the answer that you need to know for decoding in the first place.

This is why terminals let the user configure the encoding, which is the correct way to do it, with the default being UTF-8 on pretty much any modern terminal.

huashengdun commented 5 years ago

Have you tested it on other systems? Here is a related issue, https://github.com/huashengdun/webssh/issues/21. Also can you run this command python -c "import sys; print(sys.stdout.encoding)" on your special server?

huashengdun commented 4 years ago

[Screenshot from 2019-09-22 21-33-40]

Tested on Ubuntu 19 with the latest kernel 5.3; the default encoding detection works.

xnoreq commented 4 years ago

There's no reason to test on other systems, as I've pointed out what's going on.

I dug a bit deeper though. On Debian, bash is patched to detect that it is running non-interactively under ssh, in which case it executes bashrc (this doesn't happen in a "normally" compiled bash, and bashrc doesn't necessarily set LANG anyway). In addition, Debian "injects" the system's LANG into ssh shells through PAM, regardless of whether they are (non-)interactive or (non-)login shells.

Neither is necessarily true on systems that are not Debian or Debian-related.

Also, as I've explained, the way you try to detect the encoding is wrong anyway. You get an encoded response that contains the encoding needed to decode it in the first place. Since you don't know the encoding you just fall back to UTF-8 anyway, which is behavior that will break on non-UTF-8 systems.

--

Why don't any of these problems happen with normal ssh? Because ssh and sshd, if configured that way (and are again by default on Debian), will send the client's environment variables (like LANG) to the server which accepts them. See SendEnv/AcceptEnv in ssh(d)_config.

But that again is not a given on all systems, and not always desired anyway. In this case, the client has to set the LANG for the command itself like so: LANG=en_US.UTF-8 command.

This is also how you can properly query for available locales: LANG=C locale -a because now you'll get an answer that is encoded in a known encoding: ASCII in this case. ANSI_X3.4-1968 to be precise.
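
As a small check of the naming mentioned above, Python's standard codec registry treats `ANSI_X3.4-1968` as an alias for ASCII. This is only an illustration of the name mapping; it says nothing about how webssh itself handles the value.

```python
import codecs

# "ANSI_X3.4-1968" is the charmap name reported for the C/POSIX locale,
# which is also the value that showed up in the browser log earlier.
codec = codecs.lookup("ANSI_X3.4-1968")

# Python resolves the alias to its canonical codec name.
print(codec.name)
```

So a detected encoding of `ANSI_X3.4-1968` is consistent with the command having run without `LANG` set, i.e. in the C locale.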

huashengdun commented 4 years ago

You get an encoded response that contains the encoding needed to decode it in the first place. Since you don't know the encoding you just fall back to UTF-8 anyway, which is behavior that will break on non-UTF-8 systems.

This is because I know the output of locale charmap only contains ASCII characters. For ASCII characters, encoding with different encodings gives the same bytes, and decoding the resulting bytes with different encodings gives the same string.
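
The claim can be checked with a short Python sketch. It holds for ASCII-compatible encodings; the codec list below is an arbitrary sample chosen for illustration, and the UTF-16 line shows a case where the assumption would not hold.

```python
# A typical answer from `locale charmap`, as raw bytes.
charmap_output = b"UTF-8\n"

# A sample of ASCII-compatible codecs (an arbitrary selection).
ascii_compatible = ["ascii", "utf-8", "latin-1", "cp1252", "gbk", "shift_jis"]

# Every one of them decodes pure-ASCII bytes to the same string.
decoded = {name: charmap_output.decode(name) for name in ascii_compatible}

# UTF-16 is not ASCII-compatible, so the same bytes decode differently.
as_utf16 = charmap_output.decode("utf-16-le")

print(set(decoded.values()), repr(as_utf16))
```

In other words, decoding the reply as UTF-8 is safe as long as the server's encoding is ASCII-compatible, which covers the encodings discussed in this thread.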

Also, can you tell me what kind of system (which flavour and edition) you use?

huashengdun commented 4 years ago

[Screenshot from 2019-09-23 09-21-23]

Just tested on CentOS 7; the default encoding detection also works. So far I have tested two kinds of Linux flavour (Debian and Redhat) and they both work.

nirui commented 4 years ago

Well, I'm researching the same problem (SSH Shell Encoding) which brought me here (well, actually, Google brought me here, but anyway).

Based on the information that I grabbed from this issue, I think maybe you can try to run locale charmap within xtermjs console rather than directly on server(?). At least xtermjs console is interactive so you should be able to get the correct result there.

But as @xnoreq has suggested, that's NOT how it should be done. Maybe you need to provide a method to allow user to configure the encoding by themselves. I know I will be doing that after reading all comments here, so yeah, I recommend it :)

Also ....

Here is a related issue, #21.

I don't think these two are related. Issue #21 is caused by an unsupported encoding label.

The TextDecoder only supports encoding from this list, and en_IN is not on the list.

You cannot simply feed the output from a SSH command directly to a JavaScript function and expect everything will work just right. Maybe do a mapping?

Hope it helps :)
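
One way to read the "mapping" suggestion is to validate the reported name before using it. The sketch below does this on the Python side with the standard codec registry. The function name `pick_encoding` and the fallback to UTF-8 are assumptions made for illustration, and the browser's TextDecoder has its own separate list of accepted labels that this does not check.

```python
import codecs


def pick_encoding(reported, default="utf-8"):
    """Return a known codec name for `reported`, or `default` if unknown."""
    try:
        return codecs.lookup(reported.strip()).name
    except LookupError:
        # Not an encoding name at all (for example a locale name).
        return default


print(pick_encoding("UTF-8\n"))
print(pick_encoding("GBK"))
print(pick_encoding("en_IN"))
```

A locale name such as `en_IN` is rejected here and replaced by the default instead of being passed on to a decoder.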

huashengdun commented 4 years ago

Maybe you need to provide a method to allow user to configure the encoding by themselves.

Already provided, you can configure an encoding in your url. http://localhost:8888/#encoding=gbk

huashengdun commented 4 years ago

Well, I'm researching the same problem (SSH Shell Encoding) which brought me here

I just searched Google for "SSH Shell Encoding" and I don't see any related results. Can you show me some links that are related to this issue?
Also, can you tell me what kind of server (flavour and edition) you ran into the same problem on?

nirui commented 4 years ago

Oh, the keyword was 'ssh encoding "locale charmap"'.

I was trying to figure out whether or not it's a good idea to send locale programmatically to the server in order to detect its encoding, and found out it isn't. Just here to share my findings, sorry if I bothered you.

huashengdun commented 4 years ago

OK, so which discussion tells you that running the command locale charmap is not a good way to detect the encoding? Actually, I never expected the locale charmap command to work on all platforms. At least I have tested it on Debian- and Redhat-flavoured Linux systems, and it works on all of them. Can you tell us which server you ran into this problem on, so that everyone can test it?

huashengdun commented 4 years ago

The TextDecoder only supports encoding from this list, and en_IN is not on the list. You cannot simply feed the output from a SSH command directly to a JavaScript function and expect everything will work just right. Maybe do a mapping?

      new TextDecoder('en_IN')

This line of code will blow up if the encoding is not a valid one. It seems you haven't even read my JavaScript code, so how could you comment like this?

huashengdun commented 4 years ago

Oh, sorry just deleted your comment by accident. Here is your comment copied from my email.

First, let me clarify this: I'm not a user of your software. I'm researching this topic, not your software. I came here because of that Google search, and I've confirmed what I expected, so I thought maybe I should share some of my findings as well.

The thing is this: based on the small portion of the SSH specs I have read, as far as I can tell, unlike Telnet, it does not provide any method for the two parties to negotiate charset encoding. To me, it implies that users have to set up that encoding themselves before the connection is made.

Hope this could resolve some confusion created by me :)

nirui commented 4 years ago

Oh, sorry just deleted your comment by accident.

No problem :)

huashengdun commented 4 years ago

Like I said before, I never expected the locale charmap command to work on all platforms. At least I have tested it on Linux systems (Debian and Redhat), and it works on all of them. But thanks for your suggestion.

Also, please provide me with links to your findings and to the small portion of the SSH specs you have read.

huashengdun commented 4 years ago

Created a simple Python script to get the default encoding of your ssh server, for anyone who meets this problem in the future. https://gist.github.com/huashengdun/0af95bdafdce46a6ecbfc628dcd07c29

  1. Make sure the locale of your server is configured properly. https://help.ubuntu.com/community/Locale
  2. Run this script https://gist.github.com/huashengdun/0af95bdafdce46a6ecbfc628dcd07c29 on your local computer to fetch the default encoding of your server.
  3. Log in to your server, then run the command locale charmap.
  4. Compare the results of step 2 and step 3.

If these two results are different, please report the information (flavour and edition) of your server here.

xnoreq commented 4 years ago

I was a user but since the author apparently doesn't read I have moved on to a better solution. I've explained everything relevant in my comment https://github.com/huashengdun/webssh/issues/84#issuecomment-533901135. Thanks and bye.

huashengdun commented 4 years ago

I was a user but since the author apparently doesn't read I have moved on to a better solution. I've explained everything relevant in my comment #84 (comment). Thanks and bye.

I did read your comment. I think your explanation is reasonable. But it is just a theory and it may be outdated. I have already tested on two kinds of Linux systems (Debian and Redhat) and the current encoding detection works well.

And you still have not provided me with detailed information about your server. I only know that your server is some kind of GNU/Linux with kernel version 5.2. In that way I cannot reproduce the error as you described.

xnoreq commented 4 years ago

This is my last response, because I've spent enough time on this.

But it is just a theory and it may be outdated.

You gotta be kidding. Everything is factual and current information.

I have already tested on two kinds of Linux systems (Debian and Redhat)

In other words, you either didn't read or understand my comment.

In that way I cannot reproduce the error as you described.

Actually, I have given all the information (PAM env setting LANG, bash with SSH_SOURCE_BASHRC) that is needed to reproduce the error. On top of this, I have explained multiple times why how you're detecting server encoding is simply logically wrong/contradictory.

And that's why I'm unsubbing, sorry.

huashengdun commented 4 years ago

Thanks for your discussion and your time.

You are a very funny guy. You give me an explanation of why my current encoding detection method is wrong, but you don't tell me anything about the actual server on which this problem occurred. Are you working for the CIA, running a system whose details cannot be disclosed?

Actually I have already tested on two kinds of Linux systems (Debian and Redhat) and the current encoding detection works well. Those two results contradict your theory.

huashengdun commented 4 years ago

On top of this, I have explained multiple times why how you're detecting server encoding is simply logically wrong/contradictory.

For pure ASCII characters, there is no difference between bytes.decode('utf-8') and bytes.decode('ascii'). Did you see my explanation?

huashengdun commented 4 years ago

Actually, I have given all the information (PAM env setting LANG, bash with SSH_SOURCE_BASHRC) that is needed to reproduce the error.

My app works with real OSes, not a purely theoretical environment built on the information you gave me. How could I know that a real OS works the way you describe? In fact, the Debian and Redhat Linux systems I've already tested (limited editions) don't work that way.

huashengdun commented 4 years ago

Tested it on FreeBSD; the current encoding detection still works. [Screenshot: freebsd-utf8]

After changing the charset to GBK, it also works. [Screenshot: freebsd-gbk]

But it failed on macOS. It seems that it doesn't send the LANG environment variable back. But I don't care, because almost nobody uses it as a server.

huashengdun commented 4 years ago

Until now I've tested several different systems including Debian, Ubuntu, CentOS, FreeBSD, and macOS. The encoding problem raised in this issue only happens on macOS (tested on macOS v10.13.6). The encoding detection method currently being used works on all the other systems listed above.

I am going to close this issue now.

If you run into the problem that the app can't detect the default encoding of your server, you can simply pass an encoding via the URL like this:

http://localhost:8888/?encoding=utf-8
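
This thread shows the encoding passed both as a query parameter (`?encoding=utf-8`) and as a URL fragment (`#encoding=gbk`). The following Python sketch shows one way such a value could be read from either place. The function name `encoding_from_url` and the UTF-8 default are assumptions for illustration; in webssh itself this parsing happens in the browser and may work differently.

```python
from urllib.parse import parse_qs, urlsplit


def encoding_from_url(url, default="utf-8"):
    """Return the `encoding` value from the query or fragment of `url`."""
    parts = urlsplit(url)
    # Look in the query string first, then in the fragment.
    for source in (parts.query, parts.fragment):
        values = parse_qs(source).get("encoding")
        if values:
            return values[0]
    return default


print(encoding_from_url("http://localhost:8888/?encoding=utf-8"))
print(encoding_from_url("http://localhost:8888/#encoding=gbk"))
print(encoding_from_url("http://localhost:8888/?term=xterm-256color"))
```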
huashengdun commented 4 years ago

OK, I notice that the current encoding detection method detects the system-wide character encoding, not the encoding of user level configuration that the user prefers to use.

Seems correct as the Features section says "Auto detect the ssh server's default encoding".

huashengdun commented 4 years ago

The code has been updated. Now I am using two commands to try to grab the encoding set by the user.

ssh -t <user>@<host> '$SHELL -ilc "locale charmap"'

This command seems to work on Debian, Ubuntu, CentOS, and macOS.

ssh -t <user>@<host> '$SHELL -ic "locale charmap"'

This command is for FreeBSD. The default shell used by FreeBSD has no login option.
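
The two-command approach can be sketched as a simple fallback loop. This is a hedged illustration, not the updated webssh code: `run_remote` is a hypothetical callable that executes a command on the server and returns its raw output bytes (or `None` on failure), and taking the last non-empty line of the output is an assumption to skip anything a login shell prints first.

```python
import codecs

# The two commands from this comment, tried in order.
CANDIDATE_COMMANDS = [
    '$SHELL -ilc "locale charmap"',  # interactive login shell
    '$SHELL -ic "locale charmap"',   # interactive shell without login option
]


def detect_encoding(run_remote, default="utf-8"):
    """Return the first valid encoding reported by the candidate commands."""
    for command in CANDIDATE_COMMANDS:
        output = run_remote(command)
        if not output:
            continue
        # Charmap names are ASCII, so non-ASCII bytes can be ignored here.
        text = output.decode("ascii", errors="ignore")
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        if not lines:
            continue
        try:
            return codecs.lookup(lines[-1]).name
        except LookupError:
            continue  # not a known encoding name, try the next command
    return default


# Usage with a stand-in for the remote call: the first command fails,
# the second one answers.
def fake_run_remote(command):
    return None if "-ilc" in command else b"UTF-8\r\n"


print(detect_encoding(fake_run_remote))
```

Keeping the remote call behind a plain callable makes the fallback logic easy to test without a real SSH connection.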

huashengdun commented 4 years ago

Hi xnoreq,

Sorry for my carelessness. I should have read your comments more carefully.

Please subscribe to this issue. I hope you can see this comment and test your server with my updated solution.