tcsh-org / tcsh

This is a read-only mirror of the tcsh code repository.
https://www.tcsh.org/

SIGSEGV on invalid UTF-8 input #7

Closed suominen closed 2 years ago

suominen commented 5 years ago

I recently switched from ISO-8859-1 (a single byte character set) to UTF-8 (a multibyte character set) and ran into a problem with tcsh dying with SIGSEGV on occasion. This happens both on Debian Linux and on NetBSD. (The tcsh shipped with macOS doesn't crash.)

I realised this happens when all of the following are true:

  1. My keyboard is switched to a Finnish layout.
  2. I'm using screen.
  3. I'm pressing the screen command key (C-a) followed by a key for a multibyte character.
  4. The window appears stuck so I press some more keys. On NetBSD exactly 4 more bytes of input are needed.

For example, press C-a followed by € and four spaces.

What happens is that screen eats the first byte of the euro sign character (\342), and the remaining two bytes (\202 and \254) go to tcsh. Both are UTF-8 continuation bytes, so neither by themselves nor combined with the spaces do they form a valid UTF-8 character.
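
To illustrate why \202 and \254 cannot stand on their own, here is a minimal C sketch that classifies UTF-8 bytes by their position in a sequence. The helper names `utf8_seq_len` and `utf8_is_continuation` are hypothetical and are not taken from the tcsh source.

```c
#include <stddef.h>

/* Hypothetical helper: total length of the UTF-8 sequence that starts
 * with byte b, or 0 if b cannot start a sequence at all. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;                 /* ASCII */
    if (b >= 0xC2 && b <= 0xDF) return 2;   /* two-byte lead */
    if (b >= 0xE0 && b <= 0xEF) return 3;   /* three-byte lead, e.g. \342 */
    if (b >= 0xF0 && b <= 0xF4) return 4;   /* four-byte lead */
    return 0;   /* continuation bytes and bytes never valid in UTF-8 */
}

/* Hypothetical helper: continuation bytes have the bit pattern 10xxxxxx. */
static int utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}
```

With these definitions, \342 is a lead byte that announces a three-byte sequence, while \202 and \254 are continuation bytes for which `utf8_seq_len` returns 0: a decoder that sees them first has nothing to attach them to.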

The problem persists when tcsh is compiled with just "-g" (i.e. without the "-O2" optimisation that configure puts in).

Starting program: /home/kim/src/tcsh/tcsh
equinoxe:~/src/tcsh>
Program received signal SIGSEGV, Segmentation fault.
0x00000000004464ab in GetNextCommand (cmdnum=0x7f7fff41d81b "", ch=0x7f7fff41d814 L"\xf0000082\x797400")
    at ed.inputl.c:699
699                 cmd = CurrentKeyMap[*ch];
(gdb) where
#0  0x00000000004464ab in GetNextCommand (cmdnum=0x7f7fff41d81b "", ch=0x7f7fff41d814 L"\xf0000082\x797400")
    at ed.inputl.c:699
#1  0x000000000044528c in Inputl () at ed.inputl.c:174
#2  0x0000000000420b69 in bgetc () at sh.lex.c:1662
#3  0x000000000042042a in readc (wanteof=0) at sh.lex.c:1423
#4  0x000000000041d4a4 in lex (hp=0x670870 <paraml>) at sh.lex.c:157
#5  0x0000000000406c46 in process (catch=1) at sh.c:2059
#6  0x0000000000405938 in main (argc=0, argv=0x7f7fff41dc78) at sh.c:1423
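
In the backtrace, *ch holds 0xf0000082 when it is used as an index at ed.inputl.c:699; the low byte 0x82 is the stray \202, apparently with some high bits set. The following C sketch shows the general kind of guard that prevents such an out-of-range lookup. It is not the actual tcsh code or fix: `KEYMAP_SIZE`, `key_cmd`, `CMD_ERROR` and `keymap_lookup` are hypothetical names, and the 256-entry table size is an assumption.

```c
#include <stddef.h>

#define KEYMAP_SIZE 256            /* assumed table size, illustration only */

typedef unsigned char key_cmd;     /* hypothetical command code type */
#define CMD_ERROR ((key_cmd)0xFF)  /* hypothetical "no binding" code */

/* Look up the command bound to ch, refusing any value that does not
 * fit inside the table instead of reading past its end. */
static key_cmd keymap_lookup(const key_cmd *map, unsigned long ch)
{
    if (ch >= KEYMAP_SIZE)
        return CMD_ERROR;          /* e.g. 0xf0000082 from the backtrace */
    return map[ch];
}
```

Taking the character as an unsigned value means a negative wide character converts to a very large number and is rejected by the same comparison.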

zoulasc commented 5 years ago

Is this still a problem?

suominen commented 5 years ago

The segmentation fault no longer happens since commit 1a9cf9aae4674b93f163a81ffab5457299fb10e1.

However, tcsh will still silently discard 4 additional bytes from input. This can be confusing to the user. In comparison, bash and zsh output two invalid character symbols. This seems better, as the shell won't appear "stuck" to the user.
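
As a rough illustration of the "emit an invalid character symbol instead of waiting" behaviour, here is a small C decoder sketch. It is not how bash, zsh or tcsh implement this; `decode_one` and `REPLACEMENT` are hypothetical names. The point is that a byte which cannot start a sequence is reported at once as U+FFFD, and only a prefix that is valid so far asks for more input.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT 0xFFFDu   /* U+FFFD REPLACEMENT CHARACTER */

/* Decode one code point from buf[0..len). Stores it in *out and returns
 * the number of bytes consumed (>= 1), or 0 if buf holds only the start
 * of a sequence that is valid so far and more input is needed. */
static size_t decode_one(const unsigned char *buf, size_t len, uint32_t *out)
{
    size_t need, i;
    uint32_t cp;

    if (len == 0) return 0;
    if (buf[0] < 0x80) { *out = buf[0]; return 1; }
    else if (buf[0] >= 0xC2 && buf[0] <= 0xDF) { need = 2; cp = buf[0] & 0x1F; }
    else if (buf[0] >= 0xE0 && buf[0] <= 0xEF) { need = 3; cp = buf[0] & 0x0F; }
    else if (buf[0] >= 0xF0 && buf[0] <= 0xF4) { need = 4; cp = buf[0] & 0x07; }
    else { *out = REPLACEMENT; return 1; }   /* cannot start a sequence */

    for (i = 1; i < need; i++) {
        if (i == len) return 0;              /* valid prefix: wait for more */
        if ((buf[i] & 0xC0) != 0x80) {       /* not a continuation byte */
            *out = REPLACEMENT;
            return i;                        /* resume decoding at buf[i] */
        }
        cp = (cp << 6) | (buf[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
    if ((need == 3 && cp < 0x800) || (need == 4 && cp < 0x10000) ||
        cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        cp = REPLACEMENT;
    *out = cp;
    return need;
}
```

Fed the bytes from this report (\202, \254, then spaces), such a decoder yields two replacement characters followed by the spaces, without ever waiting for further input.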

suominen commented 2 years ago

No SIGSEGV, and the "stuck waiting for 4 more bytes" should probably be a separate issue, so closing this.