ksh93 / ksh

ksh 93u+m: KornShell lives! | Latest release: https://github.com/ksh93/ksh/releases
Eclipse Public License 2.0
189 stars 31 forks source link

.sh.tilde discipline is too brittle #346

Closed McDutchie closed 2 years ago

McDutchie commented 2 years ago

Defining a .sh.tilde.get or .sh.tilde.set discipline function to extend tilde expansion works well as long as the discipline function doesn't get interrupted (e.g. with Crtl+C) or produce an error message. Either of those will cause the shell to become unstable and crash.

To reproduce: To trigger an interrupt, adding 'sleep 1' to the discipline function makes it easy to press Ctrl+C. To trigger an error, give an invalid option to a special built-in, e.g. trap -Q.

I have experimented with two different techniques to fix this, with no success at all so far. sh_pushcontext is getting me nowhere and running the command via sh_trap doesn't seem to help either.

A workaround for the interruption crash is to make it uninterruptible with sigblock(SIGINT) in tilde_expand2() before running the discipline and sigrelease(SIGINT) after. But that does nothing to fix the crash on error.

This feature should probably be removed from the 1.0 branch as it is not ready for prime time. It can return to a release branch if/when we manage to fix it on the master branch.

JohnoKing commented 2 years ago

Perhaps setjmp can be used to fix the error handling segfault. Solaris patch 330-30841535.patch claims to fix a coredump caused by improper error handling. Repurposing the idea from that patch for .sh.tilde seems to fix the memory faults from inserting trap -Q in the discipline function (although it does nothing for SIGINT). Patch for testing (causes no regression test fails on my end):

--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -2638,6 +2638,7 @@ static void tilde_expand2(Shell_t *shp, register int offset)
    {
        stakfreeze(1);              /* terminate current stack object to avoid data corruption */
        block++;
+       setjmp(*shp->jmplist);
        nv_putval(SH_TILDENOD, &stakp[offset], 0);
        cp = nv_getval(SH_TILDENOD);
        block--;
# Results after applying the patch above
$ arch/*/bin/ksh
$ echo ~
arch/linux.i386-64/bin/ksh[44]: trap: -Q: unknown option
Usage: trap [-p] [action condition ...]
 Help: trap [ --help | --man ] 2>&1
/home/johno
ksh $ echo ~ # Press tab
arch/linux.i386-64/bin/ksh[44]: trap: -Q: unknown option
Usage: trap [-p] [action condition ...]
 Help: trap [ --help | --man ] 2>&1
/home/johno 
/home/johno
$ echo ~ksh
arch/linux.i386-64/bin/ksh[44]: trap: -Q: unknown option
Usage: trap [-p] [action condition ...]
 Help: trap [ --help | --man ] 2>&1
~ksh

Edit: See also https://github.com/ksh93/ksh/issues/347#issuecomment-979456879, which demonstrates this bug likely applies to all discipline functions.

JohnoKing commented 2 years ago

As expected, the discipline function for PS1 is also affected:

$ PS1.get() { trap -Q; }
/usr/bin/ksh: trap: -Q: unknown option
Usage: trap [-p] [action condition ...]
 Help: trap [ --help | --man ] 2>&1
/usr/bin/ksh: trap: -Q: unknown option
Usage: trap [-p] [action condition ...]
 Help: trap [ --help | --man ] 2>&1

# Freezes

Also as expected, setjmp avoids that lockup (although setjmp before PS1.get is handled causes a segfault when the shell receives SIGINT).

JohnoKing commented 2 years ago

After yet more experimentation I've come up with this patch (edit: patch updated to avoid using shp):

--- a/src/cmd/ksh93/sh/nvdisc.c
+++ b/src/cmd/ksh93/sh/nvdisc.c
@@ -283,11 +283,16 @@ static void   assign(Namval_t *np,const char* val,int flags,Namfun_t *handle)
        nq =  vp->disc[type=UNASSIGN];
    if(nq && !isblocked(bp,type))
    {
-       int bflag=0, savexit=sh.savexit;
+       int bflag=0, savexit=sh.savexit, jmpval=0;
+       struct checkpt buff;
        block(bp,type);
        if (type==APPEND && (bflag= !isblocked(bp,LOOKUPS)))
            block(bp,LOOKUPS);
-       sh_fun(nq,np,(char**)0);
+       sh_pushcontext(&sh,&buff,1);
+       jmpval = sigsetjmp(buff.buff,0);
+       if(!jmpval)
+           sh_fun(nq,np,(char**)0);
+       sh_popcontext(&sh,&buff);
        unblock(bp,type);
        if(bflag)
            unblock(bp,LOOKUPS);
@@ -376,7 +381,8 @@ static char*    lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
    union Value     *up = np->nvalue.up;
    if(nq && !isblocked(bp,type))
    {
-       int     savexit = sh.savexit;
+       int     savexit = sh.savexit, jmpval = 0;
+       struct checkpt  buff;
        node = *SH_VALNOD;
        if(!nv_isnull(SH_VALNOD))
        {
@@ -389,7 +395,11 @@ static char*   lookup(Namval_t *np, int type, Sfdouble_t *dp,Namfun_t *handle)
            nv_setsize(SH_VALNOD,10);
        }
        block(bp,type);
-       sh_fun(nq,np,(char**)0);
+       sh_pushcontext(&sh,&buff,1);
+       jmpval = sigsetjmp(buff.buff,0);
+       if(!jmpval)
+           sh_fun(nq,np,(char**)0);
+       sh_popcontext(&sh,&buff);
        unblock(bp,type);
        if(!vp->disc[type])
            chktfree(np,vp);

It makes ksh do a longjmp back to the correct context after a discipline function exits with an error. The .sh.tilde reproducer now works correctly. Despite that the PS2.get discipline function can still crash, although it now crashes in the dcl_dehacktivate function added in commit 5d353210:

$ test \
/home/johno/GitRepos/KornShell/ksh/arch/linux.i386-64/bin/ksh[138]: trap: -Q: unknown option
Usage: trap [-p] [action condition ...]
 Help: trap [ --help | --man ] 2>&1
> true

Program received signal SIGABRT, Aborted.
0x00007ffff7cabd22 in raise () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff7cabd22 in raise () from /usr/lib/libc.so.6
#1  0x00007ffff7c95862 in abort () from /usr/lib/libc.so.6
#2  0x00005555555b20e3 in dcl_dehacktivate () at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:278
#3  0x00005555555b2ebb in sh_cmd (lexp=0x5555556fdba0, sym=10, flag=132) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:591
#4  0x00005555555b2950 in sh_parse (shp=0x5555556f5580 <sh>, iop=0x5555556f3a80 <_Sfstdin>, flag=0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:440
#5  0x00005555555684f0 in exfile (shp=0x2, iop=0x7fffffffdd40, fno=0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:571
#6  0x0000555555567aee in sh_main (ac=1, av=0x7fffffffe468, userinit=0x0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:361
#7  0x0000555555566e1e in main (argc=1, argv=0x7fffffffe468) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:46
(gdb) frame 2
#2  0x00005555555b20e3 in dcl_dehacktivate () at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:278
278         abort();
(gdb) frame 3
#3  0x00005555555b2ebb in sh_cmd (lexp=0x5555556fdba0, sym=10, flag=132) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:591
591     dcl_dehacktivate();
(gdb) frame 4
#4  0x00005555555b2950 in sh_parse (shp=0x5555556f5580 <sh>, iop=0x5555556f3a80 <_Sfstdin>, flag=0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:440
440     t = sh_cmd(lexp,(flag&SH_EOF)?EOFSYM:'\n',SH_SEMI|SH_EMPTY|(flag&SH_NL))
McDutchie commented 2 years ago

Does this fix the crash in dcl_dehacktivate()?

--- a/src/cmd/ksh93/sh/parse.c
+++ b/src/cmd/ksh93/sh/parse.c
@@ -284,6 +284,7 @@ static void dcl_dehacktivate(void)
 }
 static noreturn void dcl_exit(int e)
 {
+   dcl_recursion = 1;
    dcl_dehacktivate();
    (*error_info.exit)(e);
    UNREACHABLE();
McDutchie commented 2 years ago

BTW, side note: let's not bother with introducing more pointless shp pointers as I'm going to remove them all at some point anyway. No matter what overly clever thing AT&T might have been planning, we're never going to have multiple shell states at the same time. Where something demands to be passed a pointer to the shell state, you can use &sh instead. (Until recently that threw a syntax error in the sh_pushcontext and sh_popcontext macros but I fixed that in 802e8db75206d79ce82a376a9edd6274a556cb23.)

JohnoKing commented 2 years ago

Does this fix the crash in dcl_dehacktivate()?

Nope, it still crashes.

Program received signal SIGABRT, Aborted.
0x00007ffff7cabd22 in raise () from /usr/lib/libc.so.6
(gdb) p dcl_recursion
$1 = 0
(gdb) p dcl_tree
$2 = (Dt_t *) 0x55555571edc0
(gdb) bt
#0  0x00007ffff7cabd22 in raise () from /usr/lib/libc.so.6
#1  0x00007ffff7c95862 in abort () from /usr/lib/libc.so.6
#2  0x00005555555b20e3 in dcl_dehacktivate () at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:278
#3  0x00005555555b2ec5 in sh_cmd (lexp=0x5555556fdb40, sym=10, flag=132) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:592
#4  0x00005555555b295a in sh_parse (shp=0x5555556f5580 <sh>, iop=0x5555556f3a80 <_Sfstdin>, flag=0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:441
#5  0x00005555555684f0 in exfile (shp=0x2, iop=0x7fffffffdda0, fno=0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:571
#6  0x0000555555567aee in sh_main (ac=1, av=0x7fffffffe4c8, userinit=0x0) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:361
#7  0x0000555555566e1e in main (argc=1, argv=0x7fffffffe4c8) at /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:46

BTW, side note: let's not bother with introducing more pointless shp pointers as I'm going to remove them all at some point anyway.

I've updated the patch to avoid using shp pointers.

McDutchie commented 2 years ago

I think I've made sense of why that crash occurs. The PS2 discipline is run while trying to parse a command. So it's another context where arbitrary code might be executed at parse time, which messes up that hack. Aaaargh. Apparently at AT&T they'd never heard about the idea of having a clean separation between distinct subsystems! Anyway, the backtrace confirms that hypothesis, as it includes sh_parse(). It'll need a fix similar to this one.

edit: But that'll have to be for later, I simply don't have time for that right now. I've reverted that commit until it can be fixed.