Open McDutchie opened 2 years ago
An alarm discipline can currently cause lexer/parser corruption if invoked during parse time, such as when in a PS2
prompt in the interactive shell. In the session below I have foo.alarm()
print comsub
every two seconds, then I enter : $(
plus RETURN to enter a PS2 prompt, so we're lexing/parsing another comsub. The behaviour is unpredictable. Depending on your system and prompt settings, the output can be like below or the shell may crash.
$ alarm -r foo +2
$ foo.alarm() { echo $(echo comsub); }
$ comsub
comsub
comsub
: $(
> -ksh: $(
�: invalid variable name
-ksh: $(
�: invalid variable name
-ksh: $(
�: invalid variable name
)
$ comsub
comsub
unset foo
$
David Korn's commentary was: https://github.com/ksh93/ksh/blob/e373e8c1b5192fa1f1ff5e40590c85d215080f80/src/cmd/ksh93/bltins/alarm.c#L29-L37 Now that I've learned from our previous fixes to get/set disciplines, I think this suggested approach is not necessary; I think we can fix this in a similar way. First we need to figure out just where the alarm discipline is called from.
If we add a handy little __abort__
built-in to intentionally crash the shell and generate a backtrace:
--- a/src/cmd/ksh93/data/builtins.c
+++ b/src/cmd/ksh93/data/builtins.c
@@ -153,6 +153,7 @@ const struct shtable3 shtab_builtins[] =
#if SHOPT_REGRESS
"__regress__", NV_BLTIN|BLT_ENV, bltin(__regress__),
#endif
+ "__abort__", NV_BLTIN, (int(*)(int,char**,Shbltin_t*))abort,
"", 0, 0
};
…then recompile and use it to generate a crash:
$ ENV=/./dev/null arch/darwin.i386-64/bin/ksh -o emacs
$ alarm -r foo +2
$ foo.alarm() { __abort__; }
$ Abort
…the backtrace gives us the function call stack that tells us where the foo.alarm
discipline was called from:
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_kernel.dylib 0x00007fff6014fbf6 __kill + 10
1 ksh 0x000000010e937424 sh_done + 836 (fault.c:679)
2 ksh 0x000000010e9368c5 sh_fault + 1365
3 libsystem_platform.dylib 0x00007fff601ffb5d _sigtramp + 29
4 ??? 0x0000002000000020 0 + 137438953504
5 libsystem_c.dylib 0x00007fff600b96a6 abort + 127
6 ksh 0x000000010e98e697 sh_exec + 8231 (xec.c:1367)
7 ksh 0x000000010e901bc0 b_dot_cmd + 1712 (misc.c:318)
8 ksh 0x000000010e994b46 sh_funct + 582 (xec.c:3307)
9 ksh 0x000000010e998541 sh_fun + 1089 (xec.c:3398)
10 ksh 0x000000010e8fc239 sh_timetraps + 153 (alarm.c:153)
11 ksh 0x000000010e8fcc05 trap_timeout + 149 (alarm.c:136)
12 ksh 0x000000010e98a594 sigalrm + 1140 (timers.c:166)
13 libsystem_platform.dylib 0x00007fff601ffb5d _sigtramp + 29
14 ??? 000000000000000000 0 + 0
15 ksh 0x000000010e9333c9 ed_read + 1161 (edit.c:946)
16 ksh 0x000000010e93372c ed_getchar + 124 (edit.c:1101)
17 ksh 0x000000010e9996e3 ed_emacsread + 1235 (emacs.c:280)
18 ksh 0x000000010e943023 slowread + 467 (io.c:1977)
19 ksh 0x000000010ea2bfb5 sfrd + 1349 (sfrd.c:242)
20 ksh 0x000000010ea24649 _sffilbuf + 1113 (sffilbuf.c:101)
21 ksh 0x000000010ea2d3a9 sfreserve + 1673 (sfreserve.c:144)
22 ksh 0x000000010e917785 exfile + 2229 (main.c:527)
23 ksh 0x000000010e916b82 sh_main + 3426 (main.c:352)
24 ksh 0x000000010e8fc196 main + 38 (pmain.c:45)
25 libdyld.dylib 0x00007fff600143d5 start + 1
sh_fun()
(at 9) is for calling a shell function. So it looks like sh_timetraps()
(at 10) is the function that'll need a fix similar to the ones we applied to the get and set disciplines.
The following should go a ways towards robustifying the alarm discipline. It fixes the lexer corruption issue at least, and also stops alarm disciplines from influencing $?
. It just copies the fixes we applied to lookup() in nvdisc.c which implements the get discipline. (The Mamfile needed quite a bit of reordering to include shlex.h and its dependencies in alarm.c, that's not very interesting so ignore that and look at the changes in alarm.c.)
I tried compiling the above patch with tcc and it failed because FEATURE/locale
also needs to be reordered. Here's an improved patch:
I'll now experiment with using alarm
in my kshrc.
I've found a looping bug in the alarm builtin caused by launching, then exiting external shells. Reproducer (somewhat inconsistent, triggers the looping bug about a third of the time):
$ cat /tmp/env
typeset -A format=(
[bell]=$'\a'
[start_title]=$'\E]0;'
)
# Set the window title using the alarm builtin
alarm -r set_window_title +1
function set_window_title.alarm {
printf "${format[start_title]}%(%I:%M:%S %p)T - ${ id -un ;}: ${ hostname ;}${format[bell]}"
}
$ ENV=/tmp/env arch/*/bin/ksh # Run with lexer fix patch
$ zsh
% ksh
$ ^D
% ^D
-4 $ # Prints a negative number, then loops while printing '$ \n'
Here is another patch. It also saves the state of the stack, saves and turns off all the shell state flags, and saves errno. I now cannot trigger the looping bug above. – actually, I can. :(
At this point I wonder if we can ever manage to fix this thing. David Korn didn't manage, but kept shipping it for years anyway; my standards are higher than that. I've removed it from the 1.0 branch in preparation for a release.
I'm starting to doubt it should even stay on the dev branch. Even if we can fix it, is it worth the time and effort? The idea of it is interesting but not that interesting. There are other ways of accomplishing what it is intended to do.
An email from a user made me look at this again.
The code has changed significantly, invalidating the patch above, so here's an updated version.
This version also adds adds a siglongjmp
call since that should always be used if sigsetjmp
is used.
Between that, and the fix for sh_fun
from d446015efce83f7cd17da830df41f8d27f10ab07, I wonder if this now actually fixes the alarm builtin! I cannot reproduce that looping bug now.
Please test…
edit (2024-07-08): patch updated to apply to current
Nope, there are still problems. With the /tmp/env script from above...
$ ENV=/tmp/env arch/darwin.arm64-64/bin/ksh
$ sleep 5 & wait
[1] 39838
$ <=== prompt after 1 second, not 5.
[1] + Done sleep 5 & wait
$
$ set -b <=== corrupted output, see below
$ sleep 5 & wait
[1] 39889
$
0;06:34:03 pm - martijn
$
Nope, there are still problems.
I have tested the latest patch and so far cannot reproduce any of those bugs on my Linux system, nor can I reproduce the looping bug anymore. The patch works quite well for me, so I'm not sure what's causing those problems.
Update: I have tested the patch again, and have managed to reproduce the bug that causes wait
to fail. It is a regression that was introduced by the changes to jobs.c in commit 667034ff (i.e., not a bug located directly in the alarm
builtin).
edit: Patch which reverts the breaking change:
diff --git a/src/cmd/ksh93/sh/jobs.c b/src/cmd/ksh93/sh/jobs.c
index a77a06803..2deb16fc7 100644
--- a/src/cmd/ksh93/sh/jobs.c
+++ b/src/cmd/ksh93/sh/jobs.c
@@ -275,7 +275,7 @@ int job_reap(int sig)
struct process *px;
int flags;
struct jobsave *jp;
- int nochild = 0, oerrno = errno, wstat;
+ int nochild = 0, oerrno = errno, wstat, was_ttywait_on;
Waitevent_f waitevent = sh.waitevent;
static int wcontinued = WCONTINUED;
if (vmbusy())
@@ -297,12 +297,16 @@ int job_reap(int sig)
sh.waitevent = 0;
while(1)
{
+ was_ttywait_on = sh_isstate(SH_TTYWAIT); /* save tty wait state */
if(!(flags&WNOHANG) && !sh.intrap && job.pwlist)
{
+ sh_onstate(SH_TTYWAIT);
if(waitevent && (*waitevent)(-1,-1L,0))
flags |= WNOHANG;
}
pid = waitpid((pid_t)-1,&wstat,flags);
+ if(!was_ttywait_on)
+ sh_offstate(SH_TTYWAIT);
/*
* some systems (Linux 2.6) may return EINVAL
After applying this patch the bug in wait
disappears.
Thanks for isolating that. Interesting. I have no idea why it fixes that, but I need to find out before committing it.
Even with the patch I'm still getting the infinite prompt looping bug, triggered seemingly randomly.
See:
It's clearly a wanted feature that had quite a few people very curious over the years. I think the problems with its stability are probably very similar to the general problems with discipline functions that we fixed in 430e47813cacf7366ea4ce03d170b254447c331c and 2322f939429ae002f92ea333a3bb6b149aad1431. I think we have the knowledge to fix it now; the lexer state needs to be saved and restored, and perhaps another sh_{push,pop}context is necessary. Even if that doesn't fix everything, this is never going to be fixed if it remains undocumented and unused.