Closed gisburn closed 4 months ago
Note the missing "))" in "if (( (stop_i - start_i) > 100 ; then"
Thanks for the report. I can reproduce the crash on macOS as well. ksh 93u+ 2012-08-01 and ksh2020 also crash.
The crash happens in the parser:
$ arch/darwin.arm64-64/bin/ksh issue764.sh
AddressSanitizer:DEADLYSIGNAL
=================================================================
==10855==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000010 (pc 0x000102ae7a58 bp 0x00016d426470 sp 0x00016d4262e0 T0)
==10855==The signal is caused by a UNKNOWN memory access.
==10855==Hint: address points to the zero page.
#0 0x102ae7a58 in getanode parse.c:338
#1 0x102ae97a0 in item parse.c:1206
#2 0x102ae8da4 in term parse.c:644
#3 0x102ae8584 in list parse.c:615
#4 0x102ae6d40 in sh_cmd parse.c:563
#5 0x102ae9d78 in item parse.c:1240
#6 0x102ae8da4 in term parse.c:644
#7 0x102ae8584 in list parse.c:615
#8 0x102ae6d40 in sh_cmd parse.c:563
#9 0x102ae700c in sh_cmd parse.c:588
#10 0x102ae700c in sh_cmd parse.c:588
#11 0x102ae700c in sh_cmd parse.c:588
#12 0x102ae700c in sh_cmd parse.c:588
#13 0x102ae700c in sh_cmd parse.c:588
#14 0x102aeae68 in item parse.c:1339
#15 0x102af2518 in funct parse.c:923
#16 0x102aea9b4 in item parse.c:1308
#17 0x102ae8da4 in term parse.c:644
#18 0x102ae8584 in list parse.c:615
#19 0x102ae6d40 in sh_cmd parse.c:563
#20 0x102ae6688 in sh_parse parse.c:440
#21 0x102a8d9d0 in exfile main.c:594
#22 0x102a8a70c in sh_main main.c:366
#23 0x1029d56ec in main pmain.c:42
#24 0x103181088 in start+0x204 (dyld:arm64e+0x5088)
Minimal reproducer necessary to reproduce the crash (determined by systematic elimination of code in the reproducer reported above):
{
(( 1
$(( 1 ))
}
The crash has a shorter call stack but the top 5 frame are the same:
AddressSanitizer:DEADLYSIGNAL
=================================================================
==14791==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000010 (pc 0x000104d1ba58 bp 0x00016b1f49d0 sp 0x00016b1f4840 T0)
==14791==The signal is caused by a UNKNOWN memory access.
==14791==Hint: address points to the zero page.
#0 0x104d1ba58 in getanode parse.c:338
#1 0x104d1d7a0 in item parse.c:1206
#2 0x104d1cda4 in term parse.c:644
#3 0x104d1c584 in list parse.c:615
#4 0x104d1ad40 in sh_cmd parse.c:563
#5 0x104d1ee68 in item parse.c:1339
#6 0x104d1cda4 in term parse.c:644
#7 0x104d1c584 in list parse.c:615
#8 0x104d1ad40 in sh_cmd parse.c:563
#9 0x104d1a688 in sh_parse parse.c:440
#10 0x104cc19d0 in exfile main.c:594
#11 0x104cbe70c in sh_main main.c:366
#12 0x104c096ec in main pmain.c:42
#13 0x105391088 in start+0x204 (dyld:arm64e+0x5088)
ASan gives a hint: “address points to the zero page”. The second argument (ap
) to the getanode() call is NULL, which is not supposed to happen.
A naive patch would be simply to throw a syntax error in case of a null argument pointer:
diff --git a/src/cmd/ksh93/sh/parse.c b/src/cmd/ksh93/sh/parse.c
index 4c5e08f09..328706879 100644
--- a/src/cmd/ksh93/sh/parse.c
+++ b/src/cmd/ksh93/sh/parse.c
@@ -1203,6 +1203,12 @@ static Shnode_t *item(Lex_t *lexp,int flag)
break;
/* ((...)) arithmetic expression */
case EXPRSYM:
+ if(!lexp->arg)
+ {
+ /* A null arg pointer may happen here in case of an unmatched `((' with a nested $(( ... )). */
+ lexp->token = EOFSYM; /* make it say "`((' unmatched" instead of "`((' unexpected" */
+ sh_syntax(lexp);
+ }
t = getanode(lexp,lexp->arg);
sh_lex(lexp);
goto done;
This is a kludge, but does seem to fix the crash effectively. The output of the minimal reproducer is now:
issue764.sh: syntax error at line 3: `((' unmatched
The question remains: how did that pointer became null in the first place? It may well have something to do with the fact that the lexer invokes the parser recursively to check for a pair of double parentheses at the start and end before determining EXPRSYM
, the arithmetic command symbol (see lex.c, lines 479-497). Figuring that out might show us the way to a better fix.
It can also be fixed in lex.c, which is a bit less kludgy. We need to bail out if the recursive sh_lex()
invocation returns zero.
To apply instead of the previous patch:
diff --git a/src/cmd/ksh93/sh/lex.c b/src/cmd/ksh93/sh/lex.c
index 7439e4969..d6c1f739b 100644
--- a/src/cmd/ksh93/sh/lex.c
+++ b/src/cmd/ksh93/sh/lex.c
@@ -478,6 +478,7 @@ int sh_lex(Lex_t* lp)
{
if(c==LPAREN)
{
+ int r;
/* Avoid misdetecting EXPRSYM in [[ ... ]] or compound assignments */
if(lp->lex.intest || lp->comp_assign)
return lp->token=c;
@@ -493,7 +494,14 @@ int sh_lex(Lex_t* lp)
lp->lastline = sh.inlineno;
lp->lexd.lex_state = ST_NESTED;
fcseek(1);
- return sh_lex(lp);
+ r = sh_lex(lp);
+ if(r==0)
+ { /* throw "`((' unmatched" error */
+ lp->lasttok = lp->token;
+ lp->token = EOFSYM;
+ sh_syntax(lp);
+ }
+ return r;
}
c |= SYMREP;
/* Here document redirection operator '<<' */
If you feel you are adding hacks/kludgy, then I recommend adding something like #ifdef HORRIBLE_HACK_FIXME
... #endif
, or something similar. That puts such code on a ToDo list for later re-evaluation&&
rework.
As the second patch shows, the problem occurs when the recursive sh_lex invocation returns 0. I've been tracing where that happens. It's in case S_POP
, here: https://github.com/ksh93/ksh/blob/22cf8624dbc20c970be211065f5f15f13add73dd/src/cmd/ksh93/sh/lex.c#L1098-L1099
Oddly enough, mode
is ST_NONE
there and not ST_NESTED
. That's because it was changed on line 1065.
I've found another related bug:
$ ksh -c 'echo $(( }; echo end'
end
$ ksh -c '(( }; echo end'
Memory fault
The }
is an S_POP
character (see lexstates.c, the ST_NESTED
table) so that triggers the return 0
, causing ksh to fail to throw a syntax error here, resulting in an inconsistent state.
Trying to fix this directly in the S_POP case is causing me thousands of regression test failures, though. So I think the second patch above had the right idea. That is the most specific place to fix it, with the minimum risk of side effects.
Patch three solves the reported bug as well as the newly found bug above: accept only the expected LPAREN or EXPRSYM as the return value for the recursive sh_lex invocation. (Also, this now saves the line number for a more helpful error message: the line number with the unmatched ((
is shown.)
diff --git a/src/cmd/ksh93/sh/lex.c b/src/cmd/ksh93/sh/lex.c
index 7439e4969..a8234c462 100644
--- a/src/cmd/ksh93/sh/lex.c
+++ b/src/cmd/ksh93/sh/lex.c
@@ -478,6 +478,7 @@ int sh_lex(Lex_t* lp)
{
if(c==LPAREN)
{
+ int r, l;
/* Avoid misdetecting EXPRSYM in [[ ... ]] or compound assignments */
if(lp->lex.intest || lp->comp_assign)
return lp->token=c;
@@ -490,10 +491,16 @@ int sh_lex(Lex_t* lp)
* But this cannot be concluded until a final '))' is detected.
* Use a recursive lexer invocation for that. */
lp->lexd.nest=1;
- lp->lastline = sh.inlineno;
+ l = lp->lastline = sh.inlineno;
lp->lexd.lex_state = ST_NESTED;
fcseek(1);
- return sh_lex(lp);
+ r = sh_lex(lp);
+ /* If case S_POP returned 0, throw "`((' unmatched" error. */
+ if(r==0)
+ lp->lastline = l, lp->lasttok = lp->token, lp->token = EOFSYM;
+ if(r!=LPAREN && r!=EXPRSYM)
+ sh_syntax(lp);
+ return r;
}
c |= SYMREP;
/* Here document redirection operator '<<' */
Patch version four fixes the error message for the related bug fix.
diff --git a/src/cmd/ksh93/sh/lex.c b/src/cmd/ksh93/sh/lex.c
index 7439e4969..146b57624 100644
--- a/src/cmd/ksh93/sh/lex.c
+++ b/src/cmd/ksh93/sh/lex.c
@@ -478,6 +478,7 @@ int sh_lex(Lex_t* lp)
{
if(c==LPAREN)
{
+ int r, l;
/* Avoid misdetecting EXPRSYM in [[ ... ]] or compound assignments */
if(lp->lex.intest || lp->comp_assign)
return lp->token=c;
@@ -490,10 +491,18 @@ int sh_lex(Lex_t* lp)
* But this cannot be concluded until a final '))' is detected.
* Use a recursive lexer invocation for that. */
lp->lexd.nest=1;
- lp->lastline = sh.inlineno;
+ l = lp->lastline = sh.inlineno;
lp->lexd.lex_state = ST_NESTED;
fcseek(1);
- return sh_lex(lp);
+ r = sh_lex(lp);
+ if(r!=LPAREN && r!=EXPRSYM)
+ { /* throw "`((' unmatched" error */
+ lp->lastline = l;
+ lp->lasttok = EXPRSYM;
+ lp->token = EOFSYM;
+ sh_syntax(lp);
+ }
+ return r;
}
c |= SYMREP;
/* Here document redirection operator '<<' */
Actually, it looks to be possible to reliably prevent the return 0
in case S_POP
. The lp->lexd.arith
flag is set during a recursive lexer invocation for detecting ))
after ((
. It looks to be enough to prevent the return 0
if that flag is set. All the reproducers are fixed and there are no regression test failures. The error message is now `}' unexpected
instead of `((' unmatched
, but that's okay.
Patch five (supersedes all the previous ones):
diff --git a/src/cmd/ksh93/sh/lex.c b/src/cmd/ksh93/sh/lex.c
index 7439e4969..3ac54becf 100644
--- a/src/cmd/ksh93/sh/lex.c
+++ b/src/cmd/ksh93/sh/lex.c
@@ -1095,7 +1095,7 @@ int sh_lex(Lex_t* lp)
}
return lp->token=LPAREN;
}
- if(mode==ST_NONE)
+ if(mode==ST_NONE && !lp->lexd.arith)
return 0;
if(c!=n && lp->lex.incase<TEST_RE)
{
However, the error messages produced by patch four are a lot more helpful, and patch four may also catch undiscovered cases that patch five doesn't.
So, patch four has my preference over five. But, though the behaviour is helpful, it is not the same as what you get when your remove the compound command braces. The following makes it act the same: throw an error about an unmatched (
at the line where the final )
is found. And that is what I'll now commit, along with regression tests.
diff --git a/src/cmd/ksh93/sh/lex.c b/src/cmd/ksh93/sh/lex.c
index 7439e4969..1a53eb5ea 100644
--- a/src/cmd/ksh93/sh/lex.c
+++ b/src/cmd/ksh93/sh/lex.c
@@ -478,6 +478,7 @@ int sh_lex(Lex_t* lp)
{
if(c==LPAREN)
{
+ int r;
/* Avoid misdetecting EXPRSYM in [[ ... ]] or compound assignments */
if(lp->lex.intest || lp->comp_assign)
return lp->token=c;
@@ -493,7 +494,14 @@ int sh_lex(Lex_t* lp)
lp->lastline = sh.inlineno;
lp->lexd.lex_state = ST_NESTED;
fcseek(1);
- return sh_lex(lp);
+ r = sh_lex(lp);
+ if(r!=LPAREN && r!=EXPRSYM)
+ { /* throw "`(' unmatched" error */
+ lp->lasttok = LPAREN;
+ lp->token = EOFSYM;
+ sh_syntax(lp);
+ }
+ return r;
}
c |= SYMREP;
/* Here document redirection operator '<<' */
The following simple filesystem stress test crashes ksh93 1.0.9/Cygwin:
make_numtree1.ksh93.txt