ksh93 / ksh

ksh 93u+m: KornShell lives! | Latest release: https://github.com/ksh93/ksh/releases
Eclipse Public License 2.0
179 stars 30 forks source link

Nested compound assignment misparsed in `$(`…`)` command substitution #269

Closed McDutchie closed 1 year ago

McDutchie commented 3 years ago

This bug was found by @hyenias, here:

There appears to be a parsing issue with (( during a command substitution with a compound array assignment. Notice below when a space is used to break apart the double left parentheses, parsing works and the array is created.

$ arch/*/bin/ksh -c 'got=$( typeset -a arr=((a b c) 1); readonly arr; typeset -p arr ); echo $got'
arch/linux.i386-64/bin/ksh: syntax error at line 1: `(' unmatched
$ arch/*/bin/ksh -c 'got=$( typeset -a arr=( (a b c) 1); readonly arr; typeset -p arr ); echo $got'
typeset -r -a arr=((a b c) 1)

Command subshells do not have this issue:

$ arch/*/bin/ksh -c '( typeset -a arr=((a b c) 1); readonly arr; typeset -p arr )'
typeset -r -a arr=((a b c) 1)
$ arch/*/bin/ksh -c '( typeset -a arr=( (a b c) 1); readonly arr; typeset -p arr )'
typeset -r -a arr=((a b c) 1)

Note that backtick command substitutions don't have the bug either:

$ arch/*/bin/ksh -c 'got=` typeset -a arr=((a b c) 1); readonly arr; typeset -p arr `; echo $got'
typeset -r -a arr=((a b c) 1)

So, this is a parser bug that is specific to command substitutions of the $() variety.

McDutchie commented 3 years ago

The bug should be somewhere in this function: https://github.com/ksh93/ksh/blob/7a2d3564b68be537d3f68188e3f0c5df2b45dd13/src/cmd/ksh93/sh/lex.c#L1532-L1681

If you insert this debug line:

--- a/src/cmd/ksh93/sh/lex.c
+++ b/src/cmd/ksh93/sh/lex.c
@@ -1578,6 +1578,7 @@ static int comsub(register Lex_t *lp, int endtok)
            while(1)
            {
                fcgetc(c);
+error(ERROR_warn(0),"[DEBUG] c==%c/%d",c,c);
                /* skip leading white space */
                if(n==0 && !sh_lexstates[ST_BEGIN][c])
                    continue;

…then the non-buggy reproducer shows output more or less as you might expect:

$ arch/*/bin/ksh -c 'got=$( typeset -a arr=( (a b c) 1); readonly arr; typeset -p arr ); echo $got'
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==t/116
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==y/121
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==p/112
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==e/101
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==s/115
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==-/45
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==a/97
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==r/114
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==r/114
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c===/61
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==(/40
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==(/40
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==a/97
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==b/98
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==c/99
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==)/41
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==1/49
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==)/41
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==;/59
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==r/114
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==e/101
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==a/97
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==d/100
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==o/111
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==a/97
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==r/114
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==r/114
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==;/59
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==t/116
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==y/121
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==p/112
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==e/101
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==s/115
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==-/45
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==a/97
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==r/114
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==r/114
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==)/41
typeset -r -a arr=((a b c) 1)

…whereas with the buggy reproducer, something clearly goes haywire, causing some strange kind of recursion which oddly enough is not infinite, but still causes very long output:

$ arch/*/bin/ksh -c 'got=$( typeset -a arr=((a b c) 1); readonly arr; typeset -p arr ); echo $got'
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==t/116
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==y/121
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==p/112
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==e/101
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==s/115
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==-/45
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==a/97
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==r/114
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==r/114
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c===/61
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==(/40
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==g/103
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==o/111
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==t/116
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c===/61
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c===/61
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==t/116
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==y/121
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==p/112
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==e/101
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==s/115
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==-/45
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==a/97
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==r/114
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==r/114
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c===/61
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==(/40
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==g/103
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==o/111
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==t/116
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c===/61
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c===/61
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==t/116
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==y/121
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==p/112
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==e/101
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==s/115
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==-/45

... 254 repetitions (thousands of lines) deleted ...

arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==a/97
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==r/114
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==r/114
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==)/41
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==;/59
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==e/101
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==c/99
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==h/104
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==o/111
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c== /32
arch/darwin.i386-64/bin/ksh: warning: [DEBUG] c==/0
arch/darwin.i386-64/bin/ksh: syntax error at line 1: `(' unmatched
atheik commented 2 years ago

Here's a patch that I think fixes the issue: Edit: This is broken.

diff --git a/src/cmd/ksh93/sh/lex.c b/src/cmd/ksh93/sh/lex.c
index c9971851..8a6e6c2d 100644
--- a/src/cmd/ksh93/sh/lex.c
+++ b/src/cmd/ksh93/sh/lex.c
@@ -313,6 +313,15 @@ int sh_lex(Lex_t* lp)
        /* skip over characters in the current state */
        state = sh_lexstates[mode];
        while((n=STATE(state,c))==0);
+       /*
+       sfprintf(sfstderr,"sh_lex:        mode: %d\n",mode);        sfsync(sfstderr);
+       sfprintf(sfstderr,"sh_lex: oldmode(lp): %d\n",oldmode(lp)); sfsync(sfstderr);
+       sfprintf(sfstderr,"sh_lex:           c: %d\n",c);           sfsync(sfstderr);
+       sfprintf(sfstderr,"sh_lex:           n: %d\n",n);           sfsync(sfstderr);
+       sfprintf(sfstderr,"======\n");                              sfsync(sfstderr);
+       sfprintf(sfstderr,"sh_lex: lp->lexd.dolparen: %d\n",lp->lexd.dolparen); sfsync(sfstderr);
+       sfprintf(sfstderr,"^^^^^^\n");                                          sfsync(sfstderr);
+       */
        switch(n)
        {
            case S_BREAK:
@@ -478,6 +487,11 @@ int sh_lex(Lex_t* lp)
                }
                if(fcgetc(n)>0)
                    fcseek(-LEN);
+               /*
+               sfprintf(sfstderr,"sh_lex: case S_OP: n: %d\n",n);               sfsync(sfstderr);
+               sfprintf(sfstderr,"sh_lex: case S_OP: state[n]: %d\n",state[n]); sfsync(sfstderr);
+               sfprintf(sfstderr,"^^^^^^\n");                                   sfsync(sfstderr);
+               */
                if(state[n]==S_OP || n=='#')
                {
                    if(n==c)
@@ -486,12 +500,19 @@ int sh_lex(Lex_t* lp)
                            lp->lexd.docword=1;
                        else if(n==LPAREN)
                        {
+                           /* don't switch to ST_NESTED if called from comsub() */
+                           if(lp->lexd.dolparen)
+                               return(lp->token=c);
                            if(lp->lex.intest)
                                return(c);
                            lp->lexd.nest=1;
                            lp->lastline = sh.inlineno;
                            lp->lexd.lex_state = ST_NESTED;
                            fcseek(1);
+                           /*
+                           sfprintf(sfstderr,"sh_lex: case S_OP: return\n"); sfsync(sfstderr);
+                           sfprintf(sfstderr,"^^^^^^\n");                    sfsync(sfstderr);
+                           */
                            return(sh_lex(lp));
                        }
                        c  |= SYMREP;
@@ -798,7 +819,15 @@ int sh_lex(Lex_t* lp)
                poplevel(lp);
                fcseek(-LEN);
                n = lp->digits;
+               /*
+               sfprintf(sfstderr,"sh_lex: case S_PAR: comsub(lp,c): begin\n"); sfsync(sfstderr);
+               sfprintf(sfstderr,"^^^^^^\n");                                  sfsync(sfstderr);
+               */
                wordflags |= comsub(lp,c);
+               /*
+               sfprintf(sfstderr,"sh_lex: case S_PAR: comsub(lp,c): end\n"); sfsync(sfstderr);
+               sfprintf(sfstderr,"^^^^^^\n");                                sfsync(sfstderr);
+               */
                lp->digits = n;
                continue;
            case S_RBRA:
@@ -1002,6 +1031,10 @@ int sh_lex(Lex_t* lp)
                continue;
            case S_POP:
            do_pop:
+               /*
+               sfprintf(sfstderr,"sh_lex: case S_POP\n"); sfsync(sfstderr);
+               sfprintf(sfstderr,"^^^^^^\n");             sfsync(sfstderr);
+               */
                if(c==RBRACE && mode==ST_NESTED && lp->lex.nestedbrace)
                {
                    lp->lex.nestedbrace--;
@@ -1074,6 +1107,10 @@ int sh_lex(Lex_t* lp)
                        }
                        lp->lexd.paren = 1;
                    }
+                   /*
+                   sfprintf(sfstderr,"sh_lex: case S_POP: return\n"); sfsync(sfstderr);
+                   sfprintf(sfstderr,"^^^^^^\n");                     sfsync(sfstderr);
+                   */
                    return(lp->token=LPAREN);
                }
                if(mode==ST_NONE)
@@ -1578,7 +1615,18 @@ static int comsub(register Lex_t *lp, int endtok)
                fcseek(-LEN);
            if(c==RBRACE && lp->lex.incase)
                lp->lex.incase=0;
+           /*
+           sfprintf(sfstderr,"comsub: sh_lex(lp): begin\n"); sfsync(sfstderr);
+           sfprintf(sfstderr,"^^^^^^\n");                    sfsync(sfstderr);
+           */
            c=sh_lex(lp);
+           /*
+           sfprintf(sfstderr,"comsub: sh_lex(lp): end\n"); sfsync(sfstderr);
+           sfprintf(sfstderr,"^^^^^^\n");                  sfsync(sfstderr);
+           sfprintf(sfstderr,"comsub: c: %d\n",c);                                   sfsync(sfstderr);
+           sfprintf(sfstderr,"comsub: lp->lexd.lex_state: %d\n",lp->lexd.lex_state); sfsync(sfstderr);
+           sfprintf(sfstderr,"^^^^^^\n");                                            sfsync(sfstderr);
+           */
            switch(c)
            {
                case LBRACE:
diff --git a/src/cmd/ksh93/sh/parse.c b/src/cmd/ksh93/sh/parse.c
index 97df31b6..23adeb92 100644
--- a/src/cmd/ksh93/sh/parse.c
+++ b/src/cmd/ksh93/sh/parse.c
@@ -1568,7 +1568,15 @@ static Shnode_t *simple(Lex_t *lexp,int flag, struct ionod *io)
            lexp->aliasok = 0;
        }
    retry:
+       /*
+       sfprintf(sfstderr,"simple: sh_lex(lexp): begin\n"); sfsync(sfstderr);
+       sfprintf(sfstderr,"^^^^^^\n");                      sfsync(sfstderr);
+       */
        tok = sh_lex(lexp);
+       /*
+       sfprintf(sfstderr,"simple: sh_lex(lexp): end\n"); sfsync(sfstderr);
+       sfprintf(sfstderr,"^^^^^^\n");                    sfsync(sfstderr);
+       */
        if(was_assign && check_array(lexp))
            type = NV_ARRAY;
        if(tok==LABLSYM && (flag&SH_ASSIGN))
@@ -1583,6 +1591,10 @@ static Shnode_t *simple(Lex_t *lexp,int flag, struct ionod *io)
        }
        if(tok==LPAREN)
        {
+           /*
+           sfprintf(sfstderr,"simple: tok==LPAREN: argp->argflag&ARG_ASSIGN: %d\n",argp->argflag&ARG_ASSIGN); sfsync(sfstderr);
+           sfprintf(sfstderr,"^^^^^^\n");                                                                     sfsync(sfstderr);
+           */
            if(argp->argflag&ARG_ASSIGN)
            {
                int intypeset = lexp->intypeset;

If you comment out the fix and uncomment the debug prints, then I hope you'll see what I was going after in this fix:

# works since space prevents switch to ST_NESTED
arch/linux.i386-64/bin/ksh -c ': $(A=( (B) C))'

# broken since comsub() does not handle the situation followed by the switch to ST_NESTED
# this leads to recursion where sh_lex() called from comsub() calls comsub() and so on
# there are no nesting $(comsubs) here and so lp->lexd.dolparen should only be incremented up to 1
arch/linux.i386-64/bin/ksh -c ': $(A=((B) C))'

# works since simple() handles the situation followed by the ST_NESTED switch
arch/linux.i386-64/bin/ksh -c 'A=((B) C)'

I couldn't figure out a way to fix this by modifying comsub().

hyenias commented 2 years ago

@atheik I have hand tested your fix and all appears to work.

diff --git a/src/cmd/ksh93/sh/lex.c b/src/cmd/ksh93/sh/lex.c
index c9971851..7d9d67be 100644
--- a/src/cmd/ksh93/sh/lex.c
+++ b/src/cmd/ksh93/sh/lex.c
@@ -486,6 +486,9 @@ int sh_lex(Lex_t* lp)
                                                        lp->lexd.docword=1;
                                                else if(n==LPAREN)
                                                {
+                                                       /* don't switch to ST_NESTED if called from comsub() */
+                                                       if(lp->lexd.dolparen)
+                                                               return(lp->token=c);
                                                        if(lp->lex.intest)
                                                                return(c);
                                                        lp->lexd.nest=1;
atheik commented 2 years ago

@hyenias Thank you for testing! I've looked into this some more and I think this fix changes some of the sh_lex() codepaths too much to be a correct fix. This section which is reached after the ST_NESTED switch might be of interest: https://github.com/ksh93/ksh/blob/490edfe/src/cmd/ksh93/sh/lex.c#L1075-L1087 The section used to be like this: https://github.com/ksh93/ksh93-history/blob/2009-12-24/src/cmd/ksh93/sh/lex.c#L1082-L1098 I'll think about this some more.

McDutchie commented 2 years ago

I also could not break @atheik's fix, until I ran the ShellSpec regression tests which seems to cover some things nothing else does. It breaks the << bitwise shift arithmetic operator in an arithmetic expansion:

$ arch/*/bin/ksh -c 'foo=$((foo << 1))'
arch/darwin.i386-64/bin/ksh: syntax error at line 1: `<<1' here-document not contained within command substitution

There is no command substitution in that reproducer, of course. There is probably a way to amend the patch to exclude an arithmetic expansion.

atheik commented 2 years ago

@McDutchie, Good thing you ran those tests! I don't think this can be fixed by amending to my patch. I think the proper place for a fix is at the section after the switch to ST_NESTED I highlighted at https://github.com/ksh93/ksh/issues/269#issuecomment-1133765383, but I am stuck trying to figure it out. In hindsight I'm surprised that f73d698 hasn't broken anything as it is similarly disruptive in the way that it changes some sh_lex() codepaths.

McDutchie commented 2 years ago

This seems to work:

--- a/src/cmd/ksh93/sh/lex.c
+++ b/src/cmd/ksh93/sh/lex.c
@@ -488,6 +488,9 @@ int sh_lex(Lex_t* lp)
                            lp->lexd.docword=1;
                        else if(n==LPAREN)
                        {
+                           /* don't switch to ST_NESTED if we have a compound assignment in a comsub */
+                           if(lp->lexd.dolparen && fcpeek(-2)=='=')
+                               return(lp->token=c);
                            if(lp->lex.intest)
                                return(c);
                            lp->lexd.nest=1;
McDutchie commented 2 years ago

This section which is reached after the ST_NESTED switch might be of interest: https://github.com/ksh93/ksh/blob/490edfe/src/cmd/ksh93/sh/lex.c#L1075-L1087 The section used to be like this: https://github.com/ksh93/ksh93-history/blob/2009-12-24/src/cmd/ksh93/sh/lex.c#L1082-L1098

Thank you for spotting that, @atheik. Restoring these three lines from ksh 2009-12-24 also seems to fix the bug, but unfortunately causes a regression:

--- a/src/cmd/ksh93/sh/lex.c
+++ b/src/cmd/ksh93/sh/lex.c
@@ -1075,6 +1075,9 @@ int sh_lex(Lex_t* lp)
                        return(lp->token=EXPRSYM);
                    }
                    /* backward compatibility */
+                   if(lp->lexd.dolparen)
+                       fcseek(-1);
+                   else
                    {
                        if(lp->lexd.warn)
                            errormsg(SH_DICT,ERROR_warn(0),e_lexnested,sh.inlineno);
    subshell.sh[486]: FAIL: 'echo $((case x in x)echo ok;esac);:)' failed -- expected 'ok', got '/usr/local/src/ksh93/ksh/arch.dev/darwin.i386-64/bin/ksh: syntax error at line 1: `)' unexpected'
McDutchie commented 2 years ago

Meanwhile I've tried pretty hard to break this patch and I haven't managed – it also passes all the regression tests I can throw at it. Given that we're stuck trying to figure out anything else, I think I'm just going to commit that, even if it is not the most correct fix.

That doesn't mean it's set in stone. If a better fix is found later, it can replace that one.

Also I think the only way we're going to find out about any problems with it is by subjecting it to wider testing.

McDutchie commented 2 years ago

This slightly tweaked version also works: we can just return(c).

--- a/src/cmd/ksh93/sh/lex.c
+++ b/src/cmd/ksh93/sh/lex.c
@@ -488,6 +488,10 @@ int sh_lex(Lex_t* lp)
                            lp->lexd.docword=1;
                        else if(n==LPAREN)
                        {
+                           /* to avoid <https://github.com/ksh93/ksh/issues/269>, don't switch to
+                            * ST_NESTED if we have '=((' (nested compound assignment) in a comsub */
+                           if(lp->lexd.dolparen && fcpeek(-2)=='=')
+                               return(c);
                            if(lp->lex.intest)
                                return(c);
                            lp->lexd.nest=1;
McDutchie commented 2 years ago

Or not. There are still problems left when a (( doesn't immediately follow a =:

$ arch/*/bin/ksh -c 'got=$( typeset -a arr=( ( ((a b c)1))); typeset -p arr ); echo $got'
arch/darwin.i386-64/bin/ksh: syntax error at line 1: `(' unmatched

The problem is there doesn't seem to be a way to figure out if lp->lexd.dolparen means a command substitution or an arithmetic expansion.

McDutchie commented 1 year ago

New patch that uses a new flag in the lexer state struct to keep track of nested parentheses levels after encountering =(. In the location of the previous patch, we still need to do a check for =( since it occurs there before comsub() has a chance to detect it, but comsub() now detects it too and will flag it up by remembering the outermost level of parentheses where =( was first encountered, so that the new patch can check that flag and also handle nested parentheses correctly.

It's a hack, but then again, so is the whole comsub skipping mechanism. I have not yet managed to break the patch below, and it passes all the regression tests (ksh, modernish, shellspec) and all the reproducers above.

New patch ```diff diff --git a/src/cmd/ksh93/include/shlex.h b/src/cmd/ksh93/include/shlex.h index 67f586e38..d831f538a 100644 --- a/src/cmd/ksh93/include/shlex.h +++ b/src/cmd/ksh93/include/shlex.h @@ -46,6 +46,7 @@ struct _shlex_pvt_lexdata_ char nocopy; char paren; char dolparen; + unsigned short dolparen_eqparen; char nest; char docword; char nested_tilde; diff --git a/src/cmd/ksh93/sh/lex.c b/src/cmd/ksh93/sh/lex.c index a99863705..2b8fcb7cd 100644 --- a/src/cmd/ksh93/sh/lex.c +++ b/src/cmd/ksh93/sh/lex.c @@ -478,6 +478,10 @@ int sh_lex(Lex_t* lp) lp->lexd.docword=1; else if(n==LPAREN) { + /* to avoid , don't switch to + * ST_NESTED if we have '=((' (nested compound assignment) in a comsub */ + if(lp->lexd.dolparen && (fcpeek(-2)=='=' || lp->lexd.dolparen_eqparen)) + return c; if(lp->lex.intest) return c; /* '((' arithmetic command */ @@ -1511,7 +1515,8 @@ breakloop: */ static int comsub(Lex_t *lp, int endtok) { - int n,c,count=1; + int n,c; + unsigned short count = 1; /* parentheses/braces level counter */ int line=sh.inlineno; struct ionod *inheredoc = lp->heredoc; char *first,*cp=fcseek(0),word[5]; @@ -1592,7 +1597,7 @@ static int comsub(Lex_t *lp, int endtok) break; case RBRACE: rbrace: - if(endtok==LBRACE && --count<=0) + if(endtok==LBRACE && --count==0) goto done; if(count==1) lp->comsub = endtok==LBRACE; @@ -1600,13 +1605,24 @@ static int comsub(Lex_t *lp, int endtok) case IPROCSYM: case OPROCSYM: case LPAREN: if(endtok==LPAREN && !lp->lex.incase) + { count++; + /* flag up "=(": array/compound assignment */ + if(!lp->lexd.dolparen_eqparen && fcpeek(-2)=='=') + lp->lexd.dolparen_eqparen = count; + } break; case RPAREN: if(lp->lex.incase) lp->lex.incase=0; - else if(endtok==LPAREN && --count<=0) - goto done; + else if(endtok==LPAREN) + { + count--; + if(lp->lexd.dolparen_eqparen > count) + lp->lexd.dolparen_eqparen = 0; + if(count==0) + goto done; + } break; case EOFSYM: lp->lastline = line; ```
McDutchie commented 1 year ago

(Note that count is known never to be < 0, so I made it into an unsigned short. The count<=0 comparisons were misleading and are better written as count==0.)

McDutchie commented 1 year ago

By the way, thanks, @atheik, for your original patch - my patch above is just a tweaked/extended version of yours and I couldn't have done it without all your debugging work.

McDutchie commented 1 year ago

Actually, I can do better than that. I found a way to distinguish between comsub() skipping an arithmetic expression and skipping a command substitution. Most of the comsub() code is for skipping a command substitution, which happens if(n==endtok || off<0).

This allows us to simply flag that up, and not have to special-case compound assignments at all. Much cleaner.

Better patch ```diff diff --git a/src/cmd/ksh93/include/shlex.h b/src/cmd/ksh93/include/shlex.h index 67f586e38..0f2a5a14d 100644 --- a/src/cmd/ksh93/include/shlex.h +++ b/src/cmd/ksh93/include/shlex.h @@ -45,7 +45,8 @@ struct _shlex_pvt_lexdata_ { char nocopy; char paren; - char dolparen; + char dolparen; /* nesting level of $(...)/${ ...;}/$((...)) that comsub() is reading past */ + char skippingcomsub; /* set while comsub() is reading past new comsub: $(...) or ${ ...;} */ char nest; char docword; char nested_tilde; diff --git a/src/cmd/ksh93/sh/lex.c b/src/cmd/ksh93/sh/lex.c index a99863705..ef36f227c 100644 --- a/src/cmd/ksh93/sh/lex.c +++ b/src/cmd/ksh93/sh/lex.c @@ -478,7 +478,7 @@ int sh_lex(Lex_t* lp) lp->lexd.docword=1; else if(n==LPAREN) { - if(lp->lex.intest) + if(lp->lex.intest || lp->lexd.skippingcomsub) return c; /* '((' arithmetic command */ lp->lexd.nest=1; @@ -1507,7 +1507,8 @@ breakloop: /* * read to end of command substitution - * of the form $(...) + * of the form $(...) or ${ ...;} + * or arithmetic expansion $((...))) */ static int comsub(Lex_t *lp, int endtok) { @@ -1520,6 +1521,7 @@ static int comsub(Lex_t *lp, int endtok) csub = lp->comsub; sh_lexopen(lp,1); lp->lexd.dolparen++; + lp->lexd.skippingcomsub=0; lp->lex.incase=0; pushlevel(lp,0,0); lp->comsub = (endtok==LBRACE); @@ -1534,6 +1536,7 @@ static int comsub(Lex_t *lp, int endtok) *cp = c; if(n==endtok || off<0) { + lp->lexd.skippingcomsub++; if(endtok==LPAREN && lp->lexd.paren) { if(first==lp->lexd.first) @@ -1644,6 +1647,7 @@ done: lp->comsub = csub; lp->lastline = line; lp->lexd.dolparen--; + lp->lexd.skippingcomsub=0; lp->lex = save; lp->assignok = (endchar(lp)==RBRACT?assignok:0); if(lp->heredoc && !inheredoc) ```
McDutchie commented 1 year ago

Urgh. The latest patch is broken after all: arithmetic expressions aren't parsed as such within command substitutions, so that the shift-left operator is mistaken for a here-document redirection again.

$ arch/darwin.arm64-64/bin/ksh -c ': $( (( 1 << 2 )) )'
arch/darwin.arm64-64/bin/ksh: syntax error at line 1: `<<2' here-document not contained within command substitution
McDutchie commented 1 year ago
Here are some regression tests. ```diff diff --git a/src/cmd/ksh93/tests/basic.sh b/src/cmd/ksh93/tests/basic.sh index 328d5d124..d443a91b6 100755 --- a/src/cmd/ksh93/tests/basic.sh +++ b/src/cmd/ksh93/tests/basic.sh @@ -1000,5 +1000,17 @@ do || err_exit "last command in script exec-optimized in spite of $sig trap ($pid1 == $pid2)" done +# ====== +# Nested compound assignment misparsed in $(...) or ${ ...; } command substitution +# https://github.com/ksh93/ksh/issues/269 +got=$(set +x; eval ': $( typeset -a arr=((a b c) 1) )' 2>&1) || err_exit "issue 269 test 1 (got $(printf %q "$got"))" +got=$(set +x; eval ': ${ typeset -a arr=((a b c) 1); }' 2>&1) || err_exit "issue 269 test 2 (got $(printf %q "$got"))" +got=$(set +x; eval ': $( typeset -a arr=( ( ((a b c)1))) )' 2>&1) || err_exit "issue 269 test 3 (got $(printf %q "$got"))" +got=$(set +x; eval ': ${ typeset -a arr=( ( ((a b c)1))); }' 2>&1) || err_exit "issue 269 test 4 (got $(printf %q "$got"))" +got=$(set +x; eval ': $(( 1 << 2 ))' 2>&1) || err_exit "issue 269 test 5 (got $(printf %q "$got"))" +got=$(set +x; eval ': $(: $(( 1 << 2 )) )' 2>&1) || err_exit "issue 269 test 6 (got $(printf %q "$got"))" +got=$(set +x; eval ': $( (( 1 << 2 )) )' 2>&1) || err_exit "issue 269 test 7 (got $(printf %q "$got"))" +got=$(set +x; eval ': $( : $( (( 1 << 2 )) ) )' 2>&1) || err_exit "issue 269 test 8 (got $(printf %q "$got"))" + # ====== exit $((Errors<125?Errors:125)) ```

Neither of the candidate patches currently pass all these regression tests. The last one fails test 7. The previous, hackier one one fails test 4. (Without any patch, tests 1, 2, 3 and 4 fail.)

Sigh. Back to the drawing board?

McDutchie commented 1 year ago

Maybe not. I found a way to make the hackier patch pass all the regression tests above. The error in it was that the new hack was only applied if endtok==LPAREN, which is nonsense.

New new patch ```diff diff --git a/src/cmd/ksh93/include/shlex.h b/src/cmd/ksh93/include/shlex.h index 67f586e38..d831f538a 100644 --- a/src/cmd/ksh93/include/shlex.h +++ b/src/cmd/ksh93/include/shlex.h @@ -46,6 +46,7 @@ struct _shlex_pvt_lexdata_ char nocopy; char paren; char dolparen; + unsigned short dolparen_eqparen; char nest; char docword; char nested_tilde; diff --git a/src/cmd/ksh93/sh/lex.c b/src/cmd/ksh93/sh/lex.c index a99863705..9be3d0f7d 100644 --- a/src/cmd/ksh93/sh/lex.c +++ b/src/cmd/ksh93/sh/lex.c @@ -478,6 +478,10 @@ int sh_lex(Lex_t* lp) lp->lexd.docword=1; else if(n==LPAREN) { + /* to avoid , don't switch to + * ST_NESTED if we have '=((' (nested compound assignment) in a comsub */ + if(lp->lexd.dolparen && (fcpeek(-2)=='=' || lp->lexd.dolparen_eqparen)) + return c; if(lp->lex.intest) return c; /* '((' arithmetic command */ @@ -1507,7 +1511,8 @@ breakloop: /* * read to end of command substitution - * of the form $(...) + * of the form $(...) or ${ ...;} + * or arithmetic expansion $((...))) */ static int comsub(Lex_t *lp, int endtok) { @@ -1599,10 +1604,15 @@ static int comsub(Lex_t *lp, int endtok) break; case IPROCSYM: case OPROCSYM: case LPAREN: + /* flag up "=(": array/compound assignment */ + if(!lp->lexd.dolparen_eqparen && fcpeek(-2)=='=') + lp->lexd.dolparen_eqparen = count; if(endtok==LPAREN && !lp->lex.incase) count++; break; case RPAREN: + if(lp->lexd.dolparen_eqparen >= count) + lp->lexd.dolparen_eqparen = 0; if(lp->lex.incase) lp->lex.incase=0; else if(endtok==LPAREN && --count<=0) ```
McDutchie commented 1 year ago

The last version of the patch passes all the regression tests I was able to come up with. Though the patch is hacky, it's also specific: it only disables the detection of (()) arithmetic commands within =() within new-form command substitutions, and nowhere else. This limits the potential for unexpected side effects. So I think I'm now confident enough to commit this. Any further bugs in it are not going to be found if no one uses it.

McDutchie commented 1 year ago

Well, so much for that.

I don't think this bug is going away anytime soon.

McDutchie commented 1 year ago

OK, I'm nothing if not stubborn. This one is the previous one plus an additional flag to keep track of whether we're currently lexing an arithmetic expression. Please test. I'm going to give this a week now to find any breakage.

New patch candidate ```diff diff --git a/src/cmd/ksh93/include/shlex.h b/src/cmd/ksh93/include/shlex.h index 67f586e38..993011e5f 100644 --- a/src/cmd/ksh93/include/shlex.h +++ b/src/cmd/ksh93/include/shlex.h @@ -46,6 +46,7 @@ struct _shlex_pvt_lexdata_ char nocopy; char paren; char dolparen; + unsigned short dolparen_eqparen; char nest; char docword; char nested_tilde; @@ -54,6 +55,7 @@ struct _shlex_pvt_lexdata_ char warn; char message; char arith; + char arith_expan; char *first; int level; int lastc; diff --git a/src/cmd/ksh93/sh/lex.c b/src/cmd/ksh93/sh/lex.c index a99863705..4d5cffeda 100644 --- a/src/cmd/ksh93/sh/lex.c +++ b/src/cmd/ksh93/sh/lex.c @@ -480,7 +480,12 @@ int sh_lex(Lex_t* lp) { if(lp->lex.intest) return c; - /* '((' arithmetic command */ + /* Workaround for a flaw in the comsub() skipping hack: avoid switching + * to ST_NESTED and recognizing ((...)) commands if we encounter '((' + * within a compound assignment in a new-form command substitution */ + if(lp->lexd.dolparen && !lp->lexd.arith_expan && (fcpeek(-2)=='=' || lp->lexd.dolparen_eqparen)) + return c; + /* '((' arithmetic command or '$((' arithmetic expansion */ lp->lexd.nest=1; lp->lastline = sh.inlineno; lp->lexd.lex_state = ST_NESTED; @@ -1507,19 +1512,23 @@ breakloop: /* * read to end of command substitution - * of the form $(...) + * of the form $(...) or ${ ...;} + * or arithmetic expansion $((...)) */ static int comsub(Lex_t *lp, int endtok) { - int n,c,count=1; + int n,c; + unsigned short count=1; int line=sh.inlineno; struct ionod *inheredoc = lp->heredoc; + char save_arith_expan = lp->lexd.arith_expan; char *first,*cp=fcseek(0),word[5]; int off, messages=0, assignok=lp->assignok, csub; struct _shlex_pvt_lexstate_ save = lp->lex; csub = lp->comsub; sh_lexopen(lp,1); lp->lexd.dolparen++; + lp->lexd.arith_expan = endtok==LPAREN && fcpeek(1)==LPAREN; /* $(( */ lp->lex.incase=0; pushlevel(lp,0,0); lp->comsub = (endtok==LBRACE); @@ -1599,10 +1608,15 @@ static int comsub(Lex_t *lp, int endtok) break; case IPROCSYM: case OPROCSYM: case LPAREN: + /* flag up "=(": array/compound assignment */ + if(!lp->lexd.dolparen_eqparen && fcpeek(-2)=='=') + lp->lexd.dolparen_eqparen = count; if(endtok==LPAREN && !lp->lex.incase) count++; break; case RPAREN: + if(lp->lexd.dolparen_eqparen >= count) + lp->lexd.dolparen_eqparen = 0; if(lp->lex.incase) lp->lex.incase=0; else if(endtok==LPAREN && --count<=0) @@ -1644,6 +1658,7 @@ done: lp->comsub = csub; lp->lastline = line; lp->lexd.dolparen--; + lp->lexd.arith_expan = save_arith_expan; lp->lex = save; lp->assignok = (endchar(lp)==RBRACT?assignok:0); if(lp->heredoc && !inheredoc) diff --git a/src/cmd/ksh93/tests/basic.sh b/src/cmd/ksh93/tests/basic.sh index 328d5d124..215b0817a 100755 --- a/src/cmd/ksh93/tests/basic.sh +++ b/src/cmd/ksh93/tests/basic.sh @@ -1000,5 +1000,21 @@ do || err_exit "last command in script exec-optimized in spite of $sig trap ($pid1 == $pid2)" done +# ====== +# Nested compound assignment misparsed in $(...) or ${ ...; } command substitution +# https://github.com/ksh93/ksh/issues/269 +got=$(set +x; eval ': $( typeset -a arr=((a b c) 1) )' 2>&1) || err_exit "issue 269 test 1 (got $(printf %q "$got"))" +got=$(set +x; eval ': ${ typeset -a arr=((a b c) 1); }' 2>&1) || err_exit "issue 269 test 2 (got $(printf %q "$got"))" +got=$(set +x; eval ': $( typeset -a arr=( ( ((a b c)1))) )' 2>&1) || err_exit "issue 269 test 3 (got $(printf %q "$got"))" +got=$(set +x; eval ': ${ typeset -a arr=( ( ((a b c)1))); }' 2>&1) || err_exit "issue 269 test 4 (got $(printf %q "$got"))" +got=$(set +x; eval ': $(( 1 << 2 ))' 2>&1) || err_exit "issue 269 test 5 (got $(printf %q "$got"))" +got=$(set +x; eval ': $(: $(( 1 << 2 )) )' 2>&1) || err_exit "issue 269 test 6 (got $(printf %q "$got"))" +got=$(set +x; eval ': $( (( 1 << 2 )) )' 2>&1) || err_exit "issue 269 test 7 (got $(printf %q "$got"))" +got=$(set +x; eval ': $( : $( (( 1 << 2 )) ) )' 2>&1) || err_exit "issue 269 test 8 (got $(printf %q "$got"))" +got=$(set +x; eval ': $( (( $( (( 1 << 2 )); echo 1 ) << 2 )) )' 2>&1) || err_exit "issue 269 test 9 (got $(printf %q "$got"))" +got=$(set +x; eval ': $( typeset -a arr=((a $(( 1 << 2 )) c) 1) )' 2>&1) || err_exit "issue 269 test 10 (got $(printf %q "$got"))" +# TODO: test below crashes when actually executed; test parsing only by using noexec +got=$(set +x; eval 'set --noexec; : $( typeset -a arr=((a $(( $( typeset -a barr=((a $(( 1 << 2 )) c) 1); echo 1 ) << $( typeset -a bazz=((a $(( 1 << 2 )) c) 1); echo 2 ) )) c) 1) )' 2>&1) || err_exit "issue 269 test 11 (got $(printf %q "$got"))" + # ====== exit $((Errors<125?Errors:125)) ```
hyenias commented 1 year ago

Hand test successful with indexed arrays. Here is the hardest iterative test I was using for double parenthesis errors:

# All hand testing passed with indexed arrays, last most complex hand testing performed
$ ksh269 -c 'r=$( i=3; ((i)) && typeset -a arr=((a b c) $((1+0)) (e f g $((i)))) || ((i-3)) && echo ${arr[$((1+1))][1]}; typeset -p arr); typeset -p r arr'
r=$'f\ntypeset -a arr=((a b c) 1 (e f g 3) )'
$

# Switching over to hand testing compound variables using double parenthesis
$ ksh269 -c 'typeset -Ca arr=((a=ah b=beh c=si)); typeset -p arr'
ksh269: syntax error at line 1: `((' unexpected
$ ksh269 -c 'typeset -Ca arr=((a=ah b=beh c=si) ); typeset -p arr'
typeset -C -a arr=((a=ah;b=beh;c=si))
$
$ ksh269 -c 'r=$( typeset -Ca arr=((a=ah b=beh c=si)); typeset -p arr); echo $r'
ksh269: syntax error at line 1: `((' unexpected
$ ksh269 -c 'r=$( typeset -Ca arr=((a=ah b=beh c=si) ); typeset -p arr); echo $r'
typeset -C -a arr=((a=ah;b=beh;c=si))
$ ksh269 -c 'r=${ typeset -Ca arr=((a=ah b=beh c=si)); typeset -p arr; }; echo $r'
ksh269: syntax error at line 1: `((' unexpected
$ ksh269 -c 'r=${ typeset -Ca arr=((a=ah b=beh c=si) ); typeset -p arr; }; echo $r'
typeset -C -a arr=((a=ah;b=beh;c=si))

# Came across a new core dump while hand keying iterations
$ ksh269 -c 'r=$(typeset -C arr=((a=ah b=beh c=si) 1 (e f g)));'
free(): invalid pointer
Aborted (core dumped)
McDutchie commented 1 year ago

Thanks for testing!

# Came across a new core dump while hand keying iterations
$ ksh269 -c 'r=$(typeset -C arr=((a=ah b=beh c=si) 1 (e f g)));'
free(): invalid pointer
Aborted (core dumped)

Ha, that core dump was accidentally fixed in 4405a3527feae43fbbb72e823169a9cac8744fa1. I like it when that happens :)

As for your other failing test cases, they don't seem to have specifically to do with the command substitution parsing problem, so I think that's a different bug.

McDutchie commented 1 year ago

Actually, if I add a space, it still crashes :-/

$ arch/darwin.arm64-64/bin/ksh -c 'r=$(typeset -C arr=( (a=ah b=beh c=si) 1 (e f g)));'
ksh(80535,0x1010ac580) malloc: *** error for object 0x12de05e28: pointer being freed was not allocated
ksh(80535,0x1010ac580) malloc: *** set a breakpoint in malloc_error_break to debug
Abort
McDutchie commented 1 year ago

Hmmmmmmm.

Maybe, just maybe, the whole command substitution business has been a gigantic red herring distracting me from looking at the actual cause of the problem all this time.

Which is: the lexer is detecting a single (( symbol (LPAREN|SYMREP == EXPRSYM) in places where it should be detecting two separate opening parentheses (LPAREN followed by LPAREN). And, as some of your reproducers above show, it does that inside and outside of command substitutions.

Now, where is it appropriate to detect (( a.k.a. EXPRSYM?

  1. For the arithmetic command (()). In the shell grammar, this can only occur where any other command name can occur. This happens to be mostly the same places where reserved words are allowed to occur as well (except for in and do).
  2. For arithmetic expansions $(()). For this, it is sufficient to check whether the double parentheses are preceded by a dollar sign. In all other cases it should be detecting two separate opening parentheses, regardless of spaces.

So, I am working on a completely new approach to a patch, but I haven't figured it out yet. Flags like lp->lexd.reservok are flagged where it seems like they shouldn't be.

McDutchie commented 1 year ago

That train of thought did lead me to find a convenient flag lp->comp_assign that we can check to avoid detection of EXPRSYM in compound assignments. This fixes all the problems, but only outside of command substitutions, because that flag is set by the parser -- it requires parsing to know whether or not to set it.

So, the lexer-only command substitution reading business is not a red herring. We still need the hack to work around the hack. But we actually need both that and the new check for lp->comp_assign, because a command substitution is only lexed at parse time, but actually parsed at run time (and thus also lexed again, without the dolparen hack).

Yet another new patch ```diff diff --git a/src/cmd/ksh93/include/shlex.h b/src/cmd/ksh93/include/shlex.h index 67f586e38..0dd33cf79 100644 --- a/src/cmd/ksh93/include/shlex.h +++ b/src/cmd/ksh93/include/shlex.h @@ -46,6 +46,8 @@ struct _shlex_pvt_lexdata_ char nocopy; char paren; char dolparen; + unsigned short dolparen_eqparen; + char dolparen_arithexpan; char nest; char docword; char nested_tilde; diff --git a/src/cmd/ksh93/sh/lex.c b/src/cmd/ksh93/sh/lex.c index a99863705..4c71be5dd 100644 --- a/src/cmd/ksh93/sh/lex.c +++ b/src/cmd/ksh93/sh/lex.c @@ -474,20 +474,28 @@ int sh_lex(Lex_t* lp) { if(n==c) { - if(c=='<') - lp->lexd.docword=1; - else if(n==LPAREN) + if(c==LPAREN) { - if(lp->lex.intest) - return c; - /* '((' arithmetic command */ + /* Avoid recognizing EXPRSYM in [[ ... ]] or compound assignments */ + if(lp->lex.intest || lp->comp_assign) + return lp->token=c; + /* Workaround for a flaw in the comsub() skipping hack: avoid switching + * to ST_NESTED and recognizing ((...)) commands if we encounter '((' + * within a compound assignment in a new-form command substitution */ + if(lp->lexd.dolparen && !lp->lexd.dolparen_arithexpan + && (fcpeek(-2)=='=' || lp->lexd.dolparen_eqparen)) + return lp->token=c; + /* '((' arithmetic command or '$((' arithmetic expansion */ lp->lexd.nest=1; lp->lastline = sh.inlineno; lp->lexd.lex_state = ST_NESTED; fcseek(1); return sh_lex(lp); } - c |= SYMREP; + c |= SYMREP; + /* Here document redirection operator '<<' */ + if(c==IODOCSYM) + lp->lexd.docword = 1; } else if(c=='(' || c==')') return lp->token=c; @@ -1507,19 +1515,23 @@ breakloop: /* * read to end of command substitution - * of the form $(...) + * of the form $(...) or ${ ...;} + * or arithmetic expansion $((...)) */ static int comsub(Lex_t *lp, int endtok) { - int n,c,count=1; + int n,c; + unsigned short count=1; int line=sh.inlineno; struct ionod *inheredoc = lp->heredoc; + char save_arithexpan = lp->lexd.dolparen_arithexpan; char *first,*cp=fcseek(0),word[5]; int off, messages=0, assignok=lp->assignok, csub; struct _shlex_pvt_lexstate_ save = lp->lex; csub = lp->comsub; sh_lexopen(lp,1); lp->lexd.dolparen++; + lp->lexd.dolparen_arithexpan = endtok==LPAREN && fcpeek(1)==LPAREN; /* $(( */ lp->lex.incase=0; pushlevel(lp,0,0); lp->comsub = (endtok==LBRACE); @@ -1599,10 +1611,15 @@ static int comsub(Lex_t *lp, int endtok) break; case IPROCSYM: case OPROCSYM: case LPAREN: + /* flag up "=(": array/compound assignment */ + if(!lp->lexd.dolparen_eqparen && fcpeek(-2)=='=') + lp->lexd.dolparen_eqparen = count; if(endtok==LPAREN && !lp->lex.incase) count++; break; case RPAREN: + if(lp->lexd.dolparen_eqparen >= count) + lp->lexd.dolparen_eqparen = 0; if(lp->lex.incase) lp->lex.incase=0; else if(endtok==LPAREN && --count<=0) @@ -1644,6 +1661,7 @@ done: lp->comsub = csub; lp->lastline = line; lp->lexd.dolparen--; + lp->lexd.dolparen_arithexpan = save_arithexpan; lp->lex = save; lp->assignok = (endchar(lp)==RBRACT?assignok:0); if(lp->heredoc && !inheredoc) diff --git a/src/cmd/ksh93/tests/basic.sh b/src/cmd/ksh93/tests/basic.sh index 328d5d124..8676caedc 100755 --- a/src/cmd/ksh93/tests/basic.sh +++ b/src/cmd/ksh93/tests/basic.sh @@ -1000,5 +1000,25 @@ do || err_exit "last command in script exec-optimized in spite of $sig trap ($pid1 == $pid2)" done +# ====== +# Nested compound assignment misparsed in $(...) or ${ ...; } command substitution +# https://github.com/ksh93/ksh/issues/269 +got=$(set +x; eval ': $( typeset -a arr=((a b c) 1) )' 2>&1) || err_exit "issue 269 test 1 (got $(printf %q "$got"))" +got=$(set +x; eval ': ${ typeset -a arr=((a b c) 1); }' 2>&1) || err_exit "issue 269 test 2 (got $(printf %q "$got"))" +got=$(set +x; eval ': $( typeset -a arr=( ( ((a b c)1))) )' 2>&1) || err_exit "issue 269 test 3 (got $(printf %q "$got"))" +got=$(set +x; eval ': ${ typeset -a arr=( ( ((a b c)1))); }' 2>&1) || err_exit "issue 269 test 4 (got $(printf %q "$got"))" +got=$(set +x; eval ': $(( 1 << 2 ))' 2>&1) || err_exit "issue 269 test 5 (got $(printf %q "$got"))" +got=$(set +x; eval ': $(: $(( 1 << 2 )) )' 2>&1) || err_exit "issue 269 test 6 (got $(printf %q "$got"))" +got=$(set +x; eval ': $( (( 1 << 2 )) )' 2>&1) || err_exit "issue 269 test 7 (got $(printf %q "$got"))" +got=$(set +x; eval ': $( : $( (( 1 << 2 )) ) )' 2>&1) || err_exit "issue 269 test 8 (got $(printf %q "$got"))" +got=$(set +x; eval ': $( (( $( (( 1 << 2 )); echo 1 ) << 2 )) )' 2>&1) || err_exit "issue 269 test 9 (got $(printf %q "$got"))" +got=$(set +x; eval ': $( typeset -a arr=((a $(( 1 << 2 )) c) 1) )' 2>&1) || err_exit "issue 269 test 10 (got $(printf %q "$got"))" +got=$(set +x; eval 'typeset -Ca arr=((a=ah b=beh c=si))' 2>&1) || err_exit "issue 269 test 11 (got $(printf %q "$got"))" +got=$(set +x; eval ': $( typeset -Ca arr=((a=ah b=beh c=si)) )' 2>&1) || err_exit "issue 269 test 12 (got $(printf %q "$got"))" +got=$(set +x; eval 'r=${ typeset -Ca arr=((a=ah b=beh c=si)); }' 2>&1) || err_exit "issue 269 test 13 (got $(printf %q "$got"))" +# TODO: the tests below crash when actually executed; test lexing only by using noexec +got=$(set +x; eval 'set --noexec; : $( typeset -a arr=((a $(( $( typeset -a barr=((a $(( 1 << 2 )) c) 1); echo 1 ) << $( typeset -a bazz=((a $(( 1 << 2 )) c) 1); echo 2 ) )) c) 1) )' 2>&1) || err_exit "issue 269 test 14 (got $(printf %q "$got"))" +got=$(set +x; eval 'set --noexec; r=$(typeset -C arr=( (a=ah b=beh c=si) 1 (e f g)));' 2>&1) || err_exit "issue 269 test 15 (got $(printf %q "$got"))" + # ====== exit $((Errors<125?Errors:125)) ```
hyenias commented 1 year ago

All hand testing passed using the yet another new patch above.

Good job! 2 years of endeavoring and so many paths taken to reach this fix. Thanks to @McDutchie for your entrenched persistence on this issue and bug resolution. And thanks to @atheik for helping light the way for both of us.

This is an old bug. With this fix, long standing confusion and grief have been cleared away. As reference, when I spot checked using ksh2020 it appears it had a fix.

McDutchie commented 1 year ago

Thanks for re-testing, @hyenias!

As reference, when I spot checked using ksh2020 it appears it had a fix.

I hadn't checked that yet. Indeed, it fixes the syntax error bug outside of command substitutions, but not within them. So I did a little research on ksh93-history. Version 2013-05-24 adds the exact same check for the lp->comp_assign flag as my latest patch does:

                            if(lp->lex.intest || lp->comp_assign)
                                return lp->token=c;

Nice to have it confirmed :)