ksh93 / ksh

ksh 93u+m: KornShell lives! | Latest release: https://github.com/ksh93/ksh/releases
Eclipse Public License 2.0
183 stars 31 forks source link

read -p #463

Closed fbrau closed 2 years ago

fbrau commented 2 years ago

old ksh did a prompt message before reading console. So does bash. It now does "Read from the current co-process". Comparing bash and ksh (yours) flags they do use the same flags mostly

I suggest to:

1 change that "co-process" reading to another flag (-P?) and 2 can we keep the old -p (PROMPT) behaviour?

McDutchie commented 2 years ago

old ksh did a prompt message before reading console. So does bash.

Well, the abandoned ksh 93v- beta and ksh2020 versions do that. But no AT&T release version ever did. The last AT&T release, 93u+ 2012-08-01, is like ksh 93u+m.

Note that we do support a prompt, just with a different syntax:

$ read var"?prompt: "
prompt: ▁

It is quite ugly. I suppose that's why AT&T was going to add the bash-style option to the next release. We could backport that change. The co-process option would become -u p.

But at the same time, to avoid breaking old ksh scripts, in 93v- and ksh2020, -p can still be used as the co-process option f it is followed by a valid variable name while a co-process is active.

I suppose it's very rare for a prompt to be a valid variable name, but still, it means arbitrary prompts cannot be safely used. So the new way is also ugly…

fbrau commented 2 years ago

Well, the var?prompt syntax doesnt look that bad. I didn't know about it. I vote for that instead

McDutchie commented 2 years ago

I think that syntax is dreadful, actually – it's not just ugly, it's unsafe. To use it safely you have to quote the ? to protect it from pathname expansion, otherwise, for read foo?bar, you risk actually executing read fooXbar if a file named fooXbar exists in the current directory (with any character for X). And most people will not quote it in practice.

Plus, more ksh users have now come to expect to use -p for prompts because ksh2020 supported this.

So I think all in all we're probably better off backporting the change.

McDutchie commented 2 years ago

Here is a patch that backports the -a, -p and -u changes in read from 93v- and ksh2020 with a few tweaks. @JohnoKing, @hyenias, could you test?

Patch ```diff diff --git a/src/cmd/ksh93/bltins/read.c b/src/cmd/ksh93/bltins/read.c index 73db877a5..9300736d0 100644 --- a/src/cmd/ksh93/bltins/read.c +++ b/src/cmd/ksh93/bltins/read.c @@ -62,7 +62,7 @@ struct read_save int b_read(int argc,char *argv[], Shbltin_t *context) { Sfdouble_t sec; - register char *name; + register char *name=0; /* prompt */ register int r, flags=0, fd=0; ssize_t len=0; long timeout = 1000*sh.st.tmout; @@ -107,10 +107,15 @@ int b_read(int argc,char *argv[], Shbltin_t *context) } break; case 'p': - if((fd = sh.cpipe[0])<=0) + /* prompt, unless a coprocess is active and optarg is a varname */ + if(sh.cpipe[0]<=0 || *opt_info.arg!='-' && (!strmatch(opt_info.arg,"+(\\w)") || isdigit(*opt_info.arg))) + name = opt_info.arg; + else { - errormsg(SH_DICT,ERROR_exit(1),e_query); - UNREACHABLE(); + /* for backward compatibility */ + fd = sh.cpipe[0]; + argv--; + argc--; } break; case 'n': case 'N': @@ -129,20 +134,28 @@ int b_read(int argc,char *argv[], Shbltin_t *context) flags |= SS_FLAG; break; case 'u': - fd = (int)opt_info.num; - if(opt_info.num<0 || opt_info.num>INT_MAX || (fd>=sh.lim.open_max && !sh_iovalidfd(fd))) + if(opt_info.arg[0]=='p' && opt_info.arg[1]==0) { - errormsg(SH_DICT,ERROR_exit(1),e_file,opt_info.arg); /* reject invalid file descriptors */ - UNREACHABLE(); + if((fd = sh.cpipe[0])<=0) + { + errormsg(SH_DICT,ERROR_exit(1),e_query); + UNREACHABLE(); + } + break; } - if(sh_inuse(fd)) + fd = (int)strtol(opt_info.arg,&opt_info.arg,10); + if(*opt_info.arg || !sh_iovalidfd(fd)) fd = -1; break; case 'v': flags |= V_FLAG; break; case ':': - errormsg(SH_DICT,2, "%s", opt_info.arg); + /* backward compat: if 'p' lacks option-argument and there is a coprocess, treat like 'read -u p' */ + if(sh.cpipe[0]>0 && strcmp(opt_info.name,"-p")==0) + fd = sh.cpipe[0]; + else + errormsg(SH_DICT,2, "%s", opt_info.arg); break; case '?': errormsg(SH_DICT,ERROR_usage(2), "%s", opt_info.arg); @@ -162,8 +175,10 @@ int b_read(int argc,char *argv[], Shbltin_t *context) UNREACHABLE(); } /* look for prompt */ - if((name = *argv) && (name=strchr(name,'?')) && (r&IOTTY)) - r = strlen(name++); + if(!name && *argv && (name=strchr(*argv,'?'))) + name++; + if(name && (r&IOTTY)) + r = strlen(name)+1; else r = 0; if(argc==fixargs) diff --git a/src/cmd/ksh93/data/builtins.c b/src/cmd/ksh93/data/builtins.c index 2b6d313c5..05c9cde93 100644 --- a/src/cmd/ksh93/data/builtins.c +++ b/src/cmd/ksh93/data/builtins.c @@ -1404,7 +1404,7 @@ const char sh_optpwd[] = ; const char sh_optread[] = -"[-1c?\n@(#)$Id: read (AT&T Research) 2006-12-19 $\n]" +"[-1c?\n@(#)$Id: read (ksh 93u+m) 2022-02-15 $\n]" "[--catalog?" SH_DICT "]" "[+NAME?read - read a line from standard input]" "[+DESCRIPTION?\bread\b reads a line from standard input and breaks it " @@ -1419,22 +1419,28 @@ const char sh_optread[] = "\bREPLY\b is used.]" "[+?When \avar\a has the binary attribute and \b-n\b or \b-N\b is specified, " "the bytes that are read are stored directly into \bvar\b.]" -"[+?If you specify \b?\b\aprompt\a after the first \avar\a, then \bread\b " - "will display \aprompt\a on standard error when standard input " - "is a terminal or pipe.]" +"[+?If you append \b?\b\aprompt\a to the first \avar\a, then \bread\b " + "will display \aprompt\a as in the \b-p\b option (see below). " + "The \b?\b should be quoted to protect it from pathname expansion. " + "This usage is deprecated.]" "[+?If an end of file is encountered while reading a line the data is " "read and processed but \bread\b returns with a non-zero exit status.]" -"[A?Unset \avar\a and then create an indexed array containing each field in " +"[A|a?Unset \avar\a and then create an indexed array containing each field in " "the line starting at index 0.]" "[C?Unset \avar\a and read \avar\a as a compound variable.]" "[d]:[delim?Read until delimiter \adelim\a instead of to the end of line.]" -"[p?Read from the current co-process instead of standard input. An end of " - "file causes \bread\b to disconnect the co-process so that another " - "can be created.]" +"[p]:[prompt?Write \aprompt\a on standard error before reading " + "if standard input is a terminal or pipe. In earlier releases, " + "\b-p\b caused the input to come from the current co-process. " + "Use \b-u p\b instead. For backward compatibility, " + "if there is a co-process active and \aprompt\a is a valid identifier, " + "then \aprompt\a will be treated as an argument and it will read from " + "the co-process instead of using \aprompt\a as the prompt.]" "[r?Do not treat \b\\\b specially when processing the input line.]" "[s?Save a copy of the input as an entry in the shell history file.]" "[S?Treat the input as if it was saved from a spreadsheet in csv format.]" -"[u]#[fd:=0?Read from file descriptor number \afd\a instead of standard input.]" +"[u]:[fd:=0?Read from file descriptor number \afd\a instead of standard input. " + "If \afd\a is \bp\b, the co-process input file descriptor is used.]" "[t]:[timeout?Specify a timeout \atimeout\a in seconds when reading from " "a terminal or pipe.]" "[n]#[count?Read at most \acount\a characters. For binary fields \acount\a " ```
hyenias commented 2 years ago

Almost. See below:

$ arch/*/bin/ksh -c '{ sleep 2; echo success; }|&; read -u p var"?input:"; echo response: ${var:-${REPLY}}'
response: success
$ arch/*/bin/ksh -c '{ sleep 2; echo success; }|&; read -u 3 var"?input:"; echo response: ${var:-${REPLY}}'
response: success
$ arch/*/bin/ksh -c '{ sleep 2; echo success; }|&; read -p var"?input:"; echo response: ${var:-${REPLY}}'
var?input:bad
response: bad
$ ksh -c '{ sleep 2; echo success; }|&; read -p var"?input:"; echo response: ${var:-${REPLY}}'
response: success
McDutchie commented 2 years ago

Thanks @hyenias. The third backward compat case is broken on ksh2020, too. When checking for a valid variable name after -p, it fails to exclude a ? prompt. And I found another problem: it fails to consider variable names like .sh.foo valid as well.

McDutchie commented 2 years ago

Regression test for builtins.sh:

# ksh 93v- did not implement co-process compatibility for 'read -p .sh.foo' or 'read -p varname?prompt'
(echo ok |& read -p .sh.foo"?dummy prompt: " && [[ ${.sh.foo} == ok ]]) </dev/null \
|| err_exit "read -p backwards compat for co-process does not support dot variables or ? prompts"

Matching a valid ksh veriable name including dots (but not any double dots nor a final dot) using shell patterns or regular expressions is challenging, to say the least. I've come up with the following (replace the 'p' case code with this):

        case 'p':
        /* prompt, unless a coprocess is active and option-argument is a vname with optional legacy ?prompt attached */
        if(sh.cpipe[0]>0
        && strmatch(opt_info.arg,"~(E)^[[:alpha:]_.]+[[:alnum:]_.]*(\\?.*)?$")  /* match all vname chars before optional ?prompt */
        && !strmatch(opt_info.arg,"~(E)^.*\\.\\..*(\\?.*)?$")           /* exclude double dot before optional ?prompt */
        && !strmatch(opt_info.arg,"~(E)^.*\\.(\\?.*)?$"))           /* exclude final dot before optional ?prompt */
        {
            /* backward compat: read from co-process and abandon option-argument, treating it as regular argument */
            fd = sh.cpipe[0];
            argv--;
            argc--;
        }
        else
            name = opt_info.arg;
        break;
McDutchie commented 2 years ago

Actually, I'm not happy with that regex approach. Inevitably, there will be corner cases where its results and that of nv_open() with the NV_VARNAME flag will differ. One such case is the "no parent" error that happens if you try to use .foo.bar without .foo existing.

So here's a different approach. Temporarily get rid of the ?prompt (if any) and then simply try nv_open() to see if it will return a pointer. The side effect is that it will add a node for a valid variable name that doesn't exist yet, but that's not a problem as we're about to read into that variable anyway.

        case 'p':
        /* prompt, unless a coprocess is active and option-argument
         * is a vname with optional legacy ?prompt attached */
        if(sh.cpipe[0]>0)
        {
            char *qmark = strchr(opt_info.arg,'?');
            if(qmark)
                *qmark = '\0';
            if(nv_open(opt_info.arg,sh.var_tree,NV_VARNAME|NV_NOFAIL))
            {
                /* backward compat: read from co-process and abandon
                 * option-argument, treating it as regular argument */
                fd = sh.cpipe[0];
                argv--;
                argc--;
            }
            if(qmark)
                *qmark = '?';
            break;
        }
        name = opt_info.arg;
        break;
McDutchie commented 2 years ago

I wonder if my comment is correct:

                /* backward compat: read from co-process and abandon
                 * option-argument, treating it as regular argument */

I know this is supposed to be the effect but in fact I don't understand what's happening. What kind of magic is this argv--; argc--; thing that we're backporting from 93v-? How does that work? It looks like it should get optget(3) to abandon the the option-argument altogether. How does this get it to treat it like a regular vname argument instead?

McDutchie commented 2 years ago

I think I figured it out. That comment of mine was indeed wrong. It does not abandon (skip past) the option-argument, it does the opposite; it causes optget(3) to find the option-argument again, parsing it as a separate variable name argument.

But that only works if it is separated from -p as in -p varname?prompt. It will not work if we stack it to the option as in -pvarname?prompt -- but that didn't work in 93u+ either. So we need an additional check that the option-argument is not stacked to the option. This can be done by checking if the parsed option-argument is identical to the last-processed argv[] string.

        case 'p':
        /* if a coprocess is active, option-argument is a separate word (not stacked to -p),
         * and option-argument is a valid vname with optional legacy ?prompt attached... */
        if(sh.cpipe[0]>0 && strcmp(opt_info.arg,argv[opt_info.index-1])==0)
        {
            char *qmark = strchr(opt_info.arg,'?');
            if(qmark)
                *qmark = '\0';
            if(nv_open(opt_info.arg,sh.var_tree,NV_VARNAME|NV_NOFAIL))
            {
                /* backward compat: read from co-process and re-parse
                 * the option-argument as a separate vname argument */
                fd = sh.cpipe[0];
                argv--;
                argc--;
            }
            if(qmark)
                *qmark = '?';
            break;
        }
        /* prompt */
        name = opt_info.arg;
        break;
McDutchie commented 2 years ago

So I think this is about ready now.

In the actual commit I'd like to rename the name variable, which is a pointer to the prompt string, to prompt. Currently it looks quite misleading to me, as name makes me think variable name.

McDutchie commented 2 years ago

Sorry folks, I hit git push too soon, it's still broken. https://github.com/ksh93/ksh/commit/2bc2d38e9446cd67a3036107c11a625c9dc6764a is force-pushed back out.

McDutchie commented 2 years ago

I had failed to notice these regression test failures because they weren't counted:

test coprocess begins at 2022-02-16+03:04:49
/usr/local/src/ksh93/ksh/src/cmd/ksh93/tests/coprocess.sh[127]: read: 1: invalid variable name
/usr/local/src/ksh93/ksh/src/cmd/ksh93/tests/coprocess.sh[127]: read: 1: invalid variable name
test coprocess passed at 2022-02-16+03:04:52 [ 33 tests 0 errors ]

This is the failing command: https://github.com/ksh93/ksh/blob/f7417f777d8d875e2d2b2220043c7774fbbbae6f/src/cmd/ksh93/tests/coprocess.sh#L127

McDutchie commented 2 years ago

Ah ha… in the 93v- code there is a check for an initial - which I had dismissed as irrelevant: https://github.com/ksh93/ast-open-archive/blob/be2ec79f78ce7b37d0b0a3ea513a34d733cf896b/src/cmd/ksh93/bltins/read.c#L258

But of course... if opt_info.arg=='-', then -p is followed by another option, like in the failing command below. That too needs to trigger the compatibility handilng.

New version of the 'p' case. This time, all the tests pass for realz. :-P

        case 'p':
        /* if a co-process is active, option-argument is a separate word (not stacked
         * to -p), and option-argument either looks like an option (starts with '-')
         * or is a valid vname with optional legacy ?prompt attached... */
        if(sh.cpipe[0]>0 && strcmp(opt_info.arg,argv[opt_info.index-1])==0)
        {
            char *qmark = strchr(opt_info.arg,'?');
            if(qmark)
                *qmark = '\0';
            if(*opt_info.arg=='-' || nv_open(opt_info.arg,sh.var_tree,NV_VARNAME|NV_NOFAIL))
            {
                /* backward compat: read from co-process and re-parse
                 * the option-argument as if it were a separate argument */
                fd = sh.cpipe[0];
                argv--;
                argc--;
            }
            if(qmark)
                *qmark = '?';
            break;
        }
        /* new '-p promptstring' usage (deprecates the vname?prompt syntax) */
        prompt = opt_info.arg;
        break;
McDutchie commented 2 years ago

Unfortunately, I found another thing that breaks: stacking -p with other options.

93u+:

$ /bin/ksh -c 'ls |& read -pt1; echo $REPLY'
0001-dtksh-Upgrade-ksh93-to-1.0.0-beta.2-2021-12-28.patch.xz

Current patch:

$ arch/*/bin/ksh -c 'ls |& read -pt1; echo $REPLY'
t1▁

This breaks on ksh2020 too, although differently (it seems to hang).

In any case, stacking the -p co-process option must work as in 93u+ or we can't commit the change. There will almost certainly be scripts that depend on this.

McDutchie commented 2 years ago

The 93v-/2020 code hangs for a stacked -pt1 argument because the argv-- keeps it reparsing the stacked argument in an infinite loop. So we need to find a way to test for that to avoid replicating that hang.

McDutchie commented 2 years ago

Alright, let's take a step back and reconsider.

Now that stacked options (like read -pt1) have entered consideration, this already-questionable hack just became actually insane. I don't think it's possible to make it work correctly. I've been trying, and what I get is either the same infinite option parsing loop as in ksh2020, or incorrect behaviour.

The optget(3) opt_info struct doesn't seem to expose the information we need to check if we're processing a stacked option or not. And why would it? The only reason you would want to know is that you're trying to write some ugly hack that shouldn't exist.

The ksh -p option and the bash -p option conflict. They're irreconcilable. We cannot have both in the same command without introducing breakage. Even if it were theoretically possible, the hack would become too complex and opaque.

Renaming -p to -P to make way for the bash option is not acceptable either because our priority has to be to maintain compatibility with ksh 93u+, as the vast majority of ksh93 scripts target that.

So, ugly vname?prompt syntax it is, as @fbrau suggested in the first place. This is a wontfix. Sorry for the time waster, folks.

I'll still backport the -a and -u p equivalents to -A and -p (some ksh2020 scripts may use them and probably not all use -p) and commit my documentation improvements.