ksh93 / ksh

ksh 93u+m: KornShell lives! | Latest release: https://github.com/ksh93/ksh/releases
Eclipse Public License 2.0
187 stars 31 forks source link

Backport __ from v- #557

Open ormaaj opened 2 years ago

ormaaj commented 2 years ago

__ has many uses for accessing things that you can't get to any other way. I can't name them all because I almost certainly haven't discovered all that it does. In some contexts it behaves as a self-reference as you would expect _ to be used for, while in others it has a "parent-of-self" function.

One interesting use is to access the current element during a replacement expansion:

(cmd)gentoo@ormaaj-laptop (45134) /home/ormaaj $ ( echo; for sh in ./bin/ksh{93,2.1}; do "$sh" /dev/fd/9 9<<\EOF9; echo; done )
typeset -a a=(zero one)
a[3]=three a[4]=four
printf '%s ' "${a[@]/~(E)^/${!__}=}"; echo
print -rv -- .sh.version
EOF9

a[0]=zero a[1]=one a[3]=three a[4]=four
Version ABIJM 93v- 2014-12-24

__=zero __=one __=three __=four
Version AJM 93u+m/1.1.0-alpha+e920f42e/MOD 2022-09-27

Examples of other places where __ does "something":

To name a few. It also has the usual assortment of bewildering bugs that obfuscate its intended purpose.

The origin of __ was a thread in which I pointed out that _ within a get/set discipline should reference the current object analogous to the way "properties" or "accessors" work in other languages. https://marc.info/?l=ast-users&m=137448362609866&w=4

Evidently this usage conflicted with other people's ideas about what it should do, so __ was added to refer to the parent object in contexts where hierarchical structures apply. It seems to have been extended for other uses, none of which are documented, of course.

EDIT: Later in that thread I commented that __ should work like python's super() or something. No clue what I was thinking there. That's nonsense unless ksh adopts a multiple inheritance scheme, which I wouldn't propose. :/

McDutchie commented 1 year ago

Here is a first attempt to backport it from ksh 93v- 2013-08-07. The changes all involve sh.oldnp and np->nvenv.

First backport attempt ```diff diff --git a/src/cmd/ksh93/include/name.h b/src/cmd/ksh93/include/name.h index 74684894f..91a89858b 100644 --- a/src/cmd/ksh93/include/name.h +++ b/src/cmd/ksh93/include/name.h @@ -84,6 +84,7 @@ struct Namref { Namval_t *np; Namval_t *table; + Namval_t *oldnp; Dt_t *root; char *sub; #if SHOPT_FIXEDARRAY @@ -153,6 +154,7 @@ struct Ufunction #define nv_reftree(n) ((n)->nvalue.nrp->root) #define nv_reftable(n) ((n)->nvalue.nrp->table) #define nv_refsub(n) ((n)->nvalue.nrp->sub) +#define nv_refoldnp(n) ((n)->nvalue.nrp->oldnp) #if SHOPT_FIXEDARRAY # define nv_refindex(n) ((n)->nvalue.nrp->curi) # define nv_refdimen(n) ((n)->nvalue.nrp->dim) @@ -206,6 +208,7 @@ extern void nv_outnode(Namval_t*,Sfio_t*, int, int); extern int nv_subsaved(Namval_t*, int); extern void nv_typename(Namval_t*, Sfio_t*); extern void nv_newtype(Namval_t*); +extern Namval_t *nv_typeparent(Namval_t*); extern int nv_istable(Namval_t*); extern size_t nv_datasize(Namval_t*, size_t*); extern Namfun_t *nv_mapchar(Namval_t*, const char*); diff --git a/src/cmd/ksh93/include/shell.h b/src/cmd/ksh93/include/shell.h index 7efad5109..b5b49a36a 100644 --- a/src/cmd/ksh93/include/shell.h +++ b/src/cmd/ksh93/include/shell.h @@ -285,6 +285,7 @@ struct Shell_s Namval_t *namespace; /* current active namespace */ Namval_t *last_table; /* last table used in last nv_open */ Namval_t *prev_table; /* previous table used in nv_open */ + Namval_t *oldnp; /* last valid parent node */ Sfio_t *outpool; /* output stream pool */ long timeout; /* read timeout */ unsigned int curenv; /* current subshell number */ diff --git a/src/cmd/ksh93/sh/name.c b/src/cmd/ksh93/sh/name.c index 580205a88..80b6d1c46 100644 --- a/src/cmd/ksh93/sh/name.c +++ b/src/cmd/ksh93/sh/name.c @@ -704,6 +704,18 @@ static char *stack_extend(const char *cname, char *cp, int n) return (char*)name; } +static Namval_t *nv_parentnode(Namval_t *np) +{ + Namval_t *mp = np; + if(nv_istable(np)) + return nv_parent(np); + if(mp=nv_typeparent(np)) + return mp; + if((mp= (Namval_t*)np->nvenv)) + return mp; + return np; +} + Namval_t *nv_create(const char *name, Dt_t *root, int flags, Namfun_t *dp) { char *sub=0, *cp=(char*)name, *sp, *xp; @@ -872,6 +884,9 @@ Namval_t *nv_create(const char *name, Dt_t *root, int flags, Namfun_t *dp) if(c) *sp = c; top = 0; + if(np && !nv_isattr(np,NV_MINIMAL) && sh.oldnp && !np->nvenv && sh.oldnp!=np) + np->nvenv = (char*)sh.oldnp; + sh.oldnp = np; if(isref) { #if SHOPT_FIXEDARRAY @@ -898,6 +913,8 @@ Namval_t *nv_create(const char *name, Dt_t *root, int flags, Namfun_t *dp) n = nv_refindex(np); dim = nv_refdimen(np); #endif /* SHOPT_FIXEDARRAY */ + if(sh.oldnp = nv_refoldnp(np)) + sh.oldnp = (Namval_t*)sh.oldnp->nvenv; np = nv_refnode(np); #if SHOPT_FIXEDARRAY if(n) @@ -1208,7 +1225,25 @@ Namval_t *nv_create(const char *name, Dt_t *root, int flags, Namfun_t *dp) return np; cp++; break; + case '_': + if(cp[1]=='_' && (cp[2]==0 || cp[2]=='.') && sh.oldnp) + { + cp += 2; + dp->last = cp; + nvcache.ok = 0; + sh.oldnp = np = nv_parentnode(sh.oldnp); + if(*cp==0) + return np; + sp = nv_name(np); + c = strlen(sp); + dp->nofree |= 1; + name = copystack(sp,cp,NULL); + cp = (char*)name+c+1; + break; + } + /* FALLTHROUGH */ default: + sh.oldnp = np; dp->last = cp; if((c = mbchar(cp)) && !isaletter(c)) return np; @@ -3384,6 +3419,7 @@ void nv_setref(Namval_t *np, Dt_t *hp, int flags) np->nvalue.nrp = sh_newof(0,struct Namref,1,sizeof(Dtlink_t)); np->nvalue.nrp->np = nq; np->nvalue.nrp->root = hp; + np->nvalue.nrp->oldnp = sh.oldnp; if(ep) { #if SHOPT_FIXEDARRAY @@ -3531,14 +3567,17 @@ char *nv_name(Namval_t *np) if(!nv_isattr(np,NV_MINIMAL|NV_EXPORT) && np->nvenv) { Namval_t *nq= sh.last_table, *mp= (Namval_t*)np->nvenv; - if(np==sh.last_table) - sh.last_table = 0; - if(nv_isarray(mp)) - sfprintf(sh.strbuf,"%s[%s]",nv_name(mp),np->nvname); - else - sfprintf(sh.strbuf,"%s.%s",nv_name(mp),np->nvname); - sh.last_table = nq; - return sfstruse(sh.strbuf); + if(!strchr(np->nvname,'.') || nv_type(mp) || (nv_isarray(mp) && (cp=nv_getsub(mp)) && strcmp(np->nvname,cp)==0)) + { + if(np==sh.last_table) + sh.last_table = 0; + if(nv_isarray(mp)) + sfprintf(sh.strbuf,"%s[%s]",nv_name(mp),np->nvname); + else + sfprintf(sh.strbuf,"%s.%s",nv_name(mp),np->nvname); + sh.last_table = nq; + return sfstruse(sh.strbuf); + } } if(nv_istable(np)) sh.last_table = nv_parent(np); diff --git a/src/cmd/ksh93/sh/nvtree.c b/src/cmd/ksh93/sh/nvtree.c index 3aca14e51..b42e12cda 100644 --- a/src/cmd/ksh93/sh/nvtree.c +++ b/src/cmd/ksh93/sh/nvtree.c @@ -704,6 +704,8 @@ static void outval(char *name, const char *vname, struct Walk *wp) if(fp = nv_stack(np,NULL)) free(fp); np->nvfun = 0; + if(!nv_isattr(np,NV_MINIMAL)) + np->nvenv = 0; return; } for(xp=fp->next; xp; xp = xp->next) @@ -1004,7 +1006,9 @@ static char *walk_tree(Namval_t *np, Namval_t *xp, int flags) sh.var_tree = dp; if(nq && mq) { + char *nvenv = mq->nvenv; nv_clone(nq,mq,flags|NV_RAW); + mq->nvenv = nvenv; if(flags&NV_MOVE) nv_delete(nq,walk.root,0); } diff --git a/src/cmd/ksh93/sh/nvtype.c b/src/cmd/ksh93/sh/nvtype.c index 56db5998a..21e1db54d 100644 --- a/src/cmd/ksh93/sh/nvtype.c +++ b/src/cmd/ksh93/sh/nvtype.c @@ -447,9 +447,12 @@ static Namfun_t *clone_type(Namval_t* np, Namval_t *mp, int flags, Namfun_t *fp) nv_addnode(nr,1); if(pp->strsize<0) continue; - _nv_unset(nr,0); - if(!nv_isattr(nr,NV_MINIMAL)) - nv_delete(nr,sh.last_root,0); + if(nr != (Namval_t*)np->nvenv) + { + _nv_unset(nr,0); + if(!nv_isattr(nr,NV_MINIMAL)) + nv_delete(nr,sh.last_root,0); + } } else if(nv_isattr(nq,NV_RDONLY) && !nq->nvalue.cp && !nv_isattr(nq,NV_INTEGER)) { @@ -1509,3 +1512,14 @@ int sh_outtype(Sfio_t *out) dtinsert(sh.var_base,L_ARGNOD); return 0; } + +Namval_t *nv_typeparent(Namval_t *np) +{ + Namchld_t *fp; + Namtype_t *tp; + if(fp = (Namchld_t*)nv_hasdisc(np,&chtype_disc)) + return fp->ptype->parent; + if(tp = (Namtype_t*)nv_hasdisc(np,&type_disc)) + return tp->parent; + return 0; +} diff --git a/src/cmd/ksh93/sh/subshell.c b/src/cmd/ksh93/sh/subshell.c index 25e92cdee..3e216bf08 100644 --- a/src/cmd/ksh93/sh/subshell.c +++ b/src/cmd/ksh93/sh/subshell.c @@ -349,7 +349,7 @@ static void nv_restore(struct subshell *sp) } nv_setsize(mp,nv_size(np)); if(!(flags&NV_MINIMAL)) - mp->nvenv = np->nvenv; + mp->nvenv = nv_isnull(mp) ? 0 : np->nvenv; mp->nvfun = np->nvfun; if(np->nvfun && nofree) np->nvfun->nofree = nofree; ```

The patch makes your example script work:

typeset -a a=(zero one)
a[3]=three a[4]=four
printf '%s ' "${a[@]/~(E)^/${!__}=}"; echo

Output before patch: __=zero __=one __=three __=four. Output after patch: a[0]=zero a[1]=one a[3]=three a[4]=four.

However, the patch also causes a few regression test failures:

test pointtype begins at 2023-04-02+13:49:30
    pointtype.sh[109]: FAIL: expansion of associative array of types is incorrect
    pointtype.sh[111]: FAIL: typeset -p z for associative of types is incorrect
    pointtype.sh[132]: FAIL: expansion of type containing indexed array of types is incorrect
test pointtype failed at 2023-04-02+13:49:30 with exit code 3 [ 36 tests 3 errors ]
test types begins at 2023-04-02+13:50:29
/usr/local/src/ksh93/ksh/src/cmd/ksh93/tests/types.sh[33]: Type_t: r: is read only
test types failed at 2023-04-02+13:50:29 with exit code 1 [ 100 tests 1 error ]

Of course I may well have missed something while backporting this, so please do verify the patch against the 93v- commit linked above.

There should also be a fix for it in ksh 93v- 2013-08-29 ("Fixed __ to work in nested types") but I haven't managed to isolated that from all the other changes in that commit yet. I suspect the fix involves an (ab)use of the NV_EXPORT attribute. It's so unfortunate they never kept changes in distinct commits :/

McDutchie commented 1 year ago

Having said all that, I'm not convinced this is worth integrating into ksh 93u+m. I don't think an interesting hack is worth it when it comes with "the usual assortment of bewildering bugs that obfuscate its intended purpose"; we prioritise fixing bugs here, so I'd like to avoid introducing any where possible.

In the example: printf '%s ' "${a[@]/~(E)^/${!__}=}" (which, as of ceae1e44c8ffedb9dfc2fc4eb3432438991717f8 on the dev branch, can also be written as printf '%s ' "${a[@]/#/${!__}=}"), the ${!__} expansion does not have a constant value; it re-expands differently for each iteration of the vector expansion. This violates a basic principle of shell grammar, so I'm not a fan of that at all. There is a much cleaner way to achieve the same thing:

typeset -a a=(zero one)
a[3]=three a[4]=four
for i in "${!a[@]}"
do  printf 'a[%d]=%s ' i "${a[i]}"
done; echo

…and that works on every ksh93, and makes for much more legible and understandable code.

As for the use in types, in the mailing list thread, you mentioned a workaround that you called "nasty": nameref this=${.sh.name%.*}; .sh.value=${this.y};. Two questions:

  1. Why is that nasty?
  2. Is it nastier than the 93v- __ hack?

I could be convinced to backport __, and try to fix it, if you can present an actual, concrete use case that is not feasible to implement in ksh as it is.

McDutchie commented 1 year ago

the ${!__} expansion does not have a constant value; it re-expands differently for each iteration of the vector expansion. This violates a basic principle of shell grammar, so I'm not a fan of that at all.

…on the other hand, that turns out to be a pre-existing quirk of ksh, and not something introduced by this patch.

$ ksh -c 'echo ${@/~(E)^/$((++i))}' _ one two three four
1one 2two 3three 4four

whereas:

$ bash -c 'echo ${@/#/$((++i))}' _ one two three four
1one 1two 1three 1four
$ zsh -c 'echo ${@/#/$((++i))}' _ one two three four 
1one 1two 1three 1four

I would consider the ksh93 behaviour to be clearly a bug, but there are probably scripts that depend on it. :-/

ormaaj commented 1 year ago

the ${!__} expansion does not have a constant value; it re-expands differently for each iteration of the vector expansion. This violates a basic principle of shell grammar, so I'm not a fan of that at all.

…on the other hand, that turns out to be a pre-existing quirk of ksh, and not something introduced by this patch.

$ ksh -c 'echo ${@/~(E)^/$((++i))}' _ one two three four
1one 2two 3three 4four

whereas:

$ bash -c 'echo ${@/#/$((++i))}' _ one two three four
1one 1two 1three 1four
$ zsh -c 'echo ${@/#/$((++i))}' _ one two three four 
1one 1two 1three 1four

I would consider the ksh93 behaviour to be clearly a bug, but there are probably scripts that depend on it. :-/

It's definitely a feature. The other shells just chose not to implement it for whatever reason. It's required in order to access sub-matches from .sh.match that aren't accessible through ordinary group references, among other uses. There was at one time a bug in which the parameter expansions quit working and a fix was required to restore that behavior.

ormaaj commented 1 year ago

…and that works on every ksh93, and makes for much more legible and understandable code.

As for the use in types, in the mailing list thread, you mentioned a workaround that you called "nasty": nameref this=${.sh.name%.*}; .sh.value=${this.y};. Two questions:

1. Why is that nasty?

2. Is it nastier than the 93v- `__` hack?

I could be convinced to backport __, and try to fix it, if you can present an actual, concrete use case that is not feasible to implement in ksh as it is.

I did not (and still don't) understand Irek and other's objections in that thread. I created that thread due to the misbehavior of _ and I'm pretty sure that must have been some oversight because I can't think of any reason for _ to not behave as I proposed (and DGK agreed and fixed it).

From that thread...

This can't work. .sh.value is identical to _

Is that even true. If so why? Why _ would need to be a synonym for .sh.value.

Think about ksh93 types as compound variables with predefined discipline functions. In your example you're trying to assign the value of ++_.y to the value of a compound variable - which obviously cannot work.

I have no clue what was meant here. A type is nothing like a compound variable other than superficially borrowing bits of the compound assignment grammar for their definitions. Predefined discipline functions? What? I didn't even use a compound variable in that example and _.y definitely doesn't refer to one there.

What about . to access the parent in both types and compound variable trees?

What on earth is a type tree? Why would a type have a parent? Compound variables sure.

I think the _ bug that was the topic there and the introduction of __ are totally separate issues. __ is only really interesting for its new functionality. It does make sense to have a way to access the current variable with _ in the case of compounds and the parent through __. With regard to types I don't think it was ever discussed what role __ would have either as a member variable or in the context of a type's functions.