ksh93 / ksh

ksh 93u+m: KornShell lives! | Latest release: https://github.com/ksh93/ksh/releases
Eclipse Public License 2.0

wrong typeset -p output after unsetting multidimensional array elements #148

Open stephane-chazelas opened 4 years ago

stephane-chazelas commented 4 years ago
$ a=((a b)(c d))
$ unset 'a[0][0]'
$ typeset -p a
typeset -a a=(([1]=b) (c d) )

(fine)

$ unset 'a[0][1]'
$ typeset -p a
typeset -a a=(([0]=) (c d) )

Expected:

typeset -a a=(() (c d) )

(though the () is ambiguous as it could represent an array, associative array or compound and may explain why typeset -p shows that [0]= output, see below).

All other indicators suggest ${a[0]} is an empty list:

$ echo "${!a[0][@]}"

$ echo "${a[0][0]+set}"

$ echo "${#a[0][@]}"
0

It seems that the type of the elements of the array is determined by the syntax of the assignment.

In:

a=( (a) ([0]=a) () )

The first element is an array, the second an associative array, the third a compound.

Now, if I do:

a[0][5]=x
a[1][5]=x

typeset -p a outputs:

$ typeset -p a
typeset -a a=(([0]=a [5]=x) ([0]=a [5]=x) ())

a[0] and a[1] are indistinguishable even though

$ typeset -p 'a[0]' 'a[1]'
typeset -a a[0]=([0]=a [5]=x)
typeset -A a[1]=([0]=a [5]=x)

a[0] is still an array and a[1] still an associative array. The output of typeset -p cannot be used to recreate $a as is.

The whole thing seems very buggy:

$ a=((a b) (c))
$ a[1].x=y
$ typeset -p a
typeset -a a=((a b) (c) )
$ echo ${a[1].x}
y
$ echo ${a[1]}
c

Is a[1] both a compound and array?

$ a+=([1]=([blah]=x))
$ typeset -p a
typeset -A a=([0]='' [1]=([blah]=x) )

a[0] was changed from an array to an empty scalar?

$ echo ${a[1].x}
y

(still there)

$ unset a
$ typeset -p a
$ echo ${a[1].x}
y

(still there)

$ a[1].y=...
$ typeset -p a
typeset -a a=([1]=(x=y;y=...))

a[1].x now back in typeset -p output.

$ typeset -a a[0]=()
$ typeset -p a
typeset -a a=(;) (x=y;y=...))

(see the unmatched ")")

I could carry on like that.

hyenias commented 4 years ago

Wow! You have pointed out or come across a multitude of issues and/or problems. I have just finished making my way to the end of your notes and I have a slight headache now. In order for me to understand what was going on and to lessen the burden on my eyes, I had to rewrite your script snippets using more distinguishable variables and values. I have made lots of notes and will provide them to you later. I just wanted to respond to you so that you know someone has seen your issue and completed initial assessments of this open issue. I do appreciate you for bringing these issues and/or problems to our attention.

While attempting to understand the intent of some of your lines of code, I came across more bugs that you did not point out directly but may have alluded to. The following are the overall summary bullet items from my notes:

As you are an advanced shell script user, I now provide the following sneak preview of commands that I used for additional insights:

$ unset arr; arr=( (a) ([0]=b) () )
$ typeset -p ${ printf 'arr[%s] ' ${!arr[@]}; }
typeset -a arr[0]=(a)
typeset -A arr[1]=([0]=b)
typeset -C arr[2]=()
$ typeset -p | grep -e '^arr' -e 'arr='
typeset -a arr=((a b) (c) )
arr[1].x=y

hyenias commented 4 years ago

Summary

This issue points out some deficiencies in multidimensional array handling, assignments, and formatting. A mixing of various container types (indexed array, associative array, compound variable) along with various means of defining, appending, redefining, or unsetting multidimensional array elements has resulted in unexpected typeset -p representations that are unclear and/or cannot be directly used to recreate the multidimensional array via the resulting declarative statement (compound assignment form). This is primarily due to how ksh is expressing a null or empty container type within a multidimensional container.

Conclusion

The KornShell language needs to be updated to allow direct determination of a container type via text syntax to enable reusable declarative multidimensional statements. I offer the following potential multidimensional elemental syntax assuming ksh's parsing engines can handle them:

Code Review of @stephane-chazelas issue

It seems that the type of the elements of the array is determined by the syntax of the assignment.

Yes for the most part. As only a compound variable allows for individual variable attributes to be set via a corresponding typeset statement, ksh does attempt to infer what an element is by parsing the given assignment syntax.

(though the () is ambiguous as it could represent an array, associative array or compound and may explain why typeset -p shows that [0]= output, see below).

Currently in ksh, () represents an empty compound variable. ksh93 defaults () to represent a compound variable not an indexed array. Yes, the [0]= is how ksh is expressing an empty indexed array within a multidimensional statement.

In:

a=( (a) ([0]=a) () )

The first element is an array, the second an associative array, the third a compound.

Correct: Indexed array, associative array, compound variable.

$ a=( (a) ([0]=a) () )
$ typeset -p ${ printf 'a[%s] ' ${!a[@]}; }
typeset -a a[0]=(a)
typeset -A a[1]=([0]=a)
typeset -C a[2]=()

typeset -p a outputs:

$ typeset -p a
typeset -a a=(([0]=a [5]=x) ([0]=a [5]=x) ())

a[0] and a[1] are indistinguishable even though

$ typeset -p 'a[0]' 'a[1]'
typeset -a a[0]=([0]=a [5]=x)
typeset -A a[1]=([0]=a [5]=x)

a[0] is still an array and a[1] still an associative array. The output of typeset -p cannot be used to recreate $a as is.

Correct. As you have ended up with a sparse indexed array, ksh indicates each defined element using the same subscript syntax as an associative array. ksh would need the typeset attribute information to distinguish the array as indexed instead of associative. ksh defaults to interpreting array assignments that have a [...]= element as an associative array, since a normal zero-based contiguous indexed array would not need them ([#]=).

The whole thing seems very buggy:

$ a=((a b) (c))
$ a[1].x=y
$ typeset -p a
typeset -a a=((a b) (c) )
$ echo ${a[1].x}
y
$ echo ${a[1]}
c

Understandable as you have attempted to assign a compound subvariable to an existing indexed array of (c). The .x subvariable has no parent compound variable to bind itself to and thus is orphaned into the namespace. Please see KSH93 Compound Variables for more details.

$ typeset -p | grep -e '^arr' -e 'arr='
typeset -a arr=((a b) (c) )
arr[1].x=y

Is a[1] both a compound and array?

$ a+=([1]=([blah]=x))
$ typeset -p a
typeset -A a=([0]='' [1]=([blah]=x) )

No, a[1] is now an associative array. You have attempted to append an associative array to an existing multidimensional indexed array. Apparently, ksh's multidimensional array logic needs to improve to account for this. Currently, ksh permits this append, and the result is that the variable a is redefined as an associative array with an empty string at subscript 0.

$ unset arr; arr=( {0..2} ); typeset -p arr; arr+=([1]=([blah]=x)); typeset -p arr
typeset -a arr=(0 1 2)
-ksh: cannot append index array to associative array arr
$ unset arr; arr=( (a b) (c) ); typeset -p arr; arr+=([1]=([blah]=x)); typeset -p arr
typeset -a arr=((a b) (c) )
typeset -A arr=([0]='' [1]=([blah]=x) )

a[0] was changed from an array to an empty scalar?

Yes, a[0] having been an indexed array was destroyed and recreated as an empty scalar due to attempting to append an associative array to an indexed array.

$ echo ${a[1].x}
y

(still there)

Correct. a[1].x still exists orphaned in the namespace and can be directly referenced as you have done.

$ unset a
$ typeset -p a
$ echo ${a[1].x}
y

(still there)

Yes, the orphaned subvariable still exists without a parent. You have removed the associative array variable a from the namespace.
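The orphaning described here can be checked with a short script. This is only a sketch: it assumes a ksh93 binary named ksh is on PATH (the check skips itself otherwise), and the names orphan_check and result, as well as the fallback word "unset", are illustrative and not part of ksh. What it prints depends on the ksh build, so no output is shown.

```shell
# Sketch: does a[1].x survive "unset a"? Assumes ksh93 is installed as "ksh".
orphan_check() {
  ksh 2>&1 <<'EOF'
a=((a b) (c))
a[1].x=y    # a[1] is an indexed array at this point, not a compound
unset a
# Expands to the literal word "unset" if the subvariable is really gone.
print -r -- "a[1].x after unset a: ${a[1].x-unset}"
EOF
}

# ${.sh.version} only exists in ksh93, so this guard rejects other shells named ksh.
if command -v ksh >/dev/null 2>&1 && ksh -c '[[ -n ${.sh.version} ]]' 2>/dev/null; then
  result=$(orphan_check)
else
  result="skipped: no ksh93 found on PATH"
fi
printf '%s\n' "$result"
```

Running the check in a fresh ksh process keeps leftover state from an interactive session out of the picture, which matters here because the orphaned subvariable is exactly that kind of state.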

$ a[1].y=...
$ typeset -p a
typeset -a a=([1]=(x=y;y=...))

a[1].x now back in typeset -p output.

The .x subvariable is now back, as you put it, because ksh was able to create a new indexed array with a valid parent compound variable as a result of your a[1].y=... assignment.

$ unset a; a[1].y=...; typeset -p a a[1]
typeset -a a=([1]=(y=...))
typeset -C a[1]=(y=...)

$ typeset -a a[0]=()
$ typeset -p a
typeset -a a=(;) (x=y;y=...))

(see the unmatched ")")

I could carry on like that.

This is new to me. Apparently, ksh is expressing a null (maybe empty) indexed array placeholder as ;) in the typeset -p output.

$ unset a; a[1].y=...; a[0]=(); typeset -p a a[0] a[1]
typeset -a a=(() (y=...))
typeset -C a[0]=()
typeset -C a[1]=(y=...)
$ unset a; a[1].y=...; typeset -a a[0]=(); typeset -p a a[0] a[1]
typeset -a a=(;) (y=...))
typeset -a a[0]
typeset -C a[1]=(y=...)
$
$ unset container; typeset -a container; typeset -p container
typeset -a container
$ unset container; typeset -A container; typeset -p container
typeset -A container=()
$ unset container; typeset -C container; typeset -p container
typeset -C container=()

ormaaj commented 3 years ago

Yeah there are a ton of "bugs" like these (or just very quirky nonsensical behaviour). For more fun experiment with self-referential and recursive compounds that refer to themselves through namerefs and nameref elements. I found so many oddities to do with container types relating to how they're defined and exact syntax. Will have to hunt around to rediscover some of those :D

hyenias commented 3 years ago

Update: I am continuing to research this.

hyenias commented 2 years ago

Thanks to @JohnoKing for backporting a ksh93v- fix for the "[0]= is how ksh is expressing an empty indexed array within a multidimensional statement" behavior. With #451, @stephane-chazelas, your very first problem has been addressed.

stephane-chazelas commented 8 months ago

Thanks to @JohnoKing for backporting a ksh93v- fix for the "[0]= is how ksh is expressing an empty indexed array within a multidimensional statement" behavior. With #451, @stephane-chazelas, your very first problem has been addressed.

Thanks,

$ a=((a b)(c d))
$ unset 'a[0][0]'
$ typeset -p a
typeset -a a=(([1]=b) (c d) )
$ unset 'a[0][1]'
$ typeset -p a
typeset -a a=(() (c d) )

Looks somewhat better, but note that that typeset -a a=(() (c d) ) would still not recreate the array as-is.

$ typeset -p 'a[0]'
typeset -a a[0]=()
$ echo "${#a[0][@]}"
0

That's an empty array, but if we use the output of typeset -p:

$ ksh -c 'typeset -a a=(() (c d) ); typeset -p "a[0]"; echo "${#a[0][@]}"'
typeset -C a[0]=()
1

That's a compound.
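The comparison above can be automated as a round-trip check: dump the array with typeset -p, replay the dump in a second ksh, and compare what typeset -p reports for a[0] on each side. This is a sketch under assumptions: a ksh93 binary named ksh is on PATH (the check skips itself otherwise), and the names roundtrip_check, before, dump, after and result are illustrative.

```shell
# Sketch: replay "typeset -p a" in a fresh ksh and compare the type of a[0].
# Assumes ksh93 is installed as "ksh"; the check is skipped otherwise.
roundtrip_check() {
  ksh 2>&1 <<'EOF'
a=((a b)(c d))
unset 'a[0][0]'
unset 'a[0][1]'
before=$(typeset -p 'a[0]')
dump=$(typeset -p a)
# Feed the dump to a second ksh so no state from this one leaks in.
after=$(print -r -- "$dump; typeset -p 'a[0]'" | ksh 2>&1)
if [[ $before == "$after" ]]; then
  print -r -- "same after replay: $before"
else
  print -r -- "changed after replay: $before -> $after"
fi
EOF
}

# ${.sh.version} only exists in ksh93, so this guard rejects other shells named ksh.
if command -v ksh >/dev/null 2>&1 && ksh -c '[[ -n ${.sh.version} ]]' 2>/dev/null; then
  result=$(roundtrip_check)
else
  result="skipped: no ksh93 found on PATH"
fi
printf '%s\n' "$result"
```

Going by the transcript above, a build that includes the #451 fix would be expected to take the "changed" branch, with a[0] going from typeset -a to typeset -C; other builds may behave differently.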

We're missing assignment syntax for creating an empty regular array (or an empty assoc, for that matter), or we'd need typeset -p a to output several lines like:

typeset -a a=()
typeset -a a[0]=()
typeset -a a[1]=(b c)

For that array.

Though running that code seems to produce yet something different for which the output of typeset -p is also bogus:

$ typeset -a a=()
$ typeset -a a[0]=()
$ typeset -a a[1]=(b c)
$ typeset -p a
typeset -a a=(;) (b c) )