Arrays and maps: Members, entries, values, contents, pairs, …

With version 4.0, we are adding a lot of promising and powerful new map and array features. This is a big step forward, compared to the obvious limitations of 3.1.

Some aspects of the 3.1 design have made it difficult (or impossible) to fully adjust arrays and maps, but (in my opinion) the old overall concept was impressively consistent – and it is definitely a big challenge to achieve a 4.0 design that is not too fragmented.

To me, this becomes particularly evident in the case of arrays. The following example sums up the items of all members of an array. For the cumbersome 3.1 solution…

for $pos in 1 to array:size($array)
return sum($array($pos))

…we now have several (roughly?) equivalent options to do this; for example…

1. for member $m in $array return sum($m)
2. array:members($array) ! sum(?value)
3. $array?entry::* ! sum(?value)
4. $array?value::* ! sum(.)

…which is great – but the downside is that we have introduced a terminological jungle. The examples above could imply that:

for 1., an array member is a sequence (which it indeed is);
for 2., an array member is a map;
for 3., an array has entries (but there is no array:entries);
for 4., an array has values (which is true, but array:values returns a different structure).
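
For anyone who wants to run the 3.1 baseline from the top of this comment, here is a minimal self-contained sketch. The sample array is invented for illustration, and the namespace declaration is only there for processors that do not predeclare the array prefix.

```xquery
xquery version "3.1";
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";

(: Sample data (an assumption for illustration): three members, each a sequence. :)
let $array := [ (1, 2), (3, 4, 5), () ]
(: One sum per member, addressed by position. :)
for $pos in 1 to array:size($array)
return sum($array($pos))
(: yields 3, 12, 0 :)
```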

Next, with the current proposals, $array?content::1 gives us the sequence-concatenated version of the first member of an array. Similar observations can be made with maps: map:entries($map) returns singleton maps, whereas $map?entry::* is actually equivalent to map:pairs.
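
The difference between a member and the sequence-concatenated content can already be seen with 3.1 constructs alone. The following is only a sketch with made-up sample data; it uses no 4.0 features, and the explicit namespace declarations are a precaution for processors that do not predeclare these prefixes.

```xquery
xquery version "3.1";
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";
declare namespace map   = "http://www.w3.org/2005/xpath-functions/map";

(: Sample data (an assumption for illustration). :)
let $array := [ (1, 2), (3, 4, 5), () ]
return (
  array:size($array),           (: 3: number of members :)
  count($array(1)),             (: 2: the first member is a sequence of two items :)
  count($array?*),              (: 5: all members, sequence-concatenated :)
  map:size(map:entry("a", 1))   (: 1: map:entry returns a singleton map :)
)
```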

The fundamental obstacles are clear and have already been discussed a lot, but I think that with each new concept, we should try really hard not to blur terminology, and work with terms that users can assign to the underlying concepts without too much guessing or trial and error.

My general suggestions would be to…

align the new lookup terminology and the builtin functions, and
omit, rename or drop builtin functions that do not rely on the existing or arising terminology.

My concrete proposals (maybe followed by others) would be to…

1. As we already have map:pairs, $map-or-array?entry::* should become $map-or-array?pair::*, and we should add an array:pairs function, and probably array:of-pairs (see #832). We shouldn’t do it the other way round and rename map:pairs to map:entries, as the existing map:entry function returns a singleton map.
2. If we keep calling the sequence-concatenated result “content”, we should include it in the definition of sequence-concatenation. In addition, (array|map):values should be renamed to (array|map):contents (see #1179).
3. Due to the existence of array:value::*, we should make clear what an “array value” is and how it positions itself in relation to an “array member”, and we should add map:values and array:values for equivalent results.
4. Due to the existence of array:key::*, we should add an array:keys function (which returns a dense integer range). 1 to array:size($array) could then be written as array:keys($array).
5. As we have map:entries and map:merge, we could add equivalent array:entries and array:merge functions.
6. I would suggest dropping array:members/array:of-members in favor of either array:split/array:join, array:pairs/array:of-pairs (see 1.) or array:entries/array:merge (see 5.). I really believe that an “array member” should not be a map; an “array pair” or “array entry” certainly could.
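
To make the shape of the proposed functions more tangible, here is a rough 3.1 sketch of what array:keys and array:pairs could return. The names local:array-keys and local:array-pairs are hypothetical stand-ins, and the key/value record layout is an assumption modelled on map:pairs, not something the specification defines for arrays.

```xquery
xquery version "3.1";
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";

(: Hypothetical stand-in for the proposed array:keys: a dense integer range. :)
declare function local:array-keys($array as array(*)) as xs:integer* {
  1 to array:size($array)
};

(: Hypothetical stand-in for the proposed array:pairs: one key/value
   record per member (record layout assumed, by analogy with map:pairs). :)
declare function local:array-pairs($array as array(*)) as map(*)* {
  for $pos in 1 to array:size($array)
  return map { "key": $pos, "value": $array($pos) }
};

let $array := [ "Hello", "World" ]
return (
  local:array-keys($array),           (: 1, 2 :)
  local:array-pairs($array) ! ?value  (: "Hello", "World" :)
)
```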

One might question if we should really introduce map terminology for arrays. I think we have no other choice if we want to treat maps and arrays identically with lookup key specifiers, and it may help us later on to treat both data structures as similarly as possible.

In https://github.com/qt4cg/qtspecs/issues/1457#issuecomment-2360628218, I found cases in which “map keys” and “array indexes” are mentioned. I think it gets more and more confusing to respect the differences between maps and arrays, and analogous to array:key::*, I would be happy if we treated array index values as keys (adding an array:keys function, as suggested in this issue, would support this approach).

I think that if we're going to contemplate adding array:pairs, we need to revisit a more fundamental question: are we going to continue to present arrays as a sequence of values, or are we going to present them as maps with sequential integer keys starting at 1? (I'm not really talking about how they're implemented or even how they're defined technically, I'm talking about how the specification is going to present them.)

If we're expecting the reader to build a mental model of arrays as a sequence of values and maps as a set of key/value pairs, then I worry that adding functions that make them more uniform just makes them harder to understand.

I wonder if there'd be some substantial simplification possible if we just accepted that arrays are maps with sequential, integer keys. All the functions that apply to map {1: "Hello", 2: "World"} apply equally to array{("Hello", "World")} in exactly the way they would if the latter was (literally) implemented as the former.

I could imagine that this would be a huge change and affect virtually every expression and function that handles arrays. It might also introduce backward incompatibilities, as arrays are stricter data types than maps (with additional bound and type checks). But your idea sounds enticing; maybe I’m overly cautious.
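
As a small illustration of both the similarity and the stricter bound checks, the following 3.1 sketch compares the two values from the comment above. It is not meant as an argument for either side, just a way to see the current behavior.

```xquery
xquery version "3.1";

let $map   := map { 1: "Hello", 2: "World" }
let $array := array { ("Hello", "World") }
return (
  $map(1) = $array(1),   (: true: function-call access looks identical :)
  $map?2 = $array?2,     (: true: so does the lookup operator :)
  empty($map(3)),        (: true: a missing map key yields the empty sequence :)
  (: An out-of-range array index raises a dynamic error instead. :)
  try { $array(3) } catch * { "out-of-bounds error" }
)
```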

My intent in the scope of this issue would be mostly to get as consistent as possible, even if we cannot roll back the 3.1 decision to treat maps and arrays differently: If we want to provide a pair lookup specifier for arrays (which is what we currently do), we should also have a corresponding function. If we don’t want the function, we shouldn’t support $array?pair::* either.

> I wonder if there'd be some substantial simplification possible if we just accepted that arrays are maps with sequential, integer keys. All the functions that apply to map {1: "Hello", 2: "World"} apply equally to array{("Hello", "World")} in exactly the way they would if the latter was (literally) implemented as the former.

It's an appealing idea but the devil is in the detail. While it's true that all functions that access maps could be made to view arrays as maps with integer keys, the same isn't true for construction, and therefore it isn't true for operations (such as filtering and mapping) that combine retrieval access and construction. There's also the detail that array keys are naturally sorted.

Arrays and maps: Members, entries, values, contents, pairs, … #1338