Maps & Arrays: Consistency & Terminology

qt4cg / qtspecs

QT4 specifications

https://qt4cg.org/


Maps & Arrays: Consistency & Terminology #1169

Open ChristianGruen opened 4 months ago

ChristianGruen commented 4 months ago

After the introduction of #1094 and #1159, and before adding more map/array operations, I think it’s time to get more serious about consistency and terminology. The current drafts employ a variety of terms that are neither clearly defined nor clearly distinguished from each other. We now have at least…

items, members, pairs, keys, values, entries

…which are sometimes used for maps, for arrays, or for both data structures. A first attempt at a cleanup, keeping the overall effort low:

A minor one: The modifier for lookups should be in singular form, analogous to node axes: item, key, value, pair.
While I first advocated the orthogonality principle for axes in lookup expressions, I now think we should stick to the existing terminology. Otherwise, we would need to revise many other existing parts of the spec. My suggestion would be to:

introduce member for arrays
only allow key, value and pair for maps
allow item for both maps and arrays

This would make it symmetric with a) the current terminology for maps and arrays, and b) enhanced for clauses, i.e. for member $m and for key $k value $v.

The reverse approach would be to drop for member $m and to also allow for key $k value $v for arrays (with for value replacing for member). In addition, we could have for pair.
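For comparison, a rough XQuery 3.1 sketch of the iteration that the proposed clauses would abbreviate. The sample data is invented, and the 4.0 clause syntax quoted in the comments (for member $m in …, for key $k value $v in …) is assumed from the current drafts, which this issue may still change:

```xquery
xquery version "3.1";

(: Sample data, invented for illustration. :)
let $array := [ 'a', ('b', 'c') ]
let $map := map { 'x': 1, 'y': (2, 3) }
return (
  (: Roughly what "for member $m in $array" abbreviates: one iteration
     per member; the second member is the two-item sequence ('b', 'c'). :)
  for $i in 1 to array:size($array)
  let $m := $array($i)
  return count($m),
  (: Roughly what "for key $k value $v in $map" abbreviates; the order
     returned by map:keys is implementation-dependent. :)
  for $k in map:keys($map)
  let $v := $map($k)
  return $k || '=' || string-join($v ! string(), ',')
)
```

The member counts come out as 1 and 2, which is exactly the distinction a member-oriented clause preserves and a plain for over the flattened content would lose.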

With the introduction of the item axis, map:values and array:values should be renamed to map:items and array:items. (Update: map:contents and array:contents instead, see #1179.)
I would suggest dropping array:members and array:of-members. The names don’t imply we’ll deal with records, and it’s not in line with for member $m either. If we want to keep these functions, we could rename them to array:pairs and array:of-pairs and add the integer positions as keys, and we should introduce and consistently use the term pair for maps and arrays.

Closely related: #826

michaelhkay commented 4 months ago

A minor one: The modifier for lookups should be in singular form, analogous to node axes: item, key, value, pair.

I'm fine with that.

introduce member for arrays
only allow key, value and pair for maps
allow items for both maps and arrays

This would make it symmetric with a) the current terminology for maps and arrays, and b) enhanced for clauses, i.e. for member $m and for key $k value $v.

The minimal set would probably be

item (or content) for both maps/arrays
pair for maps
member for arrays

With the introduction of the item axis, map:values and array:values should be renamed to map:items and array:items.

Can't say I like the effect much. But I'm not happy with our over-use of value either. Perhaps content is better.

I would suggest dropping array:members and array:of-members. The names don’t imply we’ll deal with records, and it’s not in line with for member $m either. If we want to keep these functions, we could rename them to array:pairs and array:of-pairs and add the integer positions as keys, and we should introduce and consistently use the term pair for maps and arrays.

Yes that seems feasible.

Now, the other thing on my mind is to try to unify this with labels. If we deliver the results of a lookup as maps containing key + value, why shouldn't the result also contain accessor functions equivalent to the fields in a label, specifically parent() and ancestors()? The two features definitely have significant overlap.

ChristianGruen commented 4 months ago

Now, the other thing on my mind is to try to unify this with labels. If we deliver the results of a lookup as maps containing key + value, why shouldn't the result also contain accessor functions equivalent to the fields in a label, specifically parent() and ancestors()? The two features definitely have significant overlap.

Sounds reasonable (I haven’t spent time on pins and labels yet; I should do so soon). Would $map?pin::* or $map?label::* make sense? Regarding the naming, like @cmsmcq, I tend to associate “labels” with plain strings. Alternative terms for pins and labels that make their relationship clear could be beneficial, now that we confront our poor (or brave) users with so many new concepts.

michaelhkay commented 4 months ago

I would suggest dropping array:members and array:of-members.

The main benefit of these functions is probably as primitives that can be used to define the semantics of all the other functions concisely.

For example, we currently define array:join rather concisely as

array:of-members($arrays ! array:members(.))

Perhaps private functions could serve the same purpose. But if these functions are so useful as primitives, one feels that they would be useful tools for end-users as well.
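To make the "primitives" argument concrete, here is a sketch that emulates the two functions in plain XQuery 3.1, assuming the record(value) representation mentioned in this thread is modeled as a map with a single 'value' key. The names local:members, local:of-members and local:join are hypothetical stand-ins, not the draft functions themselves:

```xquery
xquery version "3.1";

(: Hypothetical stand-in for array:members: wraps each member,
   which may be any sequence, in a map with a single 'value' entry. :)
declare function local:members($array as array(*)) as map(*)* {
  for $i in 1 to array:size($array)
  return map { 'value': $array($i) }
};

(: Hypothetical stand-in for array:of-members: unwraps each 'value'
   entry and turns it back into exactly one array member. :)
declare function local:of-members($records as map(*)*) as array(*) {
  array:join(for $r in $records return [ $r?value ])
};

(: array:join expressed through the two primitives, as quoted above. :)
declare function local:join($arrays as array(*)*) as array(*) {
  local:of-members($arrays ! local:members(.))
};

(: [ 1, (2, 3), 4 ]: an array with three members :)
local:join(([ 1, (2, 3) ], [ 4 ]))
```

Because local:of-members wraps each 'value' in the square constructor, a sequence-valued member such as (2, 3) survives the round trip as a single member.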

A reminder of how we got here: I started this journey by defining a parcel as an item that encapsulates a sequence, with various possible representations, including perhaps as a zero-arity function, or perhaps with the option of making the internal representation entirely opaque. But that creates questions as to how "parcels" fit into the type system, so we ended up with a concrete representation (as record(value)) instead. In many ways I would be happier with the opaque concept.

ChristianGruen commented 4 months ago

The main benefit of these functions is probably as primitives that can be used to define the semantics of all the other functions concisely.

I still find the array:join/array:split variants easier to digest, as outlined in #826 (but that’s just a matter of taste, I guess). In particular, it’s the use of map constructors in the scope of array operations that seems confusing and unnecessarily verbose to me:

(: array { } :)
array:of-members($sequence ! map { 'value': . })
(: array:build :)
array:of-members($input ! map { 'value': $action(.) })
(: array:append :)
array:of-members((array:members($array), map { 'value': $member }))

(: vs :)
array:join($sequence ! array { . })
array:join($input ! [ $action(.) ])
array:join((array:split($array), [ $member ]))

Using a record constructor would possibly make the existing equivalences more concise, though.

A reminder of how we got here: I started this journey by defining a parcel as an item that encapsulates a sequence, with various possible representations, including perhaps as a zero-arity function, or perhaps with the option of making the internal representation entirely opaque. But that creates questions as to how "parcels" fit into the type system, so we ended up with a concrete representation (as record(value)) instead. In many ways I would be happier with the opaque concept.

True; I remember mentioning (one variant of) our Java binding, which wraps objects in function items and triggers the implicit conversion to XDM items when the function item is invoked. The same could have been done with array members.

However, maybe there are no use cases left that still require an additional parcel/function/record representation:

Thanks to atomization, arrays can already be supplied to functions that expect atomic items.
With for member $m, we can iterate over array members.
With pin/label, array members can be decorated.
Thanks to the array axes, results can be returned either flattened or structured.
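The first point can be illustrated with plain XQuery 3.1, where an array is atomized by flattening its members, recursively, whenever atomic items are expected (the sample values are invented):

```xquery
xquery version "3.1";

(: Arrays supplied where atomic items are expected are atomized:
   their members, including nested arrays, are flattened into
   a single sequence of atomic items. :)
let $numbers := [ 1, (2, 3), [ 4 ] ]
return (
  sum($numbers),                 (: 10 :)
  count(data($numbers)),         (: 4 :)
  string-join([ 'a', 'b' ], '-') (: "a-b" :)
)
```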

I think we can already be happy if the newly added concepts are used by a considerable number of people. Maybe we should be careful not to overdo it, and maybe we should continue to appreciate the laudable 3.1 array concepts (such as the powerful implicit atomization).

ChristianGruen commented 4 months ago

If we drop array:members, we could rename array:split to array:members:

map:merge(map:entries($map))
array:join(array:members($array))

This would feel intuitive to me, as the current spec terminology (as far as I can judge) regards array members as the counterparts of map entries.
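In XQuery 3.1 terms, the symmetry can be sketched as follows. map:entries is not available there, so map:for-each with map:entry#2 stands in for it, and local:split is a hypothetical stand-in for the array function under discussion (each member becomes a single-member array):

```xquery
xquery version "3.1";

(: Hypothetical stand-in for array:split / the renamed array:members:
   each member becomes a single-member array. :)
declare function local:split($array as array(*)) as array(*)* {
  for $i in 1 to array:size($array)
  return [ $array($i) ]
};

let $map := map { 'a': 1, 'b': (2, 3) }
let $array := [ 1, (2, 3) ]
return (
  (: maps: decompose into single-entry maps, then merge them back :)
  deep-equal(map:merge(map:for-each($map, map:entry#2)), $map),
  (: arrays: decompose into single-member arrays, then join them back :)
  deep-equal(array:join(local:split($array)), $array)
)
```

Both comparisons return true, which is the round-trip property that makes entries and members feel like counterparts.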

michaelhkay commented 4 months ago

Indeed, the two function pairs array:split/join and array:members/of-members differ only in the way that they represent a "parcel", that is, the way they package a sequence as a single item. There's not a big difference, and although the latter pair seems a little more "type-safe" to me (converting an array of sequences to a sequence of arrays can be a bit confusing...), it's probably true that the first pair have a wider range of application, meaning that if either pair goes, it should be the second.

michaelhkay commented 4 months ago

Actually, I'm a bit confused about the spec of array:split.

What is the result of array:split([ (1,2), (3,4) ])?

The notes say

The function call array:split($array) produces the same result as the expression for member $m in $array return [ $m ].

Which makes the result ( [(1, 2)], [(3, 4)] ).

But this is only in the notes, and the actual rules are very informal; and none of the examples makes this clear. The only example that touches on it is the fourth example, and that one would work equally well if the result were ( [1,2], [3,4] ) -- which is probably the result many users would expect.

More examples are needed, and the first note should be moved into the normative rules.
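One such example, sketched in XQuery 3.1 without the draft function, contrasts the two readings of array:split([ (1,2), (3,4) ]): wrapping each member with the square constructor (the reading in the note) versus spreading its items with the curly constructor (the result many users might expect). Only the first can be reversed by array:join:

```xquery
xquery version "3.1";

let $array := [ (1, 2), (3, 4) ]
(: Reading in the note: each member becomes one member of a new array. :)
let $split := for $i in 1 to array:size($array) return [ $array($i) ]
(: Alternative reading: each item of a member becomes a member. :)
let $spread := for $i in 1 to array:size($array) return array { $array($i) }
return (
  $split ! array:size(.),                  (: 1, 1 :)
  $spread ! array:size(.),                 (: 2, 2 :)
  deep-equal(array:join($split), $array),  (: true :)
  deep-equal(array:join($spread), $array)  (: false: joins to [ 1, 2, 3, 4 ] :)
)
```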

ChristianGruen commented 4 months ago

What is the result of array:split([ (1,2), (3,4) ])? ... Which makes the result ( [(1, 2)], [(3, 4)] ).

Yes, that's supposed to be the result. Otherwise, the result could not be reversed (which is what I wanted to point out with “This function is the inverse of array:join.”):

(: [ 1, 2, 3, 4 ] :)
array:join(([ 1, 2 ], [ 3, 4 ]))

But this is only in the notes, and the actual rules are very informal; and none of the examples makes this clear.

True; sorry for that, and thanks for the hints. I’m still not sure what information is needed to make the rules comprehensive enough (various functions, such as array:fold-left, have no informal rules at all, possibly for historical reasons?). I’ll be glad to revise the presentation once we decide which functions we want to keep.