Arrays: Representation of single members of an array

ChristianGruen commented 8 months ago

When introducing the new array features to some users, the for member syntax was welcomed by everyone.

However, there was some confusion (again, see my past feedback to the mailing list) about what the QT4 group considers to be “members of an array”, and about value records.

In particular, the “value record” representation of arrays led to questions that I didn’t have a good answer for. People didn’t understand why an array member was returned as a map, and why that map is (again) called “array member” or “value record”, a term no one associated with arrays (at least for now, which is hardly surprising, as it has only just been introduced).

Next, due to atomization (as mentioned before), array:split allows us to omit the explicit ?value lookups that are required for array:members:

sum(array:members($array)?value)
sum(array:split($array))
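The difference between the two calls can be reproduced in plain XQuery 3.1. The following is a minimal sketch, assuming that array:members returns one map with a 'value' entry per member and that array:split returns one singleton array per member, as described in this thread. The local:members and local:split functions are hypothetical stand-ins, not the draft functions themselves.

```xquery
xquery version "3.1";
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";

(: Hypothetical stand-in for array:members: one "value record" per member. :)
declare function local:members($array as array(*)) as map(*)* {
  for $i in 1 to array:size($array)
  return map { 'value': $array($i) }
};

(: Hypothetical stand-in for array:split: one singleton array per member. :)
declare function local:split($array as array(*)) as array(*)* {
  for $i in 1 to array:size($array)
  return array:subarray($array, $i, 1)
};

let $array := [ 1, 2, 3 ]
return (
  (: Maps cannot be atomized, so the ?value lookup has to be explicit. :)
  sum(local:members($array)?value),
  (: Arrays are atomized implicitly when they are passed to sum(). :)
  sum(local:split($array))
)
(: Both expressions return 6. :)
```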

I suppose I have been biased in my presentation, but I’ve failed to give good arguments to justify the current solution in the spec. The questions that I think need to be answered are:

How will people benefit from the (usually intermediate) map representation for array members?
What exactly do we gain with array:members and array:of-members instead of using the existing array:join function, combined with the new array:split function?

Out of interest, I have rewritten the formal equivalencies for the array functions with array:split/array:join:

array:append

array:of-members((array:members($array), map{'value':$member}))
array:join((array:split($array), array { $member }))

array:build

array:of-members($input ! map { 'value': $action(.) })
array:join($input ! array { $action(.) })

array:filter

array:of-members(array:members($array) => filter(function($m) { $predicate($m?value) }))
array:join(array:split($array) => filter(function($m) { $predicate($m?*) }))

array:for-each

array:of-members(array:members($array) ! map { 'value': $action(?value) })
array:join(array:split($array) ! array { $action(?*) })

array:for-each-pair

array:of-members(
  for-each-pair(array:members($array1), array:members($array2),
    function($m, $n) { map { 'value': $action($m?value, $n?value) } }))

array:join(
  for-each-pair(array:split($array1), array:split($array2),
    function($m, $n) { array { $action($m?*, $n?*) } }))

array:insert-before

array:of-members(array:members($array) => insert-before($position, map{'value':$member}))
array:join(array:split($array) => insert-before($position, array { $member }))

array:remove

array:of-members(array:members($array) => remove($positions))
array:join(array:split($array) => remove($positions))

array:reverse

array:of-members(array:members($array) => reverse())
array:join(array:split($array) => reverse())

array:slice

array:of-members(array:members($array) => slice($start, $end, $step))
array:join(array:split($array) => slice($start, $end, $step))

array:sort

array:of-members(array:members($array) => sort($collation, function($x) { $key($x?value) }))
array:join(array:split($array) => sort($collation, function($x) { $key($x?*) }))

array:subarray

array:of-members(array:members($array) => subsequence($start, $length))
array:join(array:split($array) => subsequence($start, $length))

array { $sequence }

array:of-members($sequence ! map { 'value': . })
array:join($sequence ! array { . })

[E1, E2, E3, ..., En]

array:of-members((map { 'value': E1 }, map { 'value': E2 }, map { 'value': E3 }, ... map { 'value': En }))
array:join((array { E1 }, array { E2 }, array { E3 }, ... array { En }))

$array?*

array:members($array) ! ?value
array:split($array) ! ?*

$array?$N / $array($N)

array:members($array)[$N]?value
array:split($array)[$N]?* (or array:get($array, $N))
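A few of the array:split/array:join equivalences can be checked against the existing XQuery 3.1 functions. This is only a sketch: local:split is a hypothetical stand-in for the proposed array:split, and the sample array and $action function are made up for illustration. It wraps members with the square constructor [ ... ] rather than array { ... }, on the assumption that a member which is a sequence should remain a single member.

```xquery
xquery version "3.1";
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";

(: Hypothetical stand-in for array:split: one singleton array per member. :)
declare function local:split($array as array(*)) as array(*)* {
  for $i in 1 to array:size($array)
  return array:subarray($array, $i, 1)
};

(: Sample data; the second member is a two-item sequence. :)
let $array := [ 1, (2, 3), 'x' ]
let $action := function($m) { count($m) }
return (
  (: array:append :)
  deep-equal(array:append($array, 4),
             array:join((local:split($array), [ 4 ]))),
  (: array:reverse :)
  deep-equal(array:reverse($array),
             array:join(reverse(local:split($array)))),
  (: array:remove :)
  deep-equal(array:remove($array, 2),
             array:join(remove(local:split($array), 2))),
  (: array:for-each; .?* unwraps the singleton array :)
  deep-equal(array:for-each($array, $action),
             array:join(local:split($array) ! [ $action(.?*) ]))
)
(: Each of the four comparisons returns true. :)
```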

As a side note, I noticed that the equivalence given for array:join must be buggy:

(: current equivalence presented in the spec :)
array:of-members($arrays ! array:members(.))

(: returns [ 1, 2, 3 ] :)
let $arrays := ([ 1 ], [ 2, 3 ])
return array:of-members($arrays ! array:members(.))

In conclusion, if I could choose, I would tend to drop array:members and array:of-members and rename array:split to array:members.

michaelhkay commented 8 months ago

The difference between array:split and array:members is essentially a choice on how to represent an array member: in one case we do it with a "value record" and in the other we do it with a singleton array.

In recent work I have experimented with both, and I have to say I'm not happy with either. Neither really works well when you attempt a transformation based on a recursive tree walk using pattern matching.

I'd like to consider going back to my original idea of splitting an array into "parcels" (or building an array from parcels), where a parcel is a zero-arity function carrying the annotation %parcel; calling the function delivers the contents of the array member. This is about as close as we can get to an encapsulated representation of the concept without actually extending the data model.
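The parcel idea can be approximated with closures in XQuery 3.1. The sketch below is an assumption-laden illustration, not the proposed design: local:parcels and local:of-parcels are hypothetical names, and the %parcel annotation is left out, so these parcels cannot be told apart from other zero-arity functions.

```xquery
xquery version "3.1";
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";

(: Hypothetical: wraps each member in a zero-arity function.
   Calling the function delivers the contents of the member. :)
declare function local:parcels($array as array(*)) as function(*)* {
  for $i in 1 to array:size($array)
  let $member := $array($i)
  return function() { $member }
};

(: Hypothetical: builds an array with one member per parcel. :)
declare function local:of-parcels($parcels as function(*)*) as array(*) {
  array:join($parcels ! [ .() ])
};

(: Members that are sequences, including the empty sequence, survive intact. :)
let $array := [ (1, 2), (), 'x' ]
return local:of-parcels(reverse(local:parcels($array)))
(: The result is equal to [ 'x', (), (1, 2) ]. :)
```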

I've just re-read your email summarising feedback from BaseX users. It's a very useful contribution, but I think it's very much an XQuery users' perspective. It doesn't feel to me that these users are struggling with the challenge of doing complex structural transformations of JSON documents.

michaelhkay commented 8 months ago

But I do agree that at the XQuery and XPath level, "for member $x in $array" and "for key $k value $v in $map" are nicer; and I'm inclined to (revert to) proposing something similar for XSLT:

<xsl:for item="$x" in="$sequence">...</xsl:for>
<xsl:for member="$m" in="$array">...</xsl:for>
<xsl:for key="$k" value="$v" in="$map">...</xsl:for>

In each case, the "loop body" part of the expression could be either a sequence constructor or a select attribute.

For join operations there's definitely a benefit in being able to bind range variables rather than the context item.

ChristianGruen commented 8 months ago

We use something like parcels for our current Java bindings: Java objects, in particular those that have no obvious XDM type, are wrapped into function items, and can explicitly be converted to XDM types by invoking them. It’s pretty convenient.

%member feels like an appropriate name (but I guess your vision is more generic and not necessarily limited to arrays).

I completely agree that this discussion is driven by XQuery, and I haven't considered generic map/array updates at all. In our world, complex updates on JSON are usually done with XQUF (sometimes verbose, and custom to our JSON XML representation, but definitely powerful and versatile):

'{ "one": 1, "due": 2, "three": 3 }'
! json:parse(.)
! (json update {
  delete node ./three,
  rename node ./one as 'uno'
})
! json:serialize(.)

ChristianGruen commented 7 months ago

> As a side note, I noticed that the equivalence given for array:join must be buggy:

My side note can be ignored; the equivalent expression looks alright.

ChristianGruen commented 7 months ago

> I'd like to consider going back to my original idea of splitting an array into "parcels" (or building an array from parcels), where a parcel is a zero-arity function carrying the annotation %parcel; calling the function delivers the contents of the array member. This is about as close as we can get to an encapsulated representation of the concept without actually extending the data model.

I believe we absolutely need to find other names for array:members and array:of-members:

In the XQFO spec, $member is defined multiple times as parameter, and its type is always item()*.
The rule of array:get (which also returns item()*) states that it “returns the member at a specified position in the array”.

If the return type will be %parcel functions, possible names could be array:parcels and array:of-parcels.

In any case, we may need to find and document more uses for these two functions, and cases where array:join wouldn’t work, or at least be more verbose. It’s a commonplace, but with any new concept, there’s some risk that we’ll overwhelm users.

michaelhkay commented 7 months ago

We need a mechanism to split an array into its parts (members) and to reassemble those parts in a different way. The question is, what is the best way of representing the parts? array:split and array:join represent the parts as a sequence of singleton arrays, and that is certainly one way of doing it; array:members and array:of-members represent the parts as "value records", and that is another way of doing it.

When we're doing a rule-based tree-walking transformation in the XSLT style, we want to write rules that process the parts of the array and transform them. That means we need to match them, which means we need to distinguish them from other kinds of value. The challenge is therefore to find a representation that makes these "parts of an array" easily recognisable as such. Splitting into "value records" serves that purpose rather better than splitting into sub-arrays, though it is by no means perfect.

When we work with XML, intermediate data values can be made very easily recognizable by choosing distinctive element names. Working with maps and arrays is much more difficult because there are no element names to match. Perhaps annotations can fill the gap.

michaelhkay commented 7 months ago

There was discussion today about deep lookup and deep update, and both of these would benefit from being able to talk about the "leaf values" in a map or array as something that's more than just a sequence of items, rather in the same way that a text node is more than just a string.

Related: when we talk about key-value pairs in a map, I often find it awkward that the word "value" is used both to mean "any XDM value; a sequence", and to mean one part of a map entry. Things would get much easier if we could improve the terminology:

There are two kinds of functions: tabulated functions and procedural functions.
There are two kinds of tabulated functions: maps and arrays.
A tabulated function consists of a set of entries, called key-member pairs. The key is an atomic item, which in the case of an array is always an integer; the member is an arbitrary value.
The term "atomic item" (or just atom?) replaces "atomic value".

It would be nice to think of a deep-lookup returning a set of members, in the same way as a path expression selects a set of nodes, which is then implicitly flattened/atomized if the context requires a flat sequence. This still leaves all the options open for how "members" are represented.

ChristianGruen commented 7 months ago

> Related: when we talk about key-value pairs in a map, I often find it awkward that the word "value" is used both […]

Maybe values of map entries could be called members, and…

> This still leaves all the options open for how "members" are represented.

…instead of array:members, we could have array:entries, which returns singleton maps with the array index as key and the member as value, and possibly array:merge (instead of array:of-members) to create an array from those entries. And we could have another thought on map:pair, map:pairs and map:of-pairs: I feel they're pretty redundant and could be removed.

ChristianGruen commented 7 months ago

The proposed functions could also be used to convert arrays to maps, and vice versa:

$array
=> array:entries() => map:merge()
=> map:entries() => array:merge()

array:merge could be defined to allow for the creation of sparsely populated arrays:

(: Result: [ (), (), 'III', (), 'V' ] :)
array:merge((map { 3: 'III' }, map { 5: 'V' }))
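To make the proposal concrete, here is a rough XQuery 3.1 sketch of how array:entries and array:merge might behave. Both local:entries and local:merge are hypothetical stand-ins; padding missing positions with empty members, and sizing the array by the highest key, are assumptions taken from the sparse example above rather than specified behaviour.

```xquery
xquery version "3.1";
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Hypothetical array:entries: one singleton map per member,
   with the array index as key and the member as value. :)
declare function local:entries($array as array(*)) as map(*)* {
  for $i in 1 to array:size($array)
  return map { $i : $array($i) }
};

(: Hypothetical array:merge: positions without an entry become empty members.
   Assumes that all keys are positive integers. :)
declare function local:merge($entries as map(*)*) as array(*) {
  let $merged := map:merge($entries)
  let $size := xs:integer(max((0, map:keys($merged))))
  return array:join(
    for $i in 1 to $size
    return [ $merged($i) ]
  )
};

(
  (: Sparse input; equal to [ (), (), 'III', (), 'V' ] :)
  local:merge((map { 3: 'III' }, map { 5: 'V' })),
  (: Round trip; equal to [ 'a', 'b' ] :)
  local:merge(local:entries([ 'a', 'b' ]))
)
```

Converting an array to a map would then amount to map:merge(local:entries($array)).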

qt4cg / qtspecs

Arrays: Representation of single members of an array #826