Status: Open. michaelhkay opened this issue 1 month ago.
For named types, would we use a construct such as ??~type(FOO), and for atomic types ??~xs:integer, or even ??~integer, given that xs: is the default prefix?
I'm working backwards from the common case of selection using a record type to the more general case (just as path expressions focus on having convenient syntax for the common cases).
But I think we could achieve something like

KeySpecifier ::= .... | "~" SequenceType

while allowing the SequenceType to be in parentheses, or perhaps requiring it to be in parentheses if there is an occurrence indicator, which would make the new alternative

"~" ( ItemType | "(" SequenceType ")" )
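The parenthesization rule can be sketched as a single tokenizer step. This is a hypothetical Python model, not part of any XPath grammar or processor: split_type_filter, _balanced and the simplified name pattern are assumptions made for illustration. It shows that under "~" ( ItemType | "(" SequenceType ")" ) a ? following a bare ItemType is always left for the next lookup operator, while an occurrence indicator is accepted only inside parentheses.

```python
import re

# Hypothetical, simplified rule for an ItemType name: an optional prefix
# plus a local name, e.g. "xs:integer", "element", "record".
_NAME = re.compile(r"[A-Za-z_][\w.-]*(?::[A-Za-z_][\w.-]*)?")


def _balanced(text: str, start: int) -> int:
    """Return the index just past the ')' matching the '(' at text[start]."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i + 1
    raise SyntaxError("unbalanced parentheses in type filter")


def split_type_filter(text: str) -> tuple[str, str]:
    """Split the text after '?~' into (type, remainder).

    Models "~" ( ItemType | "(" SequenceType ")" ): a bare ItemType never
    absorbs an occurrence indicator, so a following '?' is left over.
    """
    if text.startswith("("):
        # Parenthesized SequenceType: may end in ?, * or +.
        end = _balanced(text, 0)
        return text[1:end - 1].strip(), text[end:]
    m = _NAME.match(text)
    if not m:
        raise SyntaxError(f"expected an ItemType at {text!r}")
    end = m.end()
    if end < len(text) and text[end] == "(":
        # ItemType with an argument list, e.g. record(first, last).
        end = _balanced(text, end)
    return text[:end], text[end:]


print(split_type_filter("record(first, last)?name"))    # ('record(first, last)', '?name')
print(split_type_filter("(record(first, last)?)?name")) # ('record(first, last)?', '?name')
```

The design choice being illustrated is that the tokenizer never has to guess: the only place a trailing ? can belong to the type is inside the parentheses.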
Let's assume we have XML encoded either in a document or in a "structured item" (which is what we occasionally call maps and arrays internally). Are the following two expressions comparable to some extent, and would they both return the element <a/>?
let $doc := document { <a/> }
return $doc / element()

let $struct := [ <a/> ]
return $struct ?~ element()
If we want to make the syntax accessible to non-experts, would it be fair to present / and ?~ as roughly equivalent?
It would be great if we could agree on a collective term for "maps and arrays". "Structured item" feels too generic to me. I've toyed with terms like "tabulation", "tabula", "composition", "dataset", "compendium", "aggregate".
Perhaps "combo"? It's best to have a word that stands out from the crowd if we can't find one whose meaning is self-explanatory.
With "/", the RHS always selects nodes, and we primarily select nodes by node kind and name, occasionally by type. So we can write a/element(*, xs:integer), but we rarely need to, because element names usually provide the handle that we need. With JSON, we don't have element names, so selecting by type becomes a much more common requirement.

The syntax a/element() works only because element is reserved as a function name. We don't have the luxury of reserving any names after "?" in the same way. Logically we could think of a/element() as an abbreviation for a/~element(), where the ~ can be omitted because element is a reserved name.
Is there any restriction on using something like element as an ItemType name? (I can only see restrictions against using atomic type names.) If there are none, then a/~element would be legal (assuming a suitable declaration), but somewhat confusing!
There's no restriction on using bare NCNames as atomic type names or declared item type names. It's quite legal today to write a/element(element, element).
My first reaction is to use syntax like:
?? X ?? Y::map
or
?? X ?? Y[isMap(.)]
or
? X ?? maps(Y)
or
?? X ?? Y[hasKeys(.)]
Or why not:
?? X ?? map::Y
I am against introducing new, unreadable symbols into the already rather messy set of symbols we are using at present. Readability must have a much higher priority in our design than introducing new, fancy (cryptic) symbols.
And of course, if the proposal for Total Maps is accepted, then any constant non-map value can be represented as, or coerced to, a map:
map {
  '\' : ()
} (: produces the empty sequence for any lookup :)
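Assuming, as the snippet suggests, that a '\' key in a total map supplies the value for every key that has no explicit entry, the coercion can be modelled in a few lines of Python. TotalMap, DEFAULT_KEY and as_total_map are hypothetical names, not part of the proposal, and XDM sequences are represented as tuples, so () plays the role of the empty sequence.

```python
from typing import Any

# Assumption: under the Total Maps proposal, the '\' key supplies the
# value returned for every key that has no explicit entry.
DEFAULT_KEY = "\\"


class TotalMap:
    """Hypothetical model of a total map; XDM sequences are Python tuples."""

    def __init__(self, entries: dict):
        self._entries = dict(entries)

    def lookup(self, key: Any) -> tuple:
        """Never fails: explicit entry, else the default entry, else ()."""
        if key in self._entries:
            return self._entries[key]
        return self._entries.get(DEFAULT_KEY, ())


def as_total_map(value: Any) -> TotalMap:
    """Coerce a value: maps pass through, anything else becomes a constant map."""
    if isinstance(value, TotalMap):
        return value
    return TotalMap({DEFAULT_KEY: value})


# The snippet above: map { '\' : () } yields () for any lookup.
empty = TotalMap({DEFAULT_KEY: ()})
print(empty.lookup("name"))                        # ()
print(as_total_map((42,)).lookup("anything"))      # (42,)
```

In this model a lookup on a coerced constant can never raise, which is the property that would let a path step apply a lookup to non-map values without a dynamic error.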
We have dropped the syntax ??type(T) for filtering the results of lookup expressions, because of problems with syntax ambiguity. This issue seeks an alternative.

Although selection by type also makes sense with shallow lookup, it is most relevant with deep lookup. The main need arises with intermediate steps of a path such as ?? X ?? Y, which gives a dynamic error if X selects something that is not a map or array. This is consistent at one level with // X // Y, except that // X can never select something that isn't a node.

The main problem with filtering using an [. instance of record(p, q)] predicate is that it's very long-winded. For example, if we want to select only those members of a selected array that are sequences of a particular record type, without flattening everything else, we have to write something like ?? values::* ?[. instance of record(p, q)+] ? *, which is a bit of a nightmare.

Starting from the end goal, I would like to be able to write something close to ??record(first, last) to select all the items of this record type at any depth. We know that syntax doesn't work, because ??NCName is already taken. That's also true for ??items::record(first, last), unless we change the rules for what can appear after ::.

Also, there's another syntax hazard: what we want here is a SequenceType, not an ItemType, which means that it can contain a trailing ? occurrence indicator, and that is easily confused with the next lookup operator in a path.

Looking at it from all angles, I do feel the best solution is to prefix record(first, last) with a marker character so that we know we've got a type filter here. Characters that might do the job include @, #, $, %, ^ and ~. Of these, my preference remains ~, for three reasons:
(a) it's currently unused: overloading a different symbol is more likely to cause visual confusion;
(b) one of the traditional uses of ~ is to indicate a "matches" or "is kind of like" relationship;
(c) there's a mnemonic association between "tilde" and "type" (compare "at" and "attribute").
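As a rough illustration of what ??~record(first, last) is meant to select, here is a hypothetical Python model; none of these functions exist in any XPath processor. Dicts stand for maps, lists for arrays and tuples for sequences, the traversal order only approximates deep lookup, and a map matches record(first, last) only if its keys are exactly first and last, which simplifies the real record-type rules. Filtering by type at the intermediate step means every selected item is a map, whereas applying a further lookup to every descendant reaches an atomic value and fails, much like the dynamic error from ?? X ?? Y.

```python
from typing import Any, Callable, Iterator

# Hypothetical model of XDM values (not a real processor API):
# dict = map, list = array, tuple = sequence, anything else = atomic item.


def items(value: Any) -> Iterator[Any]:
    """Flatten a possibly nested tuple into the items it contains."""
    if isinstance(value, tuple):
        for v in value:
            yield from items(v)
    else:
        yield value


def children(item: Any) -> Iterator[Any]:
    """Shallow lookup ?* : map values or array members, as items."""
    if isinstance(item, dict):
        members = list(item.values())
    elif isinstance(item, list):
        members = item
    else:
        # Corresponds to the dynamic error for a lookup on a non-map/array.
        raise TypeError(f"lookup applied to a non-map/array: {item!r}")
    for m in members:
        yield from items(m)


def descendants(item: Any) -> Iterator[Any]:
    """Deep lookup ??* : every item reachable below a map or array."""
    for child in children(item):
        yield child
        if isinstance(child, (dict, list)):
            yield from descendants(child)


def is_record(*fields: str) -> Callable[[Any], bool]:
    """Simplified record(f1, f2, ...) test: a map with exactly these keys."""
    wanted = set(fields)
    return lambda item: isinstance(item, dict) and set(item) == wanted


def deep_type_filter(root: Any, matches: Callable[[Any], bool]) -> list:
    """Hypothetical model of the proposed ??~T: descendants that match T."""
    return [d for d in descendants(root) if matches(d)]


if __name__ == "__main__":
    data = {
        "author": {"first": "Ada", "last": "Lovelace"},
        "reviewers": [{"first": "Alan", "last": "Turing"}, "anonymous"],
        "year": 1843,
    }
    found = deep_type_filter(data, is_record("first", "last"))
    # Every selected item is a map, so a further key lookup is safe.
    print([r["last"] for r in found])  # ['Lovelace', 'Turing']
    try:
        # Applying a lookup to *every* descendant reaches "Ada" and fails.
        [c for d in descendants(data) for c in children(d)]
    except TypeError as err:
        print("dynamic error:", err)
```

The type filter does the job that the long-winded [. instance of record(p, q)] predicate does today, but as a single step in the path.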