Open line-o opened 1 year ago
Maybe a simple function is sufficient, and we could add the occurrence indicator. In BaseX, we have inspect:type.
I do like the functions the inspection
module provides, @ChristianGruen. So yes, a function fn:type($var)
or fn:inspect-type($var)
would suffice.
And it would make sense to also include the options map. That way we could limit the depth of the traversal.
I would even propose to standardise fn:inspect-function($function as function(*)) as element()
. I did not see a proposal for this yet.
The inspect:type
function with the options "mode" and "item" as implemented in baseX are very well thought out.
@ChristianGruen what do you think about the "depth"-option in addition to "mode": "computed"?
This has been rejected in the past.
Firstly, there is no such thing as a "most specific type". The type system is a lattice.
See also https://github.com/w3c/qtspecs/issues/14.
For example, what would you return for an empty map? It's an instance of both map(xs:anyURI, element(banana))
and map(xs:duration, array(s:integer))
- is either of these more specific than the other?
Secondly, it allows you to discover details of the implementation that should be hidden. For example, abs()
applied to an xs:integer
might return an xs:non-negative-integer
or perhaps even a saxon:64-bit-unsigned-integer
. Users shouldn't be encouraged to go discovering such internal details.
Secondly, it allows you to discover details of the implementation that should be hidden. For example,
abs()
applied to anxs:integer
might return anxs:non-negative-integer
or perhaps even asaxon:64-bit-unsigned-integer
. Users shouldn't be encouraged to go discovering such internal details.
Then why many languages provide such functions to the users:
For example, what would you return for an empty map? It's an instance of both map(xs:anyURI, element(banana)) and map(xs:duration, array(s:integer)) - is either of these more specific than the other?
If a generic object has no value (is empty) then it can have all possible generic types, which means its type is their intersection (most common super-type), and this will result in the type:
*`map()`**
Returning this is still much more useful than not providing the functionality at all.
Related to this: We do need a type object in XPath. This will give us a proper answer to the question:
What is the right-hand-side (2nd argument) of the operator instance of
?
@line-o Could you share an example for the proposed depth
option?
Firstly, object oriented languages define classes as objects.
Secondly, they have a type system where every object belongs uniquely to one particular class. It's not always predictable, because methods can return a subtype of the declared type, but that's manageable because you can explore the type hierarchy programmatically to get the information you want. Without that ability, discovering that a function like abs()
has returned an instance of saxon:fairly-small-positive-integer
is useless. Especially if it returns something different the next day.
The XPath type system isn't like that. Parts of it are, for example atomic values are "labelled" with a specific type. But sequences aren't, and maps aren't.
To be honest, the type system in XDM is rather a mess, and it's very easy for proposals like this to make things worse rather than better.
But I'd suggest if you want this then you start by producing a complete, fully articulated proposal that we can critique. Providing use cases showing how the feature can be used to solve real problems would also enable validation that it will actually be usable.
(Saxon, incidentally, does have functions like this: see https://www.saxonica.com/documentation11/index.html#!functions/saxon/type and https://www.saxonica.com/documentation11/index.html#!functions/saxon/type-annotation . The specification of these extensions falls well short of what would be needed in a standard, yet it's already pretty complex, and it would be hard to make it interoperable. The library exposes type information in the form of functions. These days I would be quite inclined to expose it in the form of maps; except that the maps would contain cyclic references to other maps, and cyclic maps are not something that XDM has ever envisaged (JSON serialization, for example, would be non-terminating).
Secondly, it allows you to discover details of the implementation that should be hidden.
@michaelhkay I can imagine that users may also be tempted to stumble upon implementation details when playing around with instance of
. If we wanted to avoid that, we’d probably need to revise functions like fn:abs
and enforce unique return types. Instead of saxon:64-bit-unsigned-integer
, we should certainly limit the output to types that are defined in the specifications.
But it will certainly take some time to define reasonable string representations for function items. For empty maps, I agree with @dnovatchev that map(*)
is the output most users would expect.
Let me just add that I added
stop traversal as early as possible, when the first mismatching type is encountered
so as to return with generic types like map(*)
, array(*)
and function(*)
as quickly as possible.
But I see the point that it needs to be said explicitly. My naive implementation in XQuery does exactly that.
@ChristianGruen for the depth-option: this controls the for how deep to traverse structures like sequences, arrays, maps and probably also records and sets when they are added.
let $struct := [ [ [ map {}, map{} ] ] ]
return (
fn:type-of($struct, map{"depth": 1}),
fn:type-of($struct, map{"depth": 2}),
fn:type-of($struct)
)
"array(*)", "array(array())", "array(array(array(map(*))))"
I don't understand the "traversal" process. It seems to be assuming navigation of some kind of type hierarchy. Even for nodes, there is no such hierarchy: element(a:b, xs:int)
has multiple supertypes including element(*, xs:int)
and element(a:b, xs:long)
. We need a spec, please.
A viable spec would be something like:
For atomic values, return the local name of the most specific built-in (XSD) type of which the value is an instance.
For nodes, return node-kind(name)
where node-kind is for example "element" or "attribute" and name (for nodes that have a name) is the node name in Q{uri}local notation.
For maps return map(*)
, for arrays return array(*)
For other functions return a string representation of the function signature that (a) conforms to XPath SequenceType syntax, and (b) uses EQName notation for qualified names.
I edited the description to reflect the input from @michaelhkay and @ChristianGruen
For arrays and maps with members and entries the implementation can attempt to find the most specific shared type
Does “can” imply that the behavior of the function is supposed to be implementation-defined?
The inspection should stop as early as possible, when the first mismatching type is encountered.
Could you elaborate what “mismatching type” means in this context?
The phrase "most specific shared type" has no meaning. It suggests that given two types T and U, neither of which is a subtype of the other, and given the set of types S consisting of all those types of which both T and U are subtypes, then there is one type in S that is a subtype of every other type in S. This is simply not true. For example, given T=schema-element(E), and U=schema-element(F), where E and F are both in the substitution group G and also in the substitution group H, then S includes both schema-element(G) and schema-element(H), neither of which is necessarily a subtype of the other.
The word "inspection" seems to be referring to a process or algorithm that can "stop". This algorithm is nowhere described.
I will try to describe the algorithm that I implemented in xbow:get-type#1
(see link in description).
I would suggest breaking this up into a number of useful simpler functions.
fn:node-kind($node as node()) returns one of the strings "document", "element", "attribute", "text", "comment", "processing-instruction", or "namespace".
fn:type-annotation($item as item()) returns information about the type annotation of an atomic value or node (see XDM for definition) in the form of a map. (If $item is not an atomic value or a node it returns an empty sequence.) The map includes entries:
-- name - a QName, the name of the type, present if the type is not anonymous
-- base-type() - a function returning the same information for the base type
-- variety - "simple" or "complex"
etc.
fn:distinct-types($input as item()*) - return the result of applying fn:type-annotation(.) to each item in $input, with redundant entries removed. An entry is redundant if there is another entry for the same type or for a supertype.
Just FYI: Here is how FXSL's function f:type , implemented almost 20 years ago, gives us what is actually the most derived type of an instance.
This function was very useful and instrumental at least withing FXSL.
Yes, getting the most specific built-in atomic type for a supplied atomic value is straightforward. The only complication is that the result might not be interoperable, for example 2+2 is required to return an xs:integer
, but it is permitted to return an xs:int
or xs:positiveInteger
. This creates the danger that people will write type(2+2) = QName('xs:integer')
and wrongly assume this is always the same as saying (2+2) instance of xs:integer
.
[Saxon rarely takes advantage of this freedom. But there are occasions when it does so, for example abs($x)
returns $x unchanged if it is non-negative.]
@michaelhkay
2+2 is required to return an xs:integer, but it is permitted to return an xs:int or xs:positiveInteger
This statement seems to be logically inconsistent. If this is part of any spec document, this is clearly a bug in that spec.
If some specific result is r e q u i r e d to be returned, then how can something else be returned and not cause an error?
If it returns an xs:int
then it satisfies the requirement to return anxs:integer
, because every xs:int
is an xs:integer
.
If it returns an
xs:int
then it satisfies the requirement to return anxs:integer
, because everyxs:int
is anxs:integer
.
Then the stated requirement must be: "xs:integer or any sub-type of it", or in other words: "any node of the subtree of types, rooted in xs:integer"
The actual language of the spec (F+O §1.4) is "The return-type, also in italics, specifies the static type of the value returned by the function. The dynamic type of the value returned by the function is the same as its static type or derived from the static type."
I'm afraid the language of this sentence falls into the common trap of suggesting that we can speak of "the dynamic type" of a value, which is inaccurate. We also shouldn't talk of one type being derived from another, we should use the "is subtype of" relation defined in XPath §3.7.
If the required type is xs:integer+
, then a result of (xs:int(2), xs:long(3), xs:positiveInteger(5))
matches this required type, but there is nothing that tells us what "the dynamic type" of this sequence is. I've been working on correcting some of these references. We should never say "the [dynamic] type of the value V is T", we should always say "V is an instance of [or 'matches'] T".
I've been thinking about this and wondering if (at least for items rather than values) we could find a way of defining that every item belongs to exactly one "essential" type plus any number of "accidental" types [1].
For atomic values it's easy enough: the essential type is the one identified by the type annotation, the accidental types are its supertypes in the atomic type hierarchy and all union types that include any of these as a member type.
For nodes, perhaps the essential type is the combination of node kind, name, and type annotation, for example element(N, T)
. But perhaps the nilled
property comes into it as well. Certainly membership of types schema-element(E)
or schema-attribute(A)
could be considered accidental. Note: it's not the case that the accidental types are always supertypes of the essential type.
For function items, the essential type is defined by the function signature. Function annotations presumably are accidental.
For arrays and maps, I think the essential type is simply array(*)
or map(*)
. Anything more specific, such as array(union(xs:int, xs:string))
is accidental.
This approach might lead us to a position where we can indeed say that every item has an unambiguously-defined "essential type" which we could regard as a property of the item. The question then becomes, if we want to return the "essential type" as the result of a function, how to we represent types as XDM values?
We do in fact in Saxon have a map-based XDM representation which can faithfully represent every SequenceType. It wasn't designed for public consumption, but we could use that as a starting point. However, it's a rather shallow representation in that it (like the SequenceType syntax) references schema types by name, and it's not clear how useful that is unless you can interrogate the properties of schema types - which would mean having an XDM representation of the entire Schema Component Model.
[1] https://plato.stanford.edu/entries/essential-accidental/
Perhaps looking at the C# Type class could be helpful.
Not saying we must copy verbatim, but there are many useful things that might give us good ideas, like:
Properties:
There was a recent discussion on the Xml.com chat and @line-o mentioned this issue.
The fact is that users continue to express their need for a type object (as this slack discussion showed) and we need to do something to respond to those users' needs.
Things have also changed since the date when this issue was initially submitted. For example, we have now the record type, and record-type tests in the language. And a record can be self-referential, thus it seems at least some of the difficulties cited above by @michaelhkay have been overcome.
As a user I need to be able to refer to a type of a value, like type-of(someXdmValue)
.
We need finally to be able to have: $something instance of $someType
and we also need to be able to define a variable as:
$myVar as type-of($someObject)
Isn't it high time to stop picking only the low-hanging fruit for our users, while leaving the real treasures perhaps as "a bunch of sour grapes that are not worth gaping for" ?
https://read.gov/aesop/media/the-fox-and-the-grapes_Resources/the-fox-and-the-grapes.mp4
Fundamental changes to the type system are going to be extremely hard to achieve. Saying that users need something is easy; defining something that meets that need (while retaining compatibility) is much harder.
Fundamental changes to the type system are going to be extremely hard to achieve. Saying that users need something is easy; defining something that meets that need (while retaining compatibility) is much harder.
Maybe a series of small steps would be more realistic?
This is just a brief sketch of the process of trying to solve the problem.
Any unsurmountable difficulties?
Well, there are several difficulties there.
Given <xsl:variable name="x" as="xs:anyAtomicType" select="42"/>
I would expect type-of($X)
to be the same as type-of(42)
(by basic substitutability principles).
It's a fallacy to speak of "the common base type". The type system is not a hierarchy (in fact, it's not even a lattice). We spent a lot of time eliminating this bug, which appeared in a number of places in the 2.0 specs.
Well, there are several difficulties there.
Given
<xsl:variable name="x" as="xs:anyAtomicType" select="42"/>
I would expecttype-of($X)
to be the same astype-of(42)
(by basic substitutability principles).It's a fallacy to speak of "the common base type". The type system is not a hierarchy (in fact, it's not even a lattice). We spent a lot of time eliminating this bug, which appeared in a number of places in the 2.0 specs.
We probably can agree and establish a main type in such cases, for example that the main type of 42 is xs:integer
.
This gives us at least a specific type to work with, and is better than not having any type at all,
Well, there are several difficulties there.
Given
<xsl:variable name="x" as="xs:anyAtomicType" select="42"/>
I would expecttype-of($X)
to be the same astype-of(42)
(by basic substitutability principles).It's a fallacy to speak of "the common base type". The type system is not a hierarchy (in fact, it's not even a lattice). We spent a lot of time eliminating this bug, which appeared in a number of places in the 2.0 specs.
This was not a problem in C#. Given this code snippet:
var nnnn = 4;
var t = nnnn.GetType();
How does the C# compiler decide that the type of nnnn
is int
and not one of the other 7 types that 4
is instance of?
Maybe we could learn from the solution, and try to do the same here?
In particular they are using this specific rule:
The type of an integer literal is determined by its suffix as follows:
int
, uint
, long
, ulong
.The type system of C#, like other object-oriented languages, is completely different from that of XDM.
@dnovatchev Given a C# statement like:
var x = 42
the C# language assigns that variable a type of int
. The equivalent in XPath is:
let $x := 42
where 42 has the associated type of xs:integer
due to the https://www.w3.org/TR/xpath-31/#id-literals section.
Where C# and XPath differ is that in C# the variable has a specific type, such that:
x is Integer // true
x is Short // false
However, in XPath, the instance of rules use subtype matching:
$x instance of xs:integer (: true :)
$x instance of xs:short (: true -- $x is in the range of the short value space :)
This means that if you have $y with a value of 32768 then:
$y instance of xs:integer (: true :)
$y instance of xs:short (: false -- $y is larger than the max short value :)
This is similar to the duck typing approach in Python, where if a type has the correct methods, etc. then passing it to a function will work.
XPath has well-defined instance of (subtype) and cast (casting) rules for working with variables and values, but those operate on the value space of the instance not the underlying representation of that value. That is what makes these rules complex.
When a variable has a declared type, you need to think about the value space that type represents, not the type itself. Given an xs:short value, you don't know if the value is an xs:byte (i.e. if it is in the range -127 to 128), but you do know that it is an xs:long (as the value space of xs:short is contained in the value space of xs:long).
The type system of C#, like other object-oriented languages, is completely different from that of XDM.
Seems that what I wanted to say was not understood. To quote the C# rules for determining the type of an integer literal:
int
, uint
, long
, ulong
.So, there is a convention about which type to assign to the variable, out of many possible types. Thus, although 42
is small enough to be of type int16
or even byte
, these are not even considered in making the type inference.
When we say that an XPath literal value can be assigned many types, can't we follow the same solution that the designers of C# have employed? Thus we will order the desired types in a predefined sequence, and choose the first type of this sequence, of which the literal value is an instance of.
See also the C# language Specification
https://qt4cg.org/specifications/xquery-40/xpath-40.html#id-literals does what C# does:
The value of a numeric literal containing no "." and no e or E character is an atomic value of type xs:integer.
Therefore, for let $x := 42
the variable x
has the type xs:integer
by those rules like var x = 42
has the type int
from C#'s literal value rules. Similar for other numeric and string constants.
Where C# and XPath differ is in the subtyping and casting rules. It's the subtyping rules which create the type lattice/graph. XPath works similar to C/C++'s integer promotion and automatic type conversion.
The following sections are useful for understanding types in XPath:
xs:integer
is both an xs:anyType
and an item()
, and xs:IDS
is both an xs:IDS
and xs:ID*
derives-from
function for atomic type matchinginstance of
, etc.Any proposal needs to work with the above rules and specification language.
https://qt4cg.org/specifications/xquery-40/xpath-40.html#id-literals does what C# does:
The value of a numeric literal containing no "." and no e or E character is an atomic value of type xs:integer.
Therefore, for
let $x := 42
the variablex
has the typexs:integer
by those rules likevar x = 42
has the typeint
from C#'s literal value rules. Similar for other numeric and string constants.Where C# and XPath differ is in the subtyping and casting rules. It's the subtyping rules which create the type lattice/graph. XPath works similar to C/C++'s integer promotion and automatic type conversion.
@rhdunn I am confused. Are you saying that there is no problem in XPath to give a specific type to any value? And that the difficulties are only in the subtyping rules?
I was answering your comment around the link and C# spec text you referenced, saying that XPath already has the language in the specification you are asking for.
There is static type of a variable (as specified in the static context), and a dynamic type (as specified in the dynamic context).
The dynamic type definition states:
Every value matches one or more sequence types. A value is said to have a dynamic type T if it matches (or is an instance of) the sequence type T.
Because of that, a variable can have multiple dynamic types. Because a type-of($x)
would work on the dynamic type of the variable/value, that would not return a single type. -- This is the issue that Mike is trying to explain.
We already determine the type of a literal value from its lexical form, for example 1 is an xs:integer, 1.0 is an xs:decimal, 1e6 is an xs:double. That's not the issue here. The issue is that (unlike C# or other OO languages) there are many values, of which the most obvious example is an empty map, that do not have a unique "most specific" type.
We already determine the type of a literal value from its lexical form, for example 1 is an xs:integer, 1.0 is an xs:decimal, 1e6 is an xs:double. That's not the issue here. The issue is that (unlike C# or other OO languages) there are many values, of which the most obvious example is an empty map, that do not have a unique "most specific" type.
We already have discussed the nothing() type that is suitable for things like empty collections.
And for any other value that has N possible types, the solution that works for C# will also be applicable here: Put all such types in an ordered structure and select the first type from that ordered structure (when traversed following its order) that happens to be a matching type for the value.
For every type we have a corresponding constructor function, and using it we also solve another problem: create a new value that has the same type as a given, specific value.
One could argue that the ordering of the types would be rather arbitrary and in any case not well justified, but choosing a seemingly good ordering would be a step in the right direction.
Put all such types in an ordered structure
We're going round in circles. The type system is a directed acyclic graph. There is no ordering.
Put all such types in an ordered structure
We're going round in circles. The type system is a directed acyclic graph. There is no ordering.
I meant: impose an ordering of our own choosing.
And here is what Wikipedia says about this:
"_A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering, and algorithms are known for constructing a topological ordering of any DAG in linear time_."
A DAG has many topological orderings, and the one you will get is not in general "one of our own choosing", in general it is arbitrary. A type-of() function that returns one of the types an item belongs to, selected arbitrarily, is not going to be useful to anyone.
As for performance, the time to compute an ordering of the entire type system is irrelevant when there's an infinite number of types involved. That's not actually what's needed here; what's needed is an algorithm that selects a type T that an item belongs to, such that no other type U that the item belongs to is a subtype of T.
In practice, the case that's most difficult arises with functions. Consider an array of functions with the same arity but different signatures. What should be the type of such an array? The easy pragmatic solution is to return something like array(function(*))
. But do you also do that if the functions all have the same signature?
A DAG has many topological orderings, and the one you will get is not in general "one of our own choosing", in general it is arbitrary. A type-of() function that returns one of the types an item belongs to, selected arbitrarily, is not going to be useful to anyone.
Yes, and maybe we could examine these, possibly many, orderings and come up with one "preferred ordering"?
If no one has attempted to do this until now, it is not justified just to claim in advance that "trying to do this is meaningless and doomed to failure from the very start".
I think there are two different (though related) goals in this discussion:
typeof
in JavaScript, getClass().getName()
in Java).Objective 1 seems easy enough, good answers have already been given in the initial thread and in https://github.com/qt4cg/qtspecs/issues/148#issuecomment-1439271691. I think that most people would already be happy to get xs:integer
, element()
, map(*)
and function(*)
, etc. or the corresponding super types if we support sequences (xs:anyAtomicType
, node()
, item()
, etc.).
Objective 2 is much more difficult, and the generated result may not necessarily be helpful to everyone, as it would require everyone to fully understand the implications of the type system (who wants that in practice…). Maybe we can start with 1) and provide additional options in a separate step.
This could be a language construct like
type of $a
similar to$a instance of <type>
or a functionfn:type-of(item()*)
.It returns the string representation of the type of a value.
map(*)
andarray(*)
respectivelyarray(xs:string)
,map(xs:integer, element(banana))
I wrote an implementation in XQuery as a PoC utilizing typeswitch:
https://github.com/line-o/xbow/commit/ca6e593f869c15b1fb372d24653715abfbda5cf8
There is an implementation in baseX inspect:type.
inspect:type#2
allowing to set additional options which should be considered.