w3c / qt3tests

Tests for XPath and XQuery
27 stars 17 forks source link

MapTest-058 map as function #28

Open benibela opened 4 years ago

benibela commented 4 years ago

MapTest-058 has

 let $s as function(xs:anyAtomicType) as xs:string := map { 1:'A','x':'B' }

But the type of this map is function(xs:anyAtomicType) as xs:string? (2.5.6.2.35) which is not function(xs:anyAtomicType) as xs:string

michaelhkay commented 4 years ago

Yes, I think you are right.

michaelhkay commented 4 years ago

If the test is wrong then Saxon is wrong, so I've raised a Saxon issue at https://saxonica.plan.io/issues/4692

ChristianGruen commented 4 years ago

@benibela Good hint. That's something that also needs to be fixed in BaseX.

michaelhkay commented 4 years ago

I'm puzzling a bit over the transitivity rules. A number of the rules in §2.5.6.2 contain the phrase "because of the transitivity rules", but are the transitivity rules explicit? We obviously expect that subtype(A,B) and subtype(B,C) implies subtype(A,C), but where is this stated? The "if and only if" at the start of §2.5.6.2 doesn't seem to leave much room for inferring the relationship by transitivity.

Now, function(anyAtomicType) as string? is a subtype of function(integer) as string?, so transitivity implies that map(K, string) is also a subtype of function(integer) as string?.

Equally, function(anyAtomicType) as string? is a subtype of function(anyAtomicType) as item()*, so transitivity implies that map(K, string) is also a subtype of function(integer) as item()*.

So I guess we should read rule 35 as "Ai is map(K, V), and Bi is function(xs:anyAtomicType) as V? (or, by the transitivity rules, if Bi is function(P) as W where subtype(P, xs:anyAtomicType) and subtype(V?, W)".

michaelhkay commented 4 years ago

Further thinking on this. It's true that map(anyAtomicType, string) is not a subtype of function(xs:anyAtomicType) as xs:string. But that's not what this test requires. The test requires only that map { 1:'A','x':'B' } is an instance of function(xs:anyAtomicType) as xs:string. The rule for this is §2.5.5.7: "A TypedFunctionTest matches an item if it is a functionDM31 and the function's type signature (as defined in Section 2.8.1 Functions DM31) is a subtype of the TypedFunctionTest." Now, we know that map { 1:'A','x':'B' } is a function, but what is the type signature of this function? Section §2.5.5.8 tells us "The function signature of a map matching type map(K, V), treated as a function, is function(xs:anyAtomicType) as V?.". But the value map { 1:'A','x':'B' } matches many different map types, for example it matches both map(xs:anyAtomicType, xs:string) and map(union(xs:integer, xs:string), item()*). Does this imply that the map has multiple function signatures? As an extreme case, consider the empty map. What is the function signature of an empty map? An empty map matches type map{K, V} for all possible values of K and V, and it seems that each of these map types gives us a different function signature. So talking of "the function signature of a map" really makes no sense; there is no such property. Instead we need to read this rule as "A map M matches a function type F(P) as R if (a) subtype(xs:anyAtomicType, P); (b) every value in the map is an instance of R; and (c) the empty sequence is an instance of R." But this rule is rather different from anything that's explicitly stated in the spec.

michaelhkay commented 4 years ago

See also https://github.com/w3c/qtspecs/issues/14 which I have raised as an issue against the spec.

abelbraaksma commented 4 years ago

but what is the type signature of this function?

The answer is in F&O, section 17.1:

"A map can also be viewed as a function from keys to associated values. To achieve this, a map is also a function item. The function corresponding to the map has the signature function($key as xs:anyAtomicValue) as item()*."

I've struggled with this as well. This rule is not in XP31, but probably should be.

Does this imply that the map has multiple function signatures?

No, only one.

As an extreme case, consider the empty map. What is the function signature of an empty map?

The same.

"A map M matches a function type F(P) as R if (a) subtype(xs:anyAtomicType, P); (b) every value in the map is an instance of R; and (c) the empty sequence is an instance of R."

I think this conclusion is correct, but only if R is taken as item()*.

This makes sense, because a map can take any item as value, and any atomic type as key. It's irrelevant if the type of the map itself is more specific. The function signature is always the same.

It follows that the test is invalid.

Edit: rule 30 in XP31 says so much, note "is" in the sentence:

"Ai is map(*) (or, because of the transitivity rules, any other map type), and Bi is function(xs:anyAtomicType) as item()*."

michaelhkay commented 4 years ago

Thanks Abel, for the reference to F&O. The statement you quote conflicts with XP31 §2.5.5.8 which says: "The function signature of a map matching type map(K, V), treated as a function, is function(xs:anyAtomicType) as V?. "; but it agrees with what I'm proposing in https://github.com/w3c/qtspecs/issues/14.

The test as written is incorrect, though, because $s(42) returns (), which is not permitted by the function signature, therefore the map cannot be an instance of this function type.

abelbraaksma commented 4 years ago

Yes, I agree, I was too quick in saying it was valid, forgot the type arity, and meanwhile edited.

abelbraaksma commented 4 years ago

Indeed, it conflicts with 2.5.5.8. The other rules in both specs seem correct though (specifically 2.5.6.2, item 30).

abelbraaksma commented 4 years ago

Actually, I think I'm confused on the covariance of the return type. If the target type returns string (or int, or any other more specific type for that matter), surely that should be wrong?

Meaning, I think the return type of the target function only matches if it's item()*.

abelbraaksma commented 4 years ago

Conversely, this also means that the argument of the target function can be more specific. function(int) as item()* would be valid (though it'll always return the empty sequence).

abelbraaksma commented 4 years ago

@michaelhkay, if I look at your fixes, it looks like you apply contravariance to both the argument and the return type, but only the argument type is contravariant. Just like other functions, the return type should be covariant.

I think we're confusing function coercion (https://www.w3.org/TR/xpath-31/#id-function-coercion) with instance-of relations here, and as such, I think I've misunderstood part of the discussion. Re-reading 2.5.5.8 seems true after all (though confusing), as it mentions function-coercion as a potential surprise in the last example, and it explicitly states that a map "is a" function(xs:anyAtomicType) as item()*.

Moreover, it has this example, where $M := map{0:"no", 1:"yes"}:

not($M instance of function(xs:integer) as xs:string) returns true()

In short, I think the rules for sequence type matching where the target is a function and the source is a map, are:

This does not apply where a function is expected as an argument to another function, in which case function coercion takes place, and the return type may be a different type, leading potentially to runtime type errors. Same is true, for instance, in XSLT with xsl:variable, where function-coercion applies, and not instance-of.

In OP's example, the function-coercion rules do not apply (3.12.1 of XQ31). This means that a type error should be raised.

Going over the suggested changes in https://github.com/w3c/qt3tests/commit/c77205f5a02d07c8b91188ac13bba71b49dfa4c0, I don't think these are all correct:

let $s as function(xs:anyAtomicType) as xs:string? := map { 1:'A','x':'B' } return $s?*
--> should raise XPTY0004 (but test accepts it)

map { 1:'A','x':'B' } instance of function(xs:integer) as xs:string?
--> false (but test says true)

map { 1:'A','x':'B' } instance of function(xs:integer) as item()*
--> true (test is correct)

map { 1:'A','x':'B' } instance of function(xs:integer) as xs:string
--> false (test is correct)

map { 1:'A','x':'B' } instance of function(xs:integer) as item()+
--> false (test is correct)

map {} instance of function(xs:integer) as empty-sequence()
--> false (but test says true)

map {12: ()} instance of function(xs:decimal) as empty-sequence()
--> false (but test says true)

map {12: ()} instance of function(xs:decimal) as xs:string*
--> false (but test says true)

map {12: "z"} instance of function(xs:decimal) as xs:string
--> false (test is correct)
faassen commented 11 months ago

I am implementing this behavior and was puzzled over these test failures in the test suite:

MapTest-059

map { 1:'A','x':'B' } instance of function(xs:integer) as xs:string?
--> false (but test says true)

MapTest-063

map {} instance of function(xs:integer) as empty-sequence()
--> false (but test says true)

MapTest-064

map {12: ()} instance of function(xs:decimal) as empty-sequence()
--> false (but test says true)

MapTest-065

map {12: ()} instance of function(xs:decimal) as xs:string*
--> false (but test says true)

Turns out this behavior matches what @abelbraaksma said. So my reading of the spec is the same as the one by @abelbraaksma three years ago, which perhaps counts for something.

faassen commented 11 months ago

In map/entry.xml, map-entry-001-hof has this:

<test>map:entry(3, 5)</test>

And this as one of the all-of assertions:

 <assert-type>function(xs:anyAtomicType) as xs:integer?</assert-type>

If this is to be read as:

map:entry(3, 5) instance of function(xs:anyAtomicType) as xs:integer?

then in my reading of the spec this should be false, but the test asserts that it is true.

michaelhkay commented 11 months ago

Looking at MapTest-059:

map { 1:'A','x':'B' } instance of function(xs:integer) as xs:string?

I believe the result is true: informally, because you can call this map as a function supplying any xs:integer as the argument, and the result will always be either a string, or an empty sequence.

Finding justification for this in the specs is more difficult, and there are some known problems with the 3.1 specs in this area. See for example https://github.com/qt4cg/qtspecs/issues/18 .

But I think the XPath 3.1 spec in §2.5.5.8 has the clear (and correct) statement: The function signature of a map matching type map(K, V), treated as a function, is function(xs:anyAtomicType) as V?

Note that this only depends on V, and not on K. That's because map:get($M, $value) allows $value to be any atomic type, and therefore $M($value) also allows $value to be any atomic type: it's the type of values you can supply as an argument to the function call that matter, and that's unrelated to what keys are actually present in the map.

So the map in this example matches the type map(K, xs:string) and therefore it has the signature function(xs:anyAtomicType) as xs:string?, and under the contravariance rules, this is a subtype of function(xs:integer) as xs:string?.

Rule 35 of XPath §2.5.6.2 also states that map(K, V) is a subtype of function(xs:anyAtomicType) as V?, while Rule 26 has the consequence that function(xs:anyAtomicType) as V? is a subtype of function(xs:integer) as V?

I think the same reasoning applies to the other examples, and the examples are correct. But you're right that it's a confusing area and there are some muddled statements in the specs (which we have tried to improve in 4.0.)

Unfortunately you haven't really explained why you think the test results are wrong. If you can explain your chain of reasoning from the specs, that might help us to make further clarifications.

faassen commented 11 months ago

3.11.2:

If $map is a map and $key is a key, then $map($key) is equivalent to map:get($map, $key). The semantics of such a function call are formally defined in Section 17.1.6 map:get```

If $map($key) is understood to be equivalent to map:get, then the signature would have to be:

fn:map has this signature:

map:get($map as map(*), $key as xs:anyAtomicType) as item()*

So the equivalent signature for the map function would be:

function($key as xs:anyAtomicType) as item()*

2.5.5.8 Map Test

Because of the rules for subtyping of function types according to their signature, it follows that the item type function(A) as item()*, where A is an atomic type, also matches any map, regardless of the type of the keys actually found in the map.

That matches, given the above signature and contravariant argument types.

Now we get to the tricky part:

The function signature of a map matching type map(K, V), treated as a function, is function(xs:anyAtomicType) as V?. It is thus always a subtype of function(xs:anyAtomicType) as item()* regardless of the actual types of the keys and values in the map. The rules for function coercion mean that any map can be supplied as a value in a context where the required type has a more specific return type, such as function(xs:anyAtomicType) as xs:integer, even when the map does not match in the sense required to satisfy the instance of operator. In such cases, a type error will only occur if an actual call on the map (treated as a function) returns a value that is not an instance of the required return type.

This part I find hard to understand:

map(K, V), where V is a sequence type (which could itself have any occurrence) what does V? mean? How do you apply ? to a sequence type?

Then it mentions function(xs:anyAtomicType) as item()*, which is the same as the function type I derived above. It's indeed true that due to covariant return values, a function that returns anything at all is always a subtype of this.

Then there's talk about the rules for function coercion, which essentially means that we can ignore the function type for the purposes of passing it into a function - only instance of should do this type of checking.

It then says that a type error will occur if you call $map and it returns a value that's not an instance of the required return type - that's true - that's because the wrapper generated by function coercion calls the real stated signature, and checks the type of the return value. This is actually quite interesting - I had struggled to find a test case where the function coercion wrapping rules have any effect at all, but this appears to be one of them. But that's unrelated to the current discussion.

I tried a different interpretation - given a map(K, V) derive a function function(K) as V. But that doesn't work either: the previously mentioned rule:

the item type function(A) as item()*, where A is an atomic type, also matches any map, regardless of the type of the keys actually found in the map.

wouldn't hold anymore as contravariance would make K too strict; this rules state any function, say, function(xs:integer) as item()* should match any map, even a map keyed by strings.

It seems odd to have to derive the function return value from the map signature, but not the argument, which is I believe is your interpretation. If we interpret the spec this way, what does the ? mean?

michaelhkay commented 11 months ago

It seems odd.

True. It is odd. It happens because the map-as-a-function returns () rather than throwing a type error if you supply a key of "the wrong type".

where V is a sequence type (which could itself have any occurrence) what does V? mean? How do you apply ? to a sequence type?

That's an excellent point. This is a bug which is still present in the spec. What it should say is not V?, rather the union of V and empty-sequence(). That needs to be fixed.

Then there's talk about the rules for function coercion

Function coercion muddies the waters. These tests use instance of, and function coercion doesn't affect the result of an instance of test. Function coercion makes a value $V acceptable where the required type is T even though $V instance of T is false.

faassen commented 11 months ago

I have this question:

Looking at MapTest-059:

map { 1:'A','x':'B' } instance of function(xs:integer) as xs:string?

I believe the result is true: informally, because you can call this map as a function supplying any xs:integer as the argument, and the result will always be either a string, or an empty sequence.

Nothing in the types states this map will always return a string or an empty sequence - that's deduced from its contents, but there's no type information that says so here, is there? We'd have to derive the type map(xs:anyAtomicType, xs:string) from the function, and then check whether this holds, but nothing in the rules of instance of for functions says we have to do that, does it?

michaelhkay commented 11 months ago

A map (like a sequence) does not have an intrinsic type: the result of the expression $M instance of map(K, V) is indeed obtained by examining the contents. The same is true for the expression $M instance of function(P) as Q.

faassen commented 11 months ago

The history of this text (of which you are aware) is interesting:

https://www.w3.org/Bugs/Public/show_bug.cgi?id=29586

It used to read:

The function signature of the map, treated as a function, is always function(xs:anyAtomicType) as item()*, regardless of the actual types of the keys and values in the map.

This matches my current interpretation based on other evidence in the spec.

This means that a function item type with a more specific return type, such as function(xs:anyAtomicType) as xs:integer, does not match a map in the sense required to satisfy the instance of operator.

This follows: item()* is not a subtype of xs:integer.

Then it said:

However, the rules for function coercion mean that any map can be supplied as a value in a context where such a type is the required type

Notice the word supplied; it's now talking about function arguments. And indeed we can't supply something that's not a subtype as an argument.

Now it reads:

The function signature of a map matching type map(K, V), treated as a function, is function(xs:anyAtomicType) as V?. It is thus always a subtype of function(xs:anyAtomicType) as item()* regardless of the actual types of the keys and values in the map.

And that's the interpretation we're stumbling over.

Consider this part of the spec in 3.1.5.3:

let $m := map { "Monday" : true(), "Wednesday" : true(), "Friday" : true(), "Saturday" : false(), "Sunday" : false() },

The map $m has a function signature of function(xs:anyAtomicType) as item()*.

This seems to be consistent with the original text but not the new text, as this map matches Map(xs:string, xs:boolean), so by the rule above this map should have the signature function(xs:anyAtomicType) as xs:boolean?.

It goes on to say:

1 The map $m is treated as function ($f), equivalent to map:get($m,?).

which seems to lend weight to this interpretation, as map:get($map,?) has the signature function(xs:anyAtomicType) as item()*.

michaelhkay commented 11 months ago

Yes, there was a lot of debate about it because different parts of the specification were inconsistent and we brought them into line.

faassen commented 11 months ago

Yes, there was a lot of debate about it because different parts of the specification were inconsistent and we brought them into line.

While the change might have been useful for other reasons, I don't agree the specification has become more consistent with this change. A line like:

The map $m has a function signature of function(xs:anyAtomicType) as item()*.

makes me conclude map has that signature, and not the signature function(xs:anyAtomicType) as xs:boolean? as is actually prescribed by the modified version of 2.5.5.8.

faassen commented 11 months ago

You already said most of this earlier, but I needed to construct this reasoning for myself. Considering

map { 1:'A','x':'B' } instance of function(xs:integer) as xs:string?

You say it should evaluate to true by examining its contents, and I said false.

In 2.5.5.7 Function Test it refers to the subtype relationship.

A TypedFunctionTest matches an item if it is a function and the function's type signature (as defined in Section 2.8.1 Functions) is a subtype of the TypedFunctionTest

map in this context is taken as a function. So we need to check its type signature. So we go to its definition in 2.8.1:

signature (a FunctionTest of the form Annotation* TypedFunctionTest): The TypedFunctionTest has one SequenceType for each parameter, and one SequenceType for the function's result

So what's the signature of a map? I constructed it as fixed based on other evidence in the spec like map:get and the text in 3.1.5.3, but we go to the changed information in 2.5.5.8:

The function signature of a map matching type map(K, V), treated as a function, is function(xs:anyAtomicType) as V?.

which follows your line of argumentation.

Now on how to implement it. In my implementation it would be tricky to derive the function signature from an arbitrary map - it would require logic that examines all the value sequence types and combines them into a new sequence type that unifies them all, logic that's not required elsewhere. Instead, we can interpret function(xs:anyAtomicType) as V? as a map test, and execute the map test.

How should this be implemented? I will go into this next.

faassen commented 11 months ago

We can interpret function(xs:anyAtomicType) as V? as a map test, and execute the map test.

What map test?

The ? makes things tricky. Which map test applies?

Map(xs:anyAtomicType, V)

or

Map(xs:anyAtomicType, V?)

Let's examine Maptest-059:

map { 1:'A','x':'B' } instance of function(xs:integer) as xs:string?

which evaluates to true.

and MapTest-061:

map { 1:'A','x':'B' } instance of function(xs:integer) as xs:string

which evaluates to false according to the test suite.

So it would appear we need to strip the ?, otherwise this would evaluate to true.

This rule implies that empty sequences aren't allowed. We don't have a test for it (and we should), but in Saxon:

map { 1: ()} instance of function(xs:integer) as xs:string?

is also true, so that "stripping the ?" interpretation doesn't seem correct.

Here is MapTest-064:

map {12: ()} instance of function(xs:decimal) as empty-sequence()

which is true

And MapTest-066:

map {12: ()} instance of function(xs:decimal) as xs:string*

which is also true.

Let's try xs:string+ in saxon:

map { 1:'A','x':'B' } instance of function(xs:integer) as xs:string+

this evaluates to false.

This leads to the following rules to turn a return value into a map test:

V - no map test is possible, never matches. V+ - no map test is possible, never matches. (according to Saxon) V? - map test Map(xs:anyAtomicType, V?). (unclear from test suite whether empty sequences match, but with Saxon it does) V* - map test Map(xs:anyAtomicType, V*) empty-sequence() - map test Map(xs:anyAtomicType, empty-sequence())

I'd feel more comfortable if I could construct this rule from the specification instead of from the test suite and Saxon's behavior, but I struggle to reason about this.