qt4cg / qtspecs

QT4 specifications
https://qt4cg.org/

Function Coercion: Records, Maps, Arrays #1318

Open ChristianGruen opened 2 months ago

ChristianGruen commented 2 months ago

Conclusion (2024-09-11): As we plan to keep coercion rules for records, we should add rules for arrays and maps as well: If $v as xs:int+ is successful for (1, 2), $v as array(xs:int) should be successful for [ 1, 2 ].


I would like to question the coercion rule that covers record tests:

  1. If R is a RecordTest and J is a map, then J is converted to a new map as follows:
  • The keys in the supplied map are unchanged.

  • In any map entry whose key is an xs:string equal to the name of one of the field declarations in R, the corresponding value is converted to the required type defined by that field declaration, by applying the coercion rules recursively (but with XPath 1.0 compatibility mode treated as false).

I would like us to drop this rule. I believe that both the instance checks and the conversions of large maps can get very expensive. In addition, it may even require recursive rebuilds of map structures if a supplied record test includes nested record tests.

I think it’s completely fair to expect users to deliver maps in a way that matches record definitions, and it could even be counterintuitive if map updates take place as a consequence of a simple function call.

If we want to stick to that coercion rule, we should also define coercion rules for maps and arrays.

michaelhkay commented 2 months ago

The motivation for this rule was primarily connected with numeric fields. If the required argument to a function is record(longitude as xs:double, latitude as xs:double) then it seems unreasonable not to allow the value {"longitude": 125, "latitude": 53} to be supplied.
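The coercion rule quoted in the opening comment can be modelled outside XPath to show what happens to this value. The following is a minimal Python sketch, not XQuery and not taken from the specification: `coerce_record` and the dict-based record definitions are hypothetical names, and `float` merely stands in for the promotion of xs:integer to xs:double.

```python
from typing import Any, Callable, Dict, Union

# Hypothetical model: a field type is either a converter for an atomic value
# (float stands in for xs:double) or a nested record definition (a dict).
FieldType = Union[Callable[[Any], Any], Dict[str, Any]]


def coerce_record(record_def: Dict[str, FieldType], supplied: Dict[Any, Any]) -> Dict[Any, Any]:
    """Build a new map: keys are unchanged, declared fields are coerced recursively."""
    if not isinstance(supplied, dict):
        raise TypeError("a record test requires a map")
    result: Dict[Any, Any] = {}
    for key, value in supplied.items():
        # Only xs:string keys (modelled as str) can match a field declaration.
        field_type = record_def.get(key) if isinstance(key, str) else None
        if field_type is None:
            result[key] = value  # undeclared entry: copied as-is
        elif isinstance(field_type, dict):
            result[key] = coerce_record(field_type, value)  # nested record test
        else:
            result[key] = field_type(value)  # atomic coercion
    return result


# Models record(longitude as xs:double, latitude as xs:double)
point_def = {"longitude": float, "latitude": float}
supplied = {"longitude": 125, "latitude": 53}
coerced = coerce_record(point_def, supplied)

# A nested definition forces the inner map to be rebuilt as well.
place_def = {"position": point_def}
nested = coerce_record(place_def, {"position": supplied, "note": "unchanged"})
```

In this model the supplied map is left untouched and a new map is returned, which is the rebuild cost discussed in this thread.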

Note that we already incur the same costs for large sequences, and records are typically bounded in size by the record definition, whereas sequences are unbounded. And the cost of type-checking is there anyway even if the outcome is to reject the value rather than coercing it.

ChristianGruen commented 2 months ago

it seems unreasonable not to allow the value {"longitude": 125, "latitude": 53} to be supplied.

Following that reasoning, we should definitely include coercion rules for arrays and maps as well.

Note that we already incur the same costs for large sequences, and records are typically bounded in size by the record definition, whereas sequences are unbounded. And the cost of type-checking is there anyway even if the outcome is to reject the value rather than coercing it.

I would agree only partially: Definitions of records can be nested, which makes coercions more expensive if the type of a descendant value does not match. I may be wrong, but I believe it could be confusing that a trivial function call triggers a potentially recursive update of a data structure. Shouldn’t we rather try to define a robust record constructor instead of moving all the complex semantics into the coercion rules?

michaelhkay commented 2 months ago

We could certainly consider doing the coercion in a record constructor rather than on the function call.

michaelhkay commented 2 months ago

PR #1355 adds support for record constructors, and if you use a record constructor then the coercion happens at that point rather than at the point where the map is supplied as an argument to the called function.

The problem is then that the simple syntax for supplying options to a function doesn't work:

fn:collation({"lang":"de", "strength":"primary"})

fails because there is no coercion of "de" to xs:language or of "primary" to the enum type.
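To illustrate the difference, here is a hedged Python model (not XPath) of an options parameter checked with and without call-time coercion. `Language`, `Strength`, `strict_check` and `coerce_options` are hypothetical stand-ins for xs:language, the enum type and the two calling conventions. Only the value "primary" comes from the thread; the other enum value is assumed for illustration.

```python
from enum import Enum


class Language(str):
    """Hypothetical stand-in for xs:language (no validation of the tag)."""


class Strength(Enum):
    """Hypothetical stand-in for the enum type; only "primary" is from the thread."""
    PRIMARY = "primary"
    SECONDARY = "secondary"


# Required type of each option, keyed by option name.
OPTION_TYPES = {"lang": Language, "strength": Strength}


def strict_check(options: dict) -> dict:
    """No coercion: every value must already be an instance of its required type."""
    for name, required in OPTION_TYPES.items():
        if name in options and not isinstance(options[name], required):
            raise TypeError(f"option {name!r} is not an instance of {required.__name__}")
    return options


def coerce_options(options: dict) -> dict:
    """Call-time coercion: convert declared options, keep the others unchanged."""
    return {
        name: OPTION_TYPES[name](value) if name in OPTION_TYPES else value
        for name, value in options.items()
    }


literal = {"lang": "de", "strength": "primary"}

try:
    strict_check(literal)
    strict_ok = True
except TypeError:
    strict_ok = False  # plain strings are rejected without coercion

coerced = coerce_options(literal)
```

With coercion in the calling mechanism, the author of the function only ever sees the converted values and does not have to write the conversion code by hand.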

We could get round that by reverting to the 3.1 position where coercion of options parameters was part of the option parameter conventions (and therefore the responsibility of the function implementation) rather than part of the calling mechanism.

But that's rather onerous for someone implementing a user-written function with an $options parameter: they have to write a lot of code to do it properly, and the chances are they will get it wrong.

I'm going to defend the status quo, I think. I don't think the performance issues are any worse than other aspects of run-time type checking and conversion, and there are implementation techniques to mitigate the problems.

ChristianGruen commented 2 months ago

I agree; thanks to the proposed introduction of record constructors, I’m optimistic that an implementation can assign record types to maps internally, so that many coercion checks can be dropped at compile time.
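One way to picture this optimisation is a map that remembers which record definition it was built for, so that a later call can skip coercion. A minimal Python sketch under that assumption follows. `RecordMap`, `make_record` and `coerce_for_call` are hypothetical names, not part of any QT4 proposal or implementation, and the sketch performs the check at run time, whereas a real processor might resolve it statically.

```python
class RecordMap(dict):
    """A map tagged with the (hypothetical) record definition it conforms to."""

    def __init__(self, record_def: dict, entries: dict):
        super().__init__(entries)
        self.record_def = record_def


def make_record(record_def: dict, entries: dict) -> RecordMap:
    """Model of a record constructor: coerce declared fields once, then tag the map."""
    coerced = {k: record_def[k](v) if k in record_def else v for k, v in entries.items()}
    return RecordMap(record_def, coerced)


def coerce_for_call(record_def: dict, supplied: dict) -> dict:
    """Model of call-time coercion that is skipped for maps already tagged."""
    if isinstance(supplied, RecordMap) and supplied.record_def is record_def:
        return supplied  # known to conform: no check, no rebuild
    return make_record(record_def, supplied)


point_def = {"longitude": float, "latitude": float}  # float stands in for xs:double

constructed = make_record(point_def, {"longitude": 125, "latitude": 53})
passed_through = coerce_for_call(point_def, constructed)
rebuilt = coerce_for_call(point_def, {"longitude": 125, "latitude": 53})
```

A map built by the constructor passes through unchanged, while a plain map literal is still coerced and rebuilt at the call.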

I think we should then add coercion rules for arrays and maps as well: If $v as xs:int+ is successful for (1, 2), $v as array(xs:int) should be successful for [ 1, 2 ].
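The parity argued for here can be sketched in Python as a model rather than as XPath. `XsInt`, `coerce_sequence` and `coerce_array` are hypothetical names; a tuple stands in for a sequence and a list for an array, and each array member is assumed to be a single item, although in the data model a member may itself be a sequence.

```python
class XsInt(int):
    """Hypothetical stand-in for xs:int: an integer in the 32-bit signed range."""

    def __new__(cls, value):
        # Reject non-integers (and bool, which Python treats as an int subclass).
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"cannot coerce {value!r} to xs:int")
        if not -2**31 <= value <= 2**31 - 1:
            raise ValueError(f"{value} is out of range for xs:int")
        return super().__new__(cls, value)


def coerce_sequence(item_type, items: tuple) -> tuple:
    """Models `$v as xs:int+`: every item is coerced to the required item type."""
    return tuple(item_type(item) for item in items)


def coerce_array(member_type, members: list) -> list:
    """Models `$v as array(xs:int)`: the same per-member coercion, array rebuilt."""
    return [member_type(member) for member in members]


seq = coerce_sequence(XsInt, (1, 2))  # models (1, 2)
arr = coerce_array(XsInt, [1, 2])     # models [ 1, 2 ]
```

Both functions apply the same per-item conversion; the only difference is the container that is rebuilt around the results.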