Better support for typed maps

ChristianGruen commented 8 months ago

Edit (2023-01-04): See https://github.com/qt4cg/qtspecs/issues/917#issuecomment-1875712638 for the most promising suggestion resulting from the discussion in this thread.

Inspired by #720 and concerns regarding usability and performance, it may be a big step, but couldn’t we define records as subtypes of maps?

The main difference would be that updates on records are only allowed as long as the resulting map matches a record definition.
This would allow us to return much better error messages, and to prevent users from deconstructing their own data structures.
We could still benefit from the existing map functions… provided that we believe it's an advantage. A stricter solution would be to disallow optional map entries completely (and to treat records as a separate type).
From a technical point of view, data with a fixed structure can be optimized much better than a structure that changes dynamically.

michaelhkay commented 8 months ago

Surely it's not a subtype if it's not substitutable for the supertype, that is if it can't be used as an argument to functions such as map:put()?

The optimisation point is an interesting one. Javascript seems to manage OK, it's not noted for being slow.

I'm very reluctant to introduce data model changes or extensions to the type system, as experience shows it takes years to work through all the implications. Now, if we could introduce new kinds of value without a change to the type system, that would be a different matter... But making the whole language fully object-oriented is just too big a step.

I think that with named item types it should be easy enough to persuade users to declare the type of map/record that's expected at function boundaries, and that's sufficient (a) to catch user errors early, and (b) to allow the system to choose a map representation that's efficient for the type of access being performed.

A record constructor as an alternative to a map constructor might well make life a little bit easier both for implementors and for users, and that can be done without a data model or type system change.

ChristianGruen commented 8 months ago

Surely it's not a subtype if it's not substitutable for the supertype, that is if it can't be used as an argument to functions such as map:put()?

map:put could be used indeed, but the result would be rejected if it doesn’t match the record definition of the input.

I think of records as maps with frozen entries. I think everyone agrees that records are a big step forward, but it seems pretty erratic and unpredictable that you can destroy a record by a single update operation, without even noticing.

The optimisation point is an interesting one. Javascript seems to manage OK, it's not noted for being slow.

One advantage of JavaScript is that it has very basic data structures (…one byte is needed to represent a byte). Next, billions have been spent to make JS as fast as it is today…

If I could, I would focus on getting our languages faster, possibly by the help of lightweight data structures, instead of adding more sugar. The raytracer code is a good example to demonstrate that all existing implementations of XQuery are… slow. Of course, one fundamental property of functional languages that affects performance is that each update is a copy operation; this won’t change as long as we don’t introduce mutable objects (which would lead to a completely new language).

I'm very reluctant to introduce data model changes or extensions to the type system, as experience shows it takes years to work through all the implications.

I share your concerns. I would hope that a record subtype would be much easier to specify than a completely new type for objects (otherwise, the latter would certainly be my favorite). It could also be the naivete of someone who hasn’t disclosed the gory details yet.

michaelhkay commented 8 months ago

Thinking around this, suppose we scrap "declare item type" in favour of "declare record type" on the grounds that (a) most of the use cases for named item types are actually for named record types, and (b) we already want to special-case record types by allowing them to be recursive. And then, declare record type automatically declares a constructor function of the same name, perhaps with defaults for optional fields. Then any instance of that record type is most likely to be constructed using that constructor function; in which case we automatically know its static type; and it's not hard for the implementation to use a representation of the record in which fields are allocated fixed positions; and any lookup expression on a value with known static type and with a statically known key specifier can compile down to a fixed-position accessor. This all seems quite feasible, it's mainly a question of designing things so users naturally use the constructs that are most efficient.
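As a rough illustration of the fixed-position idea, here is a small Python model (not XQuery, and not a description of how any processor actually works). The names `RecordType`, `construct` and `compile_lookup` are invented for this sketch; the point is only that a field name can be resolved to a slot once, when both the record type and the key are known statically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordType:
    """Hypothetical stand-in for a declared record type."""
    name: str
    fields: tuple  # field names, in declaration order

    def construct(self, *values):
        # The constructor fixes the layout: one slot per declared field.
        if len(values) != len(self.fields):
            raise TypeError(f"{self.name} expects {len(self.fields)} values")
        return tuple(values)

    def compile_lookup(self, key):
        # With a known static type and a statically known key, the name is
        # resolved to a slot index once, before any record is inspected.
        slot = self.fields.index(key)
        return lambda record: record[slot]


position = RecordType("geo:position", ("latitude", "longitude"))
pos = position.construct(123.8, 57.2)
get_longitude = position.compile_lookup("longitude")
print(get_longitude(pos))  # 57.2
```

In this model the per-lookup cost is a tuple index rather than a hash lookup, which is the kind of saving the paragraph above has in mind.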

Perhaps, along these lines, we should also have custom syntax for doing map:put in a way that constrains the type of the result. Something like

declare record type geo:position(latitude as xs:double, longitude as xs:double);
let $pos := geo:position(123.8, 57.2)...
set $pos as geo:position {longitude := 59.6...}

where set $var as type {field := value} is shorthand for let $var as type := map:put($var, field, value) treat as type
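The desugaring can be modelled in a few lines of Python. This is only a sketch under assumptions: the record type is represented as a plain dict from field name to Python type, and `matches` and `put_as` are hypothetical helper names standing in for the type check and for `map:put(...) treat as type`.

```python
def matches(record, record_type):
    """True if `record` has exactly the declared fields with the declared types."""
    return record.keys() == record_type.keys() and all(
        isinstance(record[name], expected) for name, expected in record_type.items()
    )


def put_as(record, record_type, field, value):
    """Model of `map:put($var, field, value) treat as type`."""
    updated = {**record, field: value}  # a new map; the input is not modified
    if not matches(updated, record_type):
        raise TypeError(f"result of put({field!r}) does not match the record type")
    return updated


GEO_POSITION = {"latitude": float, "longitude": float}  # hypothetical type model

pos = {"latitude": 123.8, "longitude": 57.2}
moved = put_as(pos, GEO_POSITION, "longitude", 59.6)
```

Calling `put_as(pos, GEO_POSITION, "altitude", 1.0)` raises a `TypeError` in this model, because the result would carry a field the type does not declare.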

michaelhkay commented 8 months ago

without even noticing

I think the fact that you have to use functions in the map namespace to achieve this gives a strong hint.

But one of the aims of the "methods" proposal is to provide a more disciplined way of creating modified data in a type-safe way, as the "?resize()" example attempts to demonstrate.

It may be worth revisiting issue #220 (Encapsulation) to see whether it can be made to work with records and methods.

Of course, one fundamental property of functional languages that affects performance is that each update is a copy operation;

Despite strenuous efforts, I haven't found a way to avoid copying node trees, because of the problem of node identity, but for maps and arrays, "immutable/persistent" data structures (I dislike both terms) mean that changes can easily be made without copying the parts of the structure that don't change.
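For readers unfamiliar with the technique, the following Python sketch shows path copying on a tiny unbalanced search tree. It is an illustration only, not the data structure any XQuery processor is known to use; `Node`, `put` and `get` are names chosen for this example.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    key: str
    value: object
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def put(node, key, value):
    """Return a new tree with key bound to value; the input tree is untouched."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        return node._replace(left=put(node.left, key, value))
    if key > node.key:
        return node._replace(right=put(node.right, key, value))
    return node._replace(value=value)


def get(node, key):
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    raise KeyError(key)


old = None
for k, v in [("m", 1), ("c", 2), ("x", 3)]:
    old = put(old, k, v)

new = put(old, "x", 30)
print(get(old, "x"), get(new, "x"))  # 3 30
print(new.left is old.left)          # True: the untouched subtree is shared
```

Only the nodes on the path from the root to the changed entry are rebuilt; everything else is shared between the old and the new version.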

ChristianGruen commented 8 months ago

Thinking around this, suppose we scrap "declare item type" in favour of "declare record type" […]

Yes, I believe that would be a great step in a good direction.

we should also have custom syntax for doing map:put in a way that constrains the type of the result.

Another lightweight option could be a map:update function that preserves the record type by only accepting existing keys, and values with a type that equals the type of the existing value… whatever equality means in this context. A restriction to atomic values might be helpful, too, to avoid people updating functions (if functions of records are static, they could possibly be inlined at compile time). As you’ve already stated, the challenge will be to motivate users to do what we want them to do. Once a processor is not sure of the static type anymore, the optimizations will fall apart.

for maps and arrays, "immutable/persistent" data structures (I dislike both terms) mean that changes can easily be made without copying the parts of the structure that don't change.

Yes, it's impressive how efficient immutable data structures are (sometimes they even outperform mutable concepts). We’ll never beat direct memory updates; but of course, the goal has never been to compete with languages like Rust, C or ASM.

ChristianGruen commented 8 months ago

I withdraw map:update. It’s not very generic, and names like put or update imply that data is changed. Instead, we always create something new (merge and build are better names for that). We should rather motivate people to use record constructors to update existing records.

Maybe the record constructor could take keyword arguments and interpret an optional single argument without keyword as the record to be updated:

declare record geo:coord(
  x  as xs:double,
  y  as xs:double,
  z  as xs:double := 0,
  final zero := fn() { $this?x = 0 and $this?y = 0 and $this?z = 0 }
);
let $pos := geo:coord(x := 3, y := 5)
return geo:coord($pos, z := 7)

$this could be attached to all functions in a record definition. Again, I would prefer the variable to be activated right after the record construction. This would speed up queries like the one for products and odd/even in #916 (the binding would only need to be done once).

In addition, a final keyword could indicate that the function must not be overwritten by the caller of the constructor (final param-name without a default value would be invalid).

michaelhkay commented 8 months ago

(Incidentally, I took a glance at the raytracer query, which I have never studied, and I suspect it could be made to go much faster using maps and arrays in place of node trees. That's partly because of the cost of node construction and copying, and partly because it's doing a lot of string-to-double conversion.)

ChristianGruen commented 8 months ago

(Incidentally, I took a glance at the raytracer query, […]

Yes, it would be interesting to see a 3.1/4.0 update of the code. All I did was apply little changes that made it about 20% faster (in two processors that I tested, I can't recollect their names ;-).

dnovatchev commented 8 months ago

I think of records as maps with frozen entries. I think everyone agrees that records are a big step forward, but it seems pretty erratic and unpredictable that you can destroy a record by a single update operation, without even noticing.

I am discovering this fascinating discussion only now ... 😢

How can something that is immutable be "destroyed"?

I have always regarded records as what in modern programming languages are interfaces. Thus, looking at records as a subtype of maps gives us nothing. It is synonymous with saying that everything is a map (object), which, again, gives us nothing new - that we haven't already known about.

On the other side, having what essentially is corresponding to interface in other languages gives us huge new capabilities.

We can speak of an object implementing an interface (a map containing all fields of a record), and have the same duck typing as in Python.

It is only coincidental that a record is defined as a restricted map (as in other languages, everything is a kind of object).

ChristianGruen commented 8 months ago

@dnovatchev If you implement an interface, and if this interface defines a method, your programming language usually won't allow you to remove the implementation of that method from your object. With the current definition of records in our languages, that's easily possible. For example,

take a record that defines a single function calc.
$record := map { 'calc': sum#1 } matches the record definition.
If map:remove($record, 'calc') is called, the result won’t match the record definition anymore.

I think we should do our best to disallow such cases. By defining records as subtypes of maps, we could reject updates on maps that would violate the definition of the input record.

The idea has similarities with the suggestion to define arrays as subtypes of maps (https://github.com/qt4cg/qtspecs/issues/298#issuecomment-1802673000): map:remove($array,-1) would then be a legal function call, but it would always be rejected as arrays have no entries at position -1.

dnovatchev commented 8 months ago

@dnovatchev If you implement an interface, and if this interface defines a method, your programming language usually won't allow you to remove the implementation of that method from your object. With the current definition of records in our languages, that's easily possible.

All items in XPath are immutable. It is not possible to "destroy", add or remove key-value pairs of a map instance. Trying to do so creates another map instance. The original record instance remains unchanged.

For example,

take a record that defines a single function calc.

$record := map { 'calc': sum#1 } matches the record definition.

If map:remove($record, 'calc') is called, the result won’t match the record definition anymore.

I think we should do our best to disallow such cases. By defining records as subtypes of maps, we could reject updates on maps that would violate the definition of the input record.

I don't see any need to do this. Records are Records, Maps are Maps.

If we did really have such a problem in XPath, we would probably have proposed a const construct that marks a particular item as immutable - as many programming languages do. But because everything is immutable in XPath, we don't have this problem at all, this is why we don't need any const keyword.

The idea has similarities with the suggestion to define arrays as subtypes of maps (#298 (comment)): map:remove($array,-1) would then be a legal function call, but it would always be rejected as arrays have no entries at position -1.

I don't feel enthusiastic about this either. If Map is a synonym of Object, saying that arrays (specifically) are maps is like saying once again that everything is an object -- we already know this.

Finally, even if we are in a typical OOP PL whose instances are not immutable, the whole idea of having a subtype that prohibits a method from the base class is wrong:

This would violate the Liskov principle (the "L" in SOLID).

See this, as said by Jon Skeet (here: https://stackoverflow.com/a/2779195/36305)

ChristianGruen commented 8 months ago

All items in XPath are immutable. It is not possible to "destroy", add or remove key-value pairs of a map instance. Trying to do so creates another map instance. The original record instance remains unchanged.

That’s obvious. We do have functions like map:remove and map:put for creating modified copies of maps. Obviously, these functions can also be called to create copies of records (as every map may match several record definitions). The results will always be maps, but not necessarily match the record type of the original input. If records were subtypes of maps, we could ensure that each copy of a record will again be a valid record. In other words, we would get the guarantees that are self-evident for maps, arrays and other types.

If we did really have such a problem in XPath, we would probably have proposed a const construct that marks a particular item as immutable

The problem as stated is introduced with records, as they are currently defined. It did not exist before 4.0, because it does not apply to maps, arrays or other types.

Finally, even if we are in a typical OOP PL whose instances are not immutable, the whole idea of having a subtype that prohibits a method from the base class is wrong:

It seems to me you mix up several things here. No one wants to prohibit methods from a base class. Your SO link says:

Q: Is there way for a class to 'remove' methods that it has inherited? A: No - this would violate Liskov's Substitution Principle

My intent is exactly to prevent users from removing a function (in the resulting copy) unless it’s part of the record definition.

Next, rejecting a copy that does not match the definition of a record (as I’ve recommended above) would be a postcondition. Postconditions may be strengthened, but not weakened, according to Liskov (see https://en.wikipedia.org/wiki/Liskov_substitution_principle). With OOL, it’s fairly common to define stricter postconditions in overwritten methods.

dnovatchev commented 8 months ago

Next, rejecting a copy that does not match the definition of a record (as I’ve recommended above) would be a postcondition. Postconditions may be strengthened, but not weakened, according to Liskov (see https://en.wikipedia.org/wiki/Liskov_substitution_principle). With OOL, it’s fairly common to define stricter postconditions in overwritten methods.

In the same Wikipedia article it is said:

"Preconditions cannot be strengthened in the subtype."

A precondition for the Map (base type) is that any set of key-values is allowed.

But this is strengthened in a Record type, which requires only sets of key-values that include a specific subset of string-keys.

Thus, any attempt to make Record a subtype of Map violates the above rule, as it requires strengthening of the preconditions of the base class (Map).

ChristianGruen commented 8 months ago

A precondition for the Map (base type) is that any set of key-values is allowed.

So what about the XDM type xs:byte, which is a subtype of xs:integer? Aren’t bytes strengthened variants of integers? How would the postcondition of that type differ from the precondition?

I’m not aware that pre-/post-conditions apply to the type; I associated them with methods/functions.

dnovatchev commented 8 months ago

A precondition for the Map (base type) is that any set of key-values is allowed.

So what about the XDM type xs:byte, which is a subtype of xs:integer? Aren’t bytes strengthened variants of integers? How would the postcondition of that type differ from the precondition?

I’m not aware that pre-/post-conditions apply to the type; I associated them with methods/functions.

I think we are diverging significantly from the main topic here.

As a user, I don't care about implementation details.
As a user, I will probably never create new instances by programmatically updating a map instance that happens to be a record instance.
As a user, I also know that applying any (updating/constructing) function on a map instance does not modify the original - it creates a new instance.
As a user, I know that record is only a typename and it is not a new/separate type. We don't have a set of functions associated with just a typename and we cannot expect that an existing function that updates the original type would produce an instance that would not only be an instance of the original type, but will also be matched by the typename definition.

Of course, if someone wants this badly enough, they could introduce (yet another!) namespace, and place only record-specific functions there. Say: rec:update().

Or, at the cost of added complexity, we can have a new overload of map:put that accepts a boolean parameter/argument named, say, preserve-record, with a default value of false.

Once again, as a user I don't feel any need either for creating such a separate namespace with record-specific functions or for creating a new overload of map:put.

ChristianGruen commented 8 months ago

As a user, I will probably never create new instances by programmatically updating a map instance that happens to be a record instance.

Maybe you as a user won’t, but other users will. Check out the examples in this and related GitHub issues, in which exactly this challenge is discussed (e.g. #720/#916: $this could always point to the original map if we didn't need to consider updates).

If you think the current definition of records is perfect as it is, that's just fine. If you don't, it would be helpful to hear some constructive feedback on what you would change.

dnovatchev commented 8 months ago

If you think the current definition of records is perfect as it is, that's just fine. If you don't, it would be helpful to hear some constructive feedback on what you would change.

'record' is just a typename.

A Typename is not a new type. It is a shorthand.

If we start creating new types for some typenames, we would have a lot of (probably not pressingly necessary) new stuff, that would be inevitably biased (based on subjective preferences).

And if we create new types, then why do we need typenames at all?

I believe in step-by-step progress, and the first step here is the typenames. Based on sufficient, accumulated objective data of using the typenames after a considerable period from their introduction, then we could probably consider making the most popular typenames into separate types - if we do decide so.

ChristianGruen commented 8 months ago

'record' is just a typename. […]

Exactly, that’s what it currently is.

To follow on from the original considerations: Another solution to increase type-safety could be to use record constructors to create typed maps: The result will be a classical map, but with a record definition attached. Whenever an update operation is performed on a typed map (such as map:put), the record definition will be carried over to the modified copy of the map, and the resulting map will be rejected if it does not match the record definition. Given the following record construction…

declare record geo:coord(
  x  as xs:double,
  y  as xs:double
);
let $pos := geo:coord(3, 5)

…map:put($pos, 'x', 7) would be legal, whereas map:put($pos, 'x', 'seven') or map:remove($pos, 'x') would be rejected. No matter which update is performed on this map, a processor could always statically assess that the resulting copy still matches the original record definition.
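The behaviour described here can be simulated outside XQuery. The Python sketch below is a model under assumptions, not the proposed semantics in full: `TypedMap` and `geo_coord` are hypothetical names, field types are plain Python types, and no numeric promotion is modelled (so the update uses `7.0` rather than `7`).

```python
class TypedMap:
    """Immutable map that carries a record definition (hypothetical model)."""

    def __init__(self, definition, entries):
        # definition: field name -> required Python type
        if entries.keys() != definition.keys():
            raise TypeError("entries do not match the record definition")
        for name, expected in definition.items():
            if not isinstance(entries[name], expected):
                raise TypeError(f"field {name!r} must be {expected.__name__}")
        self._definition = definition
        self._entries = dict(entries)

    def get(self, key):
        return self._entries[key]

    def put(self, key, value):
        # Build a modified copy; the constructor re-checks the definition.
        return TypedMap(self._definition, {**self._entries, key: value})

    def remove(self, key):
        remaining = {k: v for k, v in self._entries.items() if k != key}
        return TypedMap(self._definition, remaining)


def geo_coord(x, y):
    """Stand-in for the geo:coord record constructor."""
    return TypedMap({"x": float, "y": float}, {"x": float(x), "y": float(y)})


pos = geo_coord(3, 5)
moved = pos.put("x", 7.0)  # accepted: the copy still matches the definition
for bad_update in (lambda: pos.put("x", "seven"), lambda: pos.remove("x")):
    try:
        bad_update()
    except TypeError as err:
        print("rejected:", err)
```

Because every copy is built through the same checked constructor, any value of this class is known to match the definition it carries, which is the invariant a processor could rely on.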

MarkNicholls commented 8 months ago

Ok, for a different perspective, which may or may not be helpful (feel free to ignore), I don't completely understand the original motivation but i do broadly understand examples (even if I'm inventing in my head what 'record' means).

To me, this is more like a 'state monad' than a subtype.

If you don't know what one is, they are a constraint on the type of the function signature 'put', and 'remove'.

They are used mostly in pure(ish) languages to simulate imperative style semantics, which I think is sort of what you are getting at?

i.e. in an imperative OO language, your 'remove' example breaks the contract because in OO all 'state changes' must be consistent with the contract on the type...i.e. a geo:coord is always a geo:coord, no matter what you do to it.

In a pure language this isnt a problem except, sometimes its quite nice and convenient to have this sort of model (especially in a world of OO programmers), i.e. you want the constraint to try to give some structure to your code.

So....say...."I want 'get' to be a state monad, and promise to return me a new geo:coord and the answer"

i.e. in pseudo fp get :: string -> geo:coord -> (integer * geo:coord)

or in pseudo xpath

function(xs:string,geo:coord) as <xs:integer,geo:coord> (its not clear to me what a sensible syntax for a tuple is...so I've invented '<..>')

"remove" clearly can't promise to return a new geo:coord, so it can't have this signature.

similarly "put" also doesnt unless you are overwriting x or y....(actually i dont know if this is a run time error, lets assume not). so

putX($x as xs:integer,$coord as geo:coord) as <(),geo:coord>

I'm not clear if this is helpful at all...I'm not suggesting making state monads part of XPath etc (though like the LINQ conversation you could drive them from XQuery), I think most people would cry.

but it may be worth applying in some way, it may better fit.

michaelhkay commented 8 months ago

My interpretation of this would be in terms of generic types: you want a signature of map:put such that map:put($map as M, $k, $v) is guaranteed to return a map of type M. I can see the attraction but it's a lot of change for the type system and a tough challenge for backwards compatibility.

ChristianGruen commented 8 months ago

Ok, for a different perspective, which may or may not be helpful (feel free to ignore), I don't completely understand the original motivation but i do broadly understand examples (even if I'm inventing in my head what 'record' means).

@MarkNicholls Thanks for the feedback. I invite you to check out the current XPath 4.0 draft to learn more about records, record types and record tests: https://qt4cg.org/specifications/xquery-40/xpath-40.html#id-record-test

If a map $map matches the record type record(n as xs:integer), your map contains a single key n with an integer as value. When you perform an update on that map, the result may or may not match the original record type. If we could assure that the record type will not be changed after an update, we would have better type safety, which would have numerous advantages (better error messages, compact and static memory layout for record-based maps, faster updates and so on).

I can see the attraction but it's a lot of change for the type system and a tough challenge for backwards compatibility.

@michaelhkay I think the suggestion from my last comment would be the most promising one, as it’s much more light-weight than an independent object type, or a record subtype: We could introduce a record constructor, which creates a map that has the record type information attached. Whenever a new map is created from that map, we would simply need to ensure that the new map still matches this type.

dnovatchev commented 8 months ago

If a map $map matches the record type record(n as xs:integer)

No, record is not a type - just a type name.

MarkNicholls commented 8 months ago

It reminds me of C# and 'async'...which does something very very similar (though personally I'm not a fan...thats not the point).

They wanted to secretly include an async monad (same pattern different application).

https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/async

there you annotate the signature of the function to tell the compiler its 'special'

public async Task<int> ExampleMethodAsync()
{
    //...
}

and

string contents = await httpClient.GetStringAsync(requestUrl);

the compiler then secretly injects the monadic code to do the magic they didnt want to infect the language with (probably because LINQ doesnt really inherently handle exceptions, and they wanted something bullet proof).

how that translates into xpath i dont know, technically something higher order like (completely made up)

put($key,$value) as state<geo:coord, unit>

and some magic operator..

let $newRecord .=. put('key','value')($mycoord)

ew...nasty....

or explicitly

let $newRecord .=. runstate($mycoord, put('key','value')($mycoord))?state

and to get values out

let $newRecord .=. runstate($mycoord, put('key','value')($mycoord))?value

ugly ugly ugly.

(i think, its been a while) scala does something 'clever' and overloads 'foreach' to dereference monads (as C# does with 'for x in y' and LINQ, they chose to not do so in this case), the trick makes syntactical sense because evaluating sequences is also monadic.

in which case

for $newCoord in put('key','value')($mycoord)

then you're 50% of the way to full blown LINQ/do notation.

if you want, for example, to add async, maybe, stacks or random etc support, you just hijack the same syntax.

dnovatchev commented 8 months ago

My interpretation of this would be in terms of generic types: you want a signature of map:put such that map:put($map as M, $k, $v) is guaranteed to return a map of type M. I can see the attraction but it's a lot of change for the type system and a tough challenge for backwards compatibility.

While I can feel the "convenience" of this, it also poses unwanted restrictions.

For example, one may want to construct a record "a piece at a time" and during putting each piece together the result will not be of the required type (yet). Or, one might want to incrementally construct one record structure from a record that has a different record-structure. With the restricted put() such freedom is difficult or prohibited.

ChristianGruen commented 8 months ago

If a map $map matches the record type record(n as xs:integer)

No, record is not a type - just a type name.

Feel free to fix this in the spec (you'll find multiple occurrences of “record type” in the text).

dnovatchev commented 8 months ago

If a map $map matches the record type record(n as xs:integer)

No, record is not a type - just a type name.

Feel free to fix this in the spec (you'll find multiple occurrences of “record type” in the text).

Well, it is best that the original author(s) correct their text.

ChristianGruen commented 8 months ago

They wanted to secretly include an async monad (same pattern different application).

https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/async

I have trouble understanding the relationship: Do you refer to the existing record rules, or the suggestion to embed record definitions in a map?

MarkNicholls commented 8 months ago

My interpretation of this would be in terms of generic types: you want a signature of map:put such that map:put($map as M, $k, $v) is guaranteed to return a map of type M. I can see the attraction but it's a lot of change for the type system and a tough challenge for backwards compatibility.

While I can feel the "convenience" of this, it also poses unwanted restrictions.

For example, one may want to construct a record "a piece at a time" and during putting each piece together the result will not be of the required type (yet). Or, one might want to incrementally construct one record structure from a record that has a different record-structure. With the restricted put() such freedom is difficult or prohibited.

this reminds me of F# and their record syntax,

you create a record

type Foo = { name : string; age : int }

you 'put' by going (this DOES create a new value)

let newFoo = { foo with name = "jeremy" }

but you arent allowed to 'extend' the type, i.e. this is a type error

let newBar = { foo with height = 155 }

in practice i get irritated with that about once in two years (and this is a language i use), but maybe not in the same data oriented contexts.

if you can map a record to a map, and a map to a record, though, you CAN then go

toGeoCoord(put(toMap foo,'key','value'))

then you get the best of both worlds

michaelhkay commented 8 months ago

A possible way forward might be:

A map MAY have a type annotation, which is an item type (typically a record type or map type). If a map M has a type annotation T, then M instance of T is guaranteed to be true. In addition, if a map M has a type annotation T, then the result of map:put() or map:remove() applied to M must also be an instance of T (otherwise the remove or put operation fails with a type error).

A record constructor (details TBA) always constructs a map having a type annotation with the corresponding record type.

In addition, maps with type annotations may be created using the constructs

map (K, V) {key:value, key:value, ...}

for example map(xs:integer, xs:string){1:'yes', 2:'no'}

or in XSLT

<xsl:map as="T">...</xsl:map>

A further tweak: if a map has a type annotation with key type K, then the keys are effectively upcast to type K -- the map is no longer obliged to retain the actual subtype of each key value. (Retaining the subtype of each key value imposes an overhead which hardly ever has any value for users.)

ChristianGruen commented 8 months ago

Well, it is best that the original author(s) correct their text.

Possibly yes, if there should be agreement to do so. In this thread, I’ll tend to use the wording from the spec until we decide to rephrase it.

ChristianGruen commented 8 months ago

A possible way forward might be: […]

Thanks; that's a perfect summary of what I think we should add.

MarkNicholls commented 8 months ago

They wanted to secretly include an async monad (same pattern different application). https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/async

I have trouble understanding the relationship: Do you refer to the existing record rules, or the suggestion to embed record definitions in a map?

ah....thats because its not at all obvious....it was really as an example of how an existing language extends the syntax of a function to get monadic behaviours.....and if you introduce the monadic behaviour of "state", to constrain the function types of your put/remove functions, then a similar trick would work, i.e. you annotate the function AND you annotate the call to the function to inject the necessary magic to post the (map/record) state through without the programmer seeing it.

They could have written LINQ operators to do it...a more general magic, but they chose not to.

Its a vague, hand wavy sort of analogy to how someone else has done it.

MarkNicholls commented 8 months ago

A possible way forward might be:

A map MAY have a type annotation, which is an item type (typically a record type or map type). If a map M has a type annotation T, then M instance of T is guaranteed to be true. In addition, if a map M has a type annotation T, then the result of map:put() or map:remove() applied to M must also be an instance of T (otherwise the remove or put operation fails with a type error).

A record constructor (details TBA) always constructs a map having a type annotation with the corresponding record type.

In addition, maps with type annotations may be created using the constructs

map (K, V) {key:value, key:value, ...}

for example map(xs:integer, xs:string){1:'yes', 2:'no'}

or in XSLT

<xsl:map as="T">...</xsl:map>

A further tweak: if a map has a type annotation with key type K, then the keys are effectively upcast to type K -- the map is no longer obliged to retain the actual subtype of each key value. (Retaining the subtype of each key value imposes an overhead which hardly ever has any value for users.)

What I find awkward about this is that map:put, for example, is just a function, not a method, so asking the function to fail for some parameters irks me. If it were a method and subservient to the map, that would make sense. It all feels like DBC (design by contract), but it isn't, so it feels weird.

Isn't it just easier to say:

a) you can add/remove things from maps
b) you can't add/remove things from records (or if you do, you get a map)
c) you can amend things in records (i.e. values, not keys) and you get a new record
d) you can map maps to records and records to maps

Then haven't you got all you need (again, I may not fully comprehend the motivation)? It all seems quite neat and tidy.

(Or are you saying you want map:put etc. to be a state monad? But then what's the type of map:put? Or is it secretly a state monad? Can I, as a programmer, write such a function?)
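
The four rules above (a to d) can be sketched as two disjoint types with conversions between them. This is only a rough model in Python, not XQuery: the class names `Map` and `Record` and the methods `amend`, `to_map` and `to_record` are hypothetical, chosen for illustration.

```python
from typing import Any, Dict


class Map:
    """Open map: entries may be added and removed freely (rule a)."""

    def __init__(self, entries: Dict[Any, Any]) -> None:
        self._entries = dict(entries)

    def put(self, key: Any, value: Any) -> "Map":
        return Map({**self._entries, key: value})

    def remove(self, key: Any) -> "Map":
        return Map({k: v for k, v in self._entries.items() if k != key})

    def to_record(self) -> "Record":
        # Rule d: a map can be converted to a record with the same entries.
        return Record(self._entries)

    def entries(self) -> Dict[Any, Any]:
        return dict(self._entries)


class Record:
    """Closed record: no put/remove, so the key set never changes (rule b)."""

    def __init__(self, fields: Dict[str, Any]) -> None:
        self._fields = dict(fields)

    def amend(self, name: str, value: Any) -> "Record":
        # Rule c: only the value of an existing field may change.
        if name not in self._fields:
            raise KeyError(f"record has no field {name!r}")
        return Record({**self._fields, name: value})

    def to_map(self) -> Map:
        # Rule d, and the "or you get a map" part of rule b.
        return Map(self._fields)

    def fields(self) -> Dict[str, Any]:
        return dict(self._fields)


point = Record({"x": 1, "y": 2})
moved = point.amend("x", 10)           # a new Record with the same keys
widened = point.to_map().put("z", 3)   # changing the key set yields a Map
```

Because `Record` has no `put` or `remove` at all, no function has to fail at run time for some arguments; the signatures alone preserve the record shape.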

dnovatchev commented 8 months ago

A map MAY have a type annotation, which is an item type (typically a record type or map type). If a map M has a type annotation T, then M instance of T is guaranteed to be true.

Or, to put it clearly and precisely, we are adding type as an object. The details of how this is done (using annotations or in another way) are irrelevant.

Reminds me of a proposal I made quite some time ago ...

Or, shall we say: Everything is a record, and a map is a record with 0 fixed keys...

ChristianGruen commented 8 months ago

b) you can't add/remove things from records (or if you do, you get a map)

@MarkNicholls Note that records are nothing other than maps that match one or more given record types. As a result, functions like map:put cannot differentiate between maps and records. In addition, the current rules allow records to have optional entries.
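
A small Python model may help to show why a generic put cannot tell the two apart. It assumes a purely structural, closed record test with optional entries; `matches_record`, `is_point` and the field names are hypothetical, and value types are ignored for brevity.

```python
from typing import Any, Dict, Iterable


def matches_record(m: Dict[str, Any],
                   required: Iterable[str],
                   optional: Iterable[str] = ()) -> bool:
    """Structural test: all required keys present, no key outside required + optional."""
    keys, req = set(m), set(required)
    return req <= keys <= (req | set(optional))


def put(m: Dict[str, Any], key: str, value: Any) -> Dict[str, Any]:
    """Generic put on a plain map: it cannot know which record tests m satisfies."""
    return {**m, key: value}


def remove(m: Dict[str, Any], key: str) -> Dict[str, Any]:
    return {k: v for k, v in m.items() if k != key}


def is_point(m: Dict[str, Any]) -> bool:
    # Hypothetical record type: required fields x and y, optional field label.
    return matches_record(m, ["x", "y"], ["label"])


p = {"x": 1, "y": 2}                   # a plain map that happens to be a "point"
labelled = put(p, "label", "origin")   # optional entry added: still matches
extended = put(p, "z", 3)              # unknown key: an ordinary map, no longer matches
broken = remove(p, "y")                # required entry removed: no longer matches
```

In this model "being a record" is a property that a map has or loses depending on its current entries, which is why the updates themselves cannot fail.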

ChristianGruen commented 8 months ago

A map MAY have a type annotation, which is an item type (typically a record type or map type). […] if a map M has a type annotation T, then the result of map:put() or map:remove() applied to M must also be an instance of T (otherwise the remove or put operation fails with a type error).

@michaelhkay I believe we should exclude map types. Otherwise, we would indeed introduce a backward incompatibility.

michaelhkay commented 8 months ago

It's not a backwards incompatibility if it only applies to maps that have a type annotation, since existing maps don't have one.

ChristianGruen commented 8 months ago

It's not a backwards incompatibility if it only applies to maps that have a type annotation, since existing maps don't have one.

I see. I first assumed that type annotations would also be assigned to a map implicitly, via coercion…

let $x as map(xs:string, xs:string) := map { 'a': 'A' }

…but if we only do this via the proposed map (K, V) {} and record {} syntax, it’s fine.
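
The difference between the two readings can be sketched as follows (Python as a modelling language; `XMap`, `bind_as` and `is_string_map` are hypothetical names). The assumption is that a typed variable binding only checks the value, whereas the explicit typed constructor attaches an annotation that later updates must respect.

```python
from typing import Any, Callable, Dict, Optional

Entries = Dict[Any, Any]
TypeTest = Callable[[Entries], bool]


class XMap:
    """Immutable map; the annotation is None for ordinary (existing) maps."""

    def __init__(self, entries: Entries,
                 annotation: Optional[TypeTest] = None) -> None:
        self._entries = dict(entries)
        self.annotation = annotation
        if annotation is not None and not annotation(self._entries):
            raise TypeError("map is not an instance of its type annotation")

    def put(self, key: Any, value: Any) -> "XMap":
        # The annotation (if any) is inherited and re-checked on the result.
        return XMap({**self._entries, key: value}, self.annotation)

    def entries(self) -> Entries:
        return dict(self._entries)


def is_string_map(entries: Entries) -> bool:
    """Model of the type map(xs:string, xs:string)."""
    return all(isinstance(k, str) and isinstance(v, str)
               for k, v in entries.items())


def bind_as(value: XMap, test: TypeTest) -> XMap:
    """Model of `let $x as T := ...`: checks the value, attaches no annotation."""
    if not test(value.entries()):
        raise TypeError("value does not match the declared type")
    return value


x = bind_as(XMap({"a": "A"}), is_string_map)
x2 = x.put("n", 1)               # allowed, as today: x carries no annotation

y = XMap({"a": "A"}, annotation=is_string_map)   # model of map (K, V) {...}
try:
    y.put("n", 1)                # rejected: result would not match the annotation
    y_rejected = False
except TypeError:
    y_rejected = True
```

Only maps built through the explicit typed constructor change behaviour in this model, which is the sense in which existing code stays unaffected.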

MarkNicholls commented 8 months ago

@ChristianGruen

@MarkNicholls Thanks for the feedback. I invite you to check out the current XPath 4.0 draft to learn more about records, record types and record tests: https://qt4cg.org/specifications/xquery-40/xpath-40.html#id-record-test

And I think this is where I get concerned. I've looked at the spec, but reading specs is not one of my strengths.

For me, a map and a record are disjoint (like xs:integer and xs:double), and I personally would specify and implement them independently, except for the ability to convert conveniently from one to the other and back.

You can then define your functions to have signatures that preserve what you need to preserve in each case and exclude functions that make no sense in each case. Then I don't think any magic is required.

I sense this is quite out of step with where the underlying proposal is, so I'll shut up.

ChristianGruen commented 8 months ago

for me a map and a record are disjoint (like xs:integer and xs:double) and I personally would specify and implement them independently […]

This would also be my preferred option (related: https://github.com/qt4cg/qtspecs/pull/916#issuecomment-1867517931). As you’ve already noted, it's probably out of scope if we want to finalize 4.0 in any foreseeable timeframe.

MarkNicholls commented 8 months ago

@ChristianGruen

I agree with your comment.

michaelhkay commented 1 month ago

This was a fascinating discussion, and I think it influenced our thinking in a number of areas, but I think it can be closed. If there are concrete changes coming out of it, they would be better handled in a new, more focussed issue.

One thing that this discussion influenced was the Saxon implementation: see https://blog.saxonica.com/mike/2024/08/maps-and-records.html

ndw commented 1 week ago

The CG agreed to close this issue without any further action at meeting 088.
