qt4cg / qtspecs

QT4 specifications
https://qt4cg.org/
Other
28 stars 15 forks source link

Maps with Infinite Number of Keys: Total Maps and Decorated maps #105

Open dnovatchev opened 2 years ago

dnovatchev commented 2 years ago

Maps with Infinite Number of Keys: Total Maps and Decorated maps

1. Total Maps

Maps have become one of the most useful tools for creating readable, short and efficient XPath code. However, a significant limitation of this datatype is that a map can have only a finite number of keys. In many cases we might want to implement a map that can have more than a fixed, finite number of arguments.

Here is a typical example (Example 1):
A hotel charges per night differently, depending on how long the customer has been staying. For the first night the price is \$100, for the second \$90, for the third \$80 and for every night after the third \$75. We can immediately try to express this pricing data as a map, like this:

map {
1 : 100,
2 : 90,
3 : 80
(:  ??? How to express the price for all eventual next nights? :)
}

We could, if we had a special key, something like "TheRest", which means any other key-value, which is not one of the already specified key-values.

Here comes the first part of this proposal:

  1. We introduce a special key value, which, when specified in a map means: any possible key, different from the other keys, specified for the map. For this purpose we use the string: "\"

Adding such a "discard symbol" makes the map a total function on the set of any possible XPath atomic items.

Now we can easily express the hotel-price data as a map:

map {
1 : 100,
2 : 90,
3 : 80
'\' : 75
}

Another useful Example (2) is that now we can express any XPath item, or sequence of items as a map. Let's do this for a simple constant, like π:

let $π := map {
'\' : math:pi()
}
 return $π?*   (: produces 3.141592653589793  :)

the map above is empty (has no regular keys) and specifies that for any other key-value $k it holds that $π($k) eq math:pi()

Going further, we can express even the empty sequence (Example 3) as the following map:

let $Φ := map {
'\' : ()
}
 return $Φ?*   (: produces the empty sequence :)

Using this representation of the empty sequence, we can provide a solution for the "Forgiveness problem" raised by Jarno Jelovirta in the XML.Com #general channel in March 2021:

This expression will raise an error:

[map {"k0": 1}, map{"k0": [1, 2, 3]}]?*?("k0")?*

[XPTY0004] Input of lookup operator must be map or array: 1.

To prevent ("forgive", thus "Forgiveness Problem") the raising of such errors we could accept the rule that in XPath 4.0 any expression that evaluates to something different than a map or an array, could be coerced to the following map, which returns the empty sequence as the corresponding value for any key requested in a lookup:

map {
'\' : ()
}  (: produces the empty sequence  for any lookup:)

To summarize, what we have achieved so far:

  1. The map constructed in Example 1 is now a total function over the domain of all natural numbers. Any map with a "\" (discard key) is a total function over the value-space of all xs:anyAtomicType values
  2. We can represent any XPath 4.0 item or sequence in an easy and intuitive way as a map.
  3. It is now straight-forward to solve the "Forgiveness Problem" by introducing the natural and intuitive rule for coercing any non-map value to the empty map, and this allows to use anywhere the lookup operator ? without raising an error.

2. Decorated Maps

Although we already achieved a lot in the first part, there are still use-cases for which we don't have an adequate map solution:

  1. In the example (1) of expressing the hotel prices, we probably shouldn't get $75 for a key such as -1 or even "blah-blah-blah" But the XPath 4.0 language specification allows any atomic values to be possible keys and thus to be the argument to the map:get() function. If we want validation for the actually-allowed key-values for a specific given map, we need to have additional processing/functionality.

  2. With a discard symbol we can express only one infinite set of possible keys and group them under the same corresponding value. However, there are problems, the data for which needs several infinite sets of key-values to be projected onto different values. Here is one such problem:

Imagine we are the organizers of a very simple lottery, selling many millions of tickets, identified by their number, which is a unique natural number.

We want to grant prizes with this simple strategy.

None of the sets of key-values for each of the 6 categories above can be conveniently expressed with the map that we have so far, although we have merely 6 different cases!

How can we solve this kind of problem still using maps?

Decorators to the rescue...

What is decorator, what is the decorator pattern and when it is good to use one? According to Wikipedia:

What solution does it describe? Define Decorator objects that

  • implement the interface of the extended (decorated) object (Component) transparently by forwarding all requests to it
  • perform additional functionality before/after forwarding a request.

This allows working with different Decorator objects to extend the functionality of an object dynamically at run-time.

The idea is to couple a map with a function (the decorator) which can perform any needed preprocessing, such as validation or projection of a supplied value onto one of a predefined small set of values (that are the actual keys of the map). For simplicity, we are not discussing post-processing here, though this can also be part of a decorator, if needed.

Let us see how a decorated-map solution to the lottery problem looks like:

let $prize-table := map {
  "ten" : 10,
  "hundred" : 20,
  "thousand" : 100,
  "five-thousand" : 1000,
  "prime" : 25000,
 "\" : 0
},
$isPrime := function($input as  xs:integer) as xs:boolean
{
  exists(index-of((2, 3, 5, 7, 11, 13, 17, 19, 23), $input)) (: simplified primality checker :)
},
$decorated-map := function($base-map as map(*), $input as xs:anyAtomicType) as item()*
{
  let $raw-result :=
         (
          let $key := 
           if(not($input castable as xs:positiveInteger)) then '\'  (: we can call the error() function here :) 
             else if($input mod 5000 eq 0) then 'five-thousand'
             else if($input mod 1000 eq 0) then 'thousand'
             else if($input mod 100 eq 0) then 'hundred'
             else if($input mod 10 eq 0) then 'ten'
             else if($isPrime($input)) then 'prime'
             else "\"
          return $base-map($key)
         ),
      $post-process := function($x) {$x},  (: using identity here for simplicity :)
      $post-processed := $post-process($raw-result)
    return $post-processed
},

$prizeForTicket := $decorated-map($prize-table, ?),       (: Note: this is exactly the lookup operator  ?    :)
$ticketNumbers := (1, 10, 100, 1000, 5000, 19, -3, "blah-blah-blah")

return $ticketNumbers ! $prizeForTicket(.)          (: produces 0, 10, 20, 100, 1000, 25000, 0, 0 :)

Conclusion

  1. In the 2nd part of this proposal, a new type/function -- the decorated-map was described.

  2. We defined the signature of a decorated-map and gave an example how to construct and use one in solving a specific problem. In particular, the proposal is to have a standard function:

    decorated-map ($base-map as map(*), $input as xs:anyAtomicType) as item()*

  3. Finally, we showed that the lookup operator ? on a decorated map $dm is identical to and should be defined as :

    $dm($base-map, ?)

What remains to be done?

The topic of decorators is extremely important, as a decorator may and should be possible to be defined on any function, not just on maps. This would be addressed in one or more new proposals. Stay tuned 😊

michaelhkay commented 2 years ago

Are there other languages that offer anything similar?

dnovatchev commented 2 years ago

Are there other languages that offer anything similar?

Yes:

  1. Total maps: see for example Coq

  2. Decorators: present in many programming languages with different names:

    And @michaelhkay , here is an excellent introduction (a module from the Pluralsight course: "Python: Beyond the Basics"):

    "Closures and Decorators"

michaelhkay commented 2 years ago

The essence of maps as currently defined is their simplicity, and I don't want to lose that.

I'm wondering if the most common use-case can be met by providing convenient ways to turn a map into a function:

map:with-default($map, $default) returns function($x){ if (map:contains($map, $x)) then $map($x) else $default }

Plus an extension to the semantics of the lookup operator '?' so that the LHS can be an arity-1 function. (But the RHS is atomised, so I think to maintain consistency with the current semantics it would have to be a function whose argument type is singleton atomic.)

I'm also wondering about a way of deriving a function from a map in such a way that the function is more strongly typed, but I'm finding it hard to come up with anything better than just writing a function that wraps the map explicitly:

function($x as xs:integer) as xs:string* {map:with-default($map, "")?$x}

dnovatchev commented 2 years ago

The essence of maps as currently defined is their simplicity, and I don't want to lose that.

I'm wondering if the most common use-case can be met by providing convenient ways to turn a map into a function:

map:with-default($map, $default) returns function($x){ if (map:contains($map, $x)) then $map($x) else $default }

Using an explicit "default" symbol seems more simple and visually intuitive, like in the below:

let $prize-table := map {
  "ten" : 10,
  "hundred" : 20,
  "thousand" : 100,
  "five-thousand" : 1000,
  "prime" : 25000,
 "\" : 0
}

We already have too-many map:xxx() functions and many people find that the ? operator has too-many different roles.

But certainly a function like this is a possibility. I would prefer it to be named more like:

map:add-default(...)

This said, I prefer even more to have a symbol for a default key. In the proposal this is "\". I understand that someone could argue that there might be cases when we would need this specific value to be used just as a regular key.

To eliminate the possibility of such an issue, we could establish a new fn:default() that can be used only as the default-key and only within a map as a key. Thus the above would become:

let $prize-table := map {
  "ten" : 10,
  "hundred" : 20,
  "thousand" : 100,
  "five-thousand" : 1000,
  "prime" : 25000,
 default() : 0
}
michaelhkay commented 2 years ago

If we were to do that, I would go for a keyword such as #default.

However, I'm strongly inclined towards having a self-imposed rule for this round: no data model changes.

dnovatchev commented 2 years ago

If we were to do that, I would go for a keyword such as #default.

However, I'm strongly inclined towards having a self-imposed rule for this round: no data model changes.

This is exactly why instead of a new special keyword #default we can use the proposed function default() . This is just a function (constant, like the existing math:pi() ) and just adding a function doesn't change the data model.

ChristianGruen commented 2 years ago

I don't believe we should add a special-default-case solution just for maps. There are so many other expressions that could benefit from, but don't have a default branch.

I would rather recommend users to take advantage of the upcoming otherwise expression (or ?: as alternative writing, as currently realized in BaseX).

pgfearo commented 2 years ago

I like the @michaelhkay proposal because, if I understand correctly, the following syntax seems quite neat:

let $prize-table := map {
  "ten" : 10,
  "hundred" : 20,
  "thousand" : 100,
  "five-thousand" : 1000,
  "prime" : 25000
} => map:with-default(0)

Regarding the ? lookup operator - given that we use * on the RHS to return all map values, perhaps we could use + on the RHS to return the default value, giving:

let $default := $prize-table?+
dnovatchev commented 2 years ago

I think using the function with-default() as proposed by @michaelhkay makes sense and it seems quite visually appealing and intuitive if written in combination with the arrow operator:

let $myMap := map{"x" : 1, "y" : 2} => with-default( 0 )

It would be more visually appealing if this function could be called without specifying a prefix, so why not define it in the standard function namespace?

dnovatchev commented 2 years ago

Plus an extension to the semantics of the lookup operator '?' so that the LHS can be an arity-1 function. (But the RHS is atomised, so I think to maintain consistency with the current semantics it would have to be a function whose argument type is singleton atomic.)

I believe this was already proposed in a separate proposal by @ChristianGruen : [XPath] Generalize lookup operator for function items

dnovatchev commented 2 years ago

The essence of maps as currently defined is their simplicity, and I don't want to lose that.

I'm wondering if the most common use-case can be met by providing convenient ways to turn a map into a function:

map:with-default($map, $default) returns function($x){ if (map:contains($map, $x)) then $map($x) else $default }

@michaelhkay After careful consideration, I am satisfied with the function so proposed. Just for readability I would prefer this to be in the standar fn; namespace, thus instead of having to write:

$someMap => map:withDefault(5)

we can omit any prefix and its namespace binding and simply write:

$someMap => withDefault(5)

Plus an extension to the semantics of the lookup operator '?' so that the LHS can be an arity-1 function. (But the RHS is atomised, so I think to maintain consistency with the current semantics it would have to be a function whose argument type is singleton atomic.)

We already have a similar proposal by @ChristianGruen : [XPath] Generalize lookup operator for function items

Also, in the starting proposal, we showed that the lookup operator ? for any decorated map $dm (and a total map is a special case of a decorated map) is identical to, and should be defined as:

$dm($base-map, ?)

I'm also wondering about a way of deriving a function from a map in such a way that the function is more strongly typed, but I'm finding it hard to come up with anything better than just writing a function that wraps the map explicitly:

function($x as xs:integer) as xs:string* {map:with-default($map, "")?$x}

Shouldn't this be:

function($x as xs:integer) as xs:string* {map:with-default($map, () )?$x}

dnovatchev commented 2 years ago

I'm wondering if the most common use-case can be met by providing convenient ways to turn a map into a function:

map:with-default($map, $default) returns function($x){ if (map:contains($map, $x)) then $map($x) else $default }

@michaelhkay, In a previous comment I expressed approval for this idea. 💯

Please note however, that the users will certainly want to treat the result of the with-default as a map and not as just as a function.

Therefore, I would prefer the definition and implementation of the function as in the following expression:

let $withDefault := function($map as map(*), $default as item()*) as map(*)
{
   let $extenderKey := xs:float("-INF"),
       $props := if(map:contains($map, $extenderKey))
                    then  map:put($map($extenderKey), "$default", $default)
                    else map{ "$default" : $default }
    return
      map:put($map, $extenderKey, $props) 
}
  return
    (
      $withDefault(map{"x" : 2, "y" : 3}, 10),
      "==============",
       $withDefault($withDefault(map{"x" : 2, "y" : 3}, 10), 15)
     )

Evaluating this correctly produces:

map {
  xs:float("-INF"): map {
    "$default": 10
  },
  "x": 2,
  "y": 3
}
==============
map {
  xs:float("-INF"): map {
    "$default": 15
  },
  "x": 2,
  "y": 3
}

We will modify the behavior of the map:XXX functions not to deal with the $extenderKey and its values, except in the cases when $extenderKey is specifically provided as argument.

One could argue what specific value we should choose for the key $extenderKey. We can agree on any newly-generated GUID, or any other suitable value.

Different programming languages handle this in different ways. For example, in Python double-underscore mangling is the way to maintain distinct names, as explained here

ChristianGruen commented 2 years ago

Once again I’d like to draw the attention to the new otherwise expression, which I believe is a much more generic solution to lookups (not just in maps) that don't yield results. It's available in many other languages as well (https://en.wikipedia.org/wiki/Null_coalescing_operator). Do we really need and want a solution specific to maps? What about arrays and sequences, and what about lookups that return an empty sequence as value?

dnovatchev commented 2 years ago

Do we really need and want a solution specific to maps? What about arrays and sequences, and what about lookups that return an empty sequence as value?

@ChristianGruen This is a very good question!

I completely agree that if we just wanted to provide a default-value capability to a map, there could be ways to achieve this with a null-coalescing function/operator.

However, this issue is about maps and ways to extend a map - not only with a default value, but potentially with new capabilities, that are not planned in advance. For example, a decorated map, that knows about the sequence of all decorators that have been applied to itself. Obviously the null-coalescing cannot help to achieve this, but placing all new-capabilities-related data within a nested map - can.

Addressing the concern that focusing to "just maps" is something not so generic, let us remember that maps in XPath are analogous to objects in the OOP world. Is there something even more generic than objects?

ChristianGruen commented 1 year ago

Copied from https://github.com/qt4cg/qtspecs/issues/707#issuecomment-1722299446 (I think it’s better to discuss it here):

It is easy to see that this current proposal has a few shortcomings (the last one critical):

  • In many cases we may want just an error to be raised, thus returning the empty sequence will need more coding and checking (as above)

You could replace the default/fallback value with a function that creates the result (similar to https://qt4cg.org/specifications/xpath-functions-40/Overview.html#func-map-get);

dnovatchev commented 1 year ago

Copied from #707 (comment) (I think it’s better to discuss it here):

It is easy to see that this current proposal has a few shortcomings (the last one critical):

  • In many cases we may want just an error to be raised, thus returning the empty sequence will need more coding and checking (as above)

You could replace the default/fallback value with a function that creates the result (similar to https://qt4cg.org/specifications/xpath-functions-40/Overview.html#func-map-get);

A _programmer's proverb_:

"Every problem can be solved with introducing one more level of indirection - except the problem of having too-many levels of indirection"

😄

ChristianGruen commented 2 months ago

I am not sure why it was labeled as "PRG-hard", while this is trivial to implement.

@dnovatchev I assume it was labeled as such because we will seem to have no one who will create a PR for it. If we had one, I am sure people (= at least me) would look at it.

dnovatchev commented 2 months ago

I am not sure why it was labeled as "PRG-hard", while this is trivial to implement.

@dnovatchev I assume it was labeled as such because we will seem to have no one who will create a PR for it. If we had one, I am sure people (= at least me) would look at it.

@ChristianGruen ,

At the time I wrote this proposal I had no logistics experience whatsoever in creating a PR for the specs repository.

If we want not to have to change the XDM, then we need to allow a single special key-value that will be used to specify missing-key default value and/or a map that can contain many other "system properties". This has to be a "special", non-atomic value, in order to avoid any possible collision with any key that is currently allowed and maybe used in existing code.

It is possible to change the XPath language specs and add text specifying that such a value can be used as a key.

The only problem is that we need to agree which exactly value to use. I would propose to use [ ] (the empty array), but any other special value would do.

Any suggestions from anyone, please?

ChristianGruen commented 2 months ago

At the time I wrote this proposal I had no logistics experience whatsoever in creating a PR for the specs repository.

Absolutely; that’s no judgement, just (as I understand it) the assessment how the labels were assigned.

I would propose to use [ ] (the empty array), but any other special value would do.

This would probably be equivalent to the empty sequence, as keys in maps are defined to be atomized.

dnovatchev commented 2 months ago

At the time I wrote this proposal I had no logistics experience whatsoever in creating a PR for the specs repository.

Absolutely; that’s no judgement, just (as I understand it) the assessment how the labels were assigned.

I would propose to use [ ] (the empty array), but any other special value would do.

This would probably be equivalent to the empty sequence, as keys in maps are defined to be atomized.

I am not sure if a key that is the empty sequence is not allowed at present. Even if this value is "free to use", it would be difficult and awkward to specify it in the definition of a map - at the same time there is no such difficulty in specifying [ ]

ChristianGruen commented 2 months ago

at the same time there is no such difficulty in specifying [ ]

I think it would be difficult for me, but I may be wrong. I'm looking forward to it.

michaelhkay commented 2 months ago

If we want not to have to change the XDM, then we need to allow a single special key-value that will be used to specify missing-key default value and/or a map that can contain many other "system properties". This has to be a "special", non-atomic value,

If you allow a map to contain a non-atomic key then you have changed the data model, and this impacts every operation currently defined on maps. That's why it's labelled "hard". But remember that the labels were assigned based on about 5 minutes study. What's hard for some people might be easy for others.

dnovatchev commented 2 months ago

If we want not to have to change the XDM, then we need to allow a single special key-value that will be used to specify missing-key default value and/or a map that can contain many other "system properties". This has to be a "special", non-atomic value,

If you allow a map to contain a non-atomic key then you have changed the data model, and this impacts every operation currently defined on maps. That's why it's labelled "hard". But remember that the labels were assigned based on about 5 minutes study. What's hard for some people might be easy for others.

If we allow the empty-sequence as a key value, then we are not changing the XDM, are we?

michaelhkay commented 2 months ago

If we allow the empty-sequence as a key value, then we are not changing the XDM, are we?

Of course we are. The XDM does not currently allow the empty sequence as a key value, and we would be changing that. Lots of decisions need to be made in consequence, for example how this affects map types and subtyping rules, how it affects functions such as map:get, map:put, map:pairs, map:keys and map:size, etc etc. When you get into the detail, some of these decisions will inevitably be non-trivial.