qt4cg / qtspecs

QT4 specifications
https://qt4cg.org/
Other
29 stars 15 forks source link

Sorted maps #564

Open michaelhkay opened 1 year ago

michaelhkay commented 1 year ago

Based on a requirement from Michael Müller-Hillebrand on xsl-list.

We could define a variant of map:build() that constructs a sorted map. If a map is sorted, then:

michaelhkay commented 1 month ago

It might be simpler to do this if we avoid any explicit data model changes.

michaelhkay commented 6 days ago

Sketch of a proposal:

  1. Data model change: maps have a property "ordering" with (initially) three values: random, sorted, or fifo. Random is the default, corresponding to current behaviour. For sorted and fifo maps, the entries in the map have an ordering, which is respected by operations such as map:keys(), map:for-each(), JSON serialisation, and for key $k value $v.
  2. A FIFO map is created by a new variant of the map constructor, perhaps like ordered-map { "a":1, "b":2 }.
  3. A sorted map is created by a new function map:sort($map).
  4. Operations such as map:put() and map:remove() create a new map with the same ordering property as the input map. For a FIFO map, map:put() behaves as follows: if the key already exists, then the entry is left in the same position (if that's not what you want, use map:remove() first). Otherwise, the new entry is added at the end.
  5. map:build(), map:of-pairs(), map:merge() acquire an option to create a map with ordering FIFO that retains the order of the input.
  6. Some operations such as parse-json() and elements-to-maps() are defined to create FIFO maps that retain the order of the input (primarily so that it is retained on serialization).
  7. Additional operations might be defined on sorted maps, like getting all the entries within a range. Or better, define such operations on all maps, but they can have an optimised implementation if the map is sorted.
Arithmeticus commented 6 days ago

My quick reaction to a very intriguing idea, especially #7: I would like to be able to write something like $my-map('banana' through 'orange') or $my-map('banana' until 'orange') (yes, I know that syntax has problems).

I could swear we had introduced a comparable functionality for nodes in a tree, one that allowed one to get preceding or following siblings of a context up until or through some other given node, but for the life of me, I can't find description of said functionality in the qt4cg specs for FO or XPath. Was I dreaming? (Yes, I know we can use [and I do use] the << and >> operators within predicate expressions, but these IMO are not as well known as they should be.)

ChristianGruen commented 2 days ago

Please see https://github.com/qt4cg/qtspecs/pull/1609#issuecomment-2503675096 on my general doubts for the necessity of a custom map implementation for sorted entries.

ChristianGruen commented 2 days ago

Before adding sorted maps to the spec, I would love if we could collect some use cases, and compare how we would solve them with the 3.1 language features.

@michaelhkay has already listed some examples in https://github.com/qt4cg/qtspecs/pull/1609#issuecomment-2504070233:

It's sufficiently long ago that I forget the details, but it was a publishing requirement along the lines of dividing an encyclopedia into sections covering alphabetic ranges. xsl:key gives you an index that allows you to find specific entries by key, but it doesn't allow you to find all the keys in a given range.

I also recollect a customer using this for ranges of dates: find all the appointments in a given week (or find the next appointment after a given date/time). The only other way to achieve that is with a serial search.

ChristianGruen commented 2 days ago

Here are attempts to solve the mentioned use cases with unordered maps. Please note that I’m not trying to prove that it’s possible (this should be obvious). I would rather like to explore how intuitive or complex solutions without sorted maps turn out to be:

For a given map…

(: let $input := unparsed-text-lines('https://raw.githubusercontent.com/dwyl/english-words/refs/heads/master/words.txt') :)
let $input := ('before', 'a', 'and', 'an', 'hello', 'b')
let $words := map:merge($input ! map:entry(., string-length(.)))

…the entries could be split into sorted subsections, starting with the same letter:

(: XPath :)
for $group in partition(
  sort(map:keys($words)),
  fn($partition, $next) { not(substring(foot($partition), 1, 1) = substring($next, 1, 1)) }
)
return ($group?* ! (. || ': ' || $words(.)), '')

(: XQuery: grouping :)
for $keys in sort(map:keys($words))
group by $char := substring($keys, 1, 1)
return ($keys ! (. || ': ' || $words(.)), '')

(: XQuery: window :)
for tumbling window $group in sort(map:keys($words))
end $e next $n when substring($e, 1, 1) != substring($n, 1, 1)
return ($group ! (. || ': ' || $words(.)), '')

The next example selects appointments in a given date range:

let $map := map:build(
  (0 to 1000),
  fn { xs:date('2025-01-01') + seconds(. * 86400) },
  fn { format-integer(., 'w') }
)

(: map:keys :)
for $date in sort(map:keys($map))
where $date >= xs:date('2025-03-04') and $date <  xs:date('2025-04-04')
return $date || ': ' || $map($date)

(: map:entries :)
for $entry in sort(map:entries($map), (), map:keys#1)
for key $date value $info in $entry
where $date >= xs:date('2025-03-04') and $date <  xs:date('2025-04-04')
return $date || ': ' || $info

(: map:entries, if we allowed the input of `for key` to be a sequence of maps :)
for key $date value $info in sort(map:entries($map), (), map:keys#1)
where $date >= xs:date('2025-03-04') and $date <  xs:date('2025-04-04')
return $date || ': ' || $info

(: map:pairs :)
sort(map:pairs($map), (), fn { ?key })
  [?key >= xs:date('2025-03-04') and ?key <  xs:date('2025-03-11')]
  ! (?key || ': ' || ?value)