qt4cg / qtspecs

QT4 specifications
https://qt4cg.org/
Other
28 stars 15 forks source link

Toward a design for generators #708

Open rhdunn opened 1 year ago

rhdunn commented 1 year ago

Motivation

The motivations for this are to explore the creation of sequences where the next item in the sequence is determined by evaluating a function on the current state of a system. These sequence generators have an initial starting value and state. They can also stop when some condition is met.

The motivating example here is the fn:random-number-generator function, where:

  1. let $rnd := fn:random-number-generator() initializes a new random number sequence with the default seed as its state;
  2. let $value := $rnd?number returns the current value of the sequence;
  3. let $rnd := $rnd?next() returns the state and value of the next item in the sequence.

This has all the properties needed for a forward (left-to-right) generating sequence. To make it a generalized generator sequence, the number field should be renamed value.

NOTE: The fn:random-number-generator function defines an infinite sequence as it has no termination/end of sequence condition.

Sequence Generators as Record Types

Therefore, a forward generator sequence could look like this:

declare item-type sequence-generator as record(
    value as item(),
    next as function() as record(value, next, *)?,
    *
);

declare function generated-sequence() as sequence-generator?;

If calling ?next() on a sequence above returns the empty sequence then there are no more values in that sequence.

If the generated-sequence() returns the empty sequence then there are no items in the sequence.

NOTE: This does not currently define reversible sequence support. A reversed property could be provided that returns a sequence-generator that operates on the sequence from right to left. I haven't figured out exactly how this should look, but reversed generator sequences may be better investigated as a separate issue.

NOTE: An implementation can process this sequence iteratively if needed.

Analysis

While this allows generator sequence types to be created, they are -- like fn:random-number-generator() cumbersome to use. This does have the advantage of being backward compatible with fn:random-number-generator(), though.

The problem comes when trying to make these work like sequences. The subtype and function coercion rules should be doable. The other sequence operations like filtering are more complicated to define properly.

However, this has the same issues that allowing array() in fn:* functions has -- how do you differentiate the use cases where the user is working on a sequence of generators, or the generated sequences?

NOTE: You can't extend the functions to take a (sequence-type | item()*) parameter as a sequence-type is a subtype of item() which matches item()*. If the functions were to be extended to handle these, then fn:head(fn:random-number-generator()) would return a number instead of a map.

Sequence Generator Function

The Kotlin language has a generateSequence function that takes a next function, an optional seed value or construction function, and returns a lazy sequence over that. -- Internally, it is building a Java iterator that produces values from calling the next function. The sequence will terminate when the next value is null.

I propose that -- in addition to the sequence-generator type above -- XPath defines the following function:

declare function fn:sequence($generator as sequence-generator) as item()*;

This solves the issues in the Analysis section above, and is analogous to array:values. It is implementation defined how the sequence is constructed. -- This allows an implementer to appropriately map the generator to their internal sequence implementation in order to provide lazy evaluation and other operations.

There should also be the following helper function for random numbers:

declare function fn:random-numbers(
    $seed as xs:anyAtomicType? := ()
) as xs:double* {
    fn:random-number-generator() => fn:sequence()
};

The user-defined sequences then become e.g.:

let $generator as sequence-generator := map {
    value : 1,
    next : function () { () }
}
return fn:sequence($generator)
ChristianGruen commented 1 year ago

Nice summary, thank you.

Such a generator could easily be used in iterative loops or recursive function calls. The following query attaches a result item to the generator:

iterate-while(
  map:put(create-generator(), 'result', 0),
  function($generator) {
    exists($generator?value) and $generator?result < 10000
  },
  function($generator) {
    let $next := $generator?next()
    return map:put($next, 'result', $generator?result + $generator?value)
  } 
)?sum

In my example, I expect the generator to always return a map, as the intermediate result would otherwise get lost. Maybe we could also have a more entry or function to tackle that?

iterate-while(
  $generator,
  fn { ?more },
  fn { ?next() => map:put('seq', (?seq, ?value)) } 
)?seq

It would also be nice if enriched map entries were implicitly inherited to the next generator.

Regarding fn:sequences, if we had sequences with variable size, we could also allow infinity in range expressions (1 to xs:double('INF'), etc.) and many other things. But that feels like a lot of effort, and we should certainly define some serious use cases before continuing that.

dnovatchev commented 1 year ago

I find many desirable properties of generators missing in this otherwise nice attempt.

Therefore, I will be submitting a separate proposal for generators, that fills these voids.

One of course, is what @ChristianGruen duly noticed -- in the general case a generator needs a state (map) in order to continue its work.

rhdunn commented 1 year ago

@dnovatchev It would be helpful if you could list the desirable generator properties that are missing.

Note that the following works without having to keep state in the map:

xquery version "3.1";

(: https://en.wikipedia.org/wiki/Linear_congruential_generator :)
declare function local:lcg($seed, $a, $b, $m) {
    let $value := ($a * $seed + $b) mod $m
    return map {
        "value": $value,
        "next": function () {
            local:lcg($value, $a, $b, $m)
        }
    }
};

(: ZX81 random number generator LCG values :)
let $r1 := local:lcg(100, 75, 74, math:pow(2, 16) + 1)
let $r2 := $r1?next()
let $r3 := $r2?next()
let $r4 := $r3?next()
let $r5 := $r4?next()
return ($r1?value, $r2?value, $r3?value, $r4?value, $r5?value)

I've not tried this for @ChristianGruen's example. However:

  1. It should be possible to have a more as xs:boolean? or has-next as xs:boolean? field on the record -- if missing/empty it would be true. Note: if this is required, fn:random-number-generator would need to be further modified to add this field.
  2. It would ideally be possible to define next as next as function(sequence-generator? := ()) as sequence-generator. [1] -- Note: this takes an optional self value in order to be backward compatible with random-number-generator.

[1] The current specification only allows default arguments on function declarations, so this does not currently work. It would need to be defined at least for inline function expressions -- allowing function ($self as sequence-generator? := ()) { ... }. Allowing them on function tests as well would allow the record test for this to be accurately specified.

ChristianGruen commented 10 months ago

https://github.com/qt4cg/qtspecs/issues/716#issuecomment-1812146751 @rhdunn The discussion could be continued here (if you don’t mind… Otherwise, feel free to open a new issue):

With my fn:sequence proposal, you could write, e.g.:

let $gen := fn:random-number-generator() (: or other lazy sequence :)
for $value in fn:sequence($gen)[1 to 10]
return math:sin($value)

I wonder if we can make this work. The following variant would certainly be much easier to achieve:

let $gen := fn:random-number-generator() (: or other lazy sequence :)
for $value in fn:sequence($gen, 1, 10)
return math:sin($value)
rhdunn commented 10 hours ago

The way I see this working in practice for an implementer is that they would implement the iteration model/interface in the language the processor is running on:

  1. Iterator/Iterable in Java;
  2. Iterator/Sequence in Kotlin;
  3. Iterator/Seq in Scala;
  4. IEnumerator/IEnumerable in C#;
  5. std::iterator/std::range in C++;
  6. __iter__/__next__ in Python;
  7. etc.

The implementer would then adapt that to their model of an XDM sequence in the fn:sequence function implementation.

rhdunn commented 9 hours ago

Thinking about this some more, the iterator model is too klunky. Adopting something similar to generateSequence would be a lot cleaner. I.e.:

declare function fn:sequence($next as function(item()*) as item()*, $seed as item()*) as item()*;

where returning an empty sequence would stop the sequence.

Example: Random Numbers

declare function fn:random-numbers($seed as xs:anyAtomicType? := ()) {
  fn:sequence(fn { .?next() }, fn:random-number-generator($seed)) => fn { .?number }
};

Example: Fibonacci

declare function fn:fibonacci($first := 0, $second := 1) {
  fn:sequence(fn { (.[2], .[1] + .[2] ) }, ($first, $second)) => fn { .[1] }
};

Note: This is adapted from the example in Kotlin's generateSequence docs.