Open dnovatchev opened 1 year ago
Thanks, that looks like a very coherent proposal, and it's easy to understand as a generalisation of the design pattern used by fn:random-number-generator.
Looking at it again, though, I wonder what we're actually proposing adding to the spec? Is it a set of functions like skip() and take() plus a definition of the record type generator
that they operate on, plus some kind of user guide on how to implement your own generators?
In my own experience I've found that the easiest way to use fn:random-number-generator() is with xsl:iterate, though presumable fn:iterate-while() will also work well. Examples that show the interaction of these capabilities would be useful.
It would be good to see some examples that show how the concept can be applied to processing of XML and JSON. Perhaps there's some kind of possibility of a function like stream(uri-of-xml-document, simple-pattern)
that delivers a generator whose delivered items are subtrees of the XML document that satisfy the simple-pattern, where the simple-pattern is some kind of restricted predicate that can test nodes against local conditions like their name, attributes, and ancestry.
Note also issue #295, which points out the need for extensions to the record-type syntax to allow record types of the kind you are using, with self-reference in function arguments and result.
I guess two more useful functions on generators might be for-each()
and filter()
that return new generators.
It would seem appealing to make the functions skip()
, take()
, for-each()
, filter()
etc be accessible as fields of the generator itself, so we can write $my-generator?take(10). But since the generator is user-written, that requires some kind of inheritance mechanism so someone implementing a generator gets these methods "for free". Perhaps this is a bridge too far... If we provide free-standing functions like generator:take($generator, $count)
then it's easy enough for the user implementing a generator to wrap these as methods on the generator if they choose.
Another thought is whether and how one might allow the internal state of a generator to be private.
Thanks, that looks like a very coherent proposal, and it's easy to understand as a generalisation of the design pattern used by fn:random-number-generator.
Looking at it again, though, I wonder what we're actually proposing adding to the spec? Is it a set of functions like skip() and take() plus a definition of the record type
generator
that they operate on, plus some kind of user guide on how to implement your own generators?
This is the minimum we can do.
We could also think of extending the definition of some of the standard functions that operate on sequences and on arrays, to accept as first argument anything that implements head and tail. If this can be done, then we will not need to add new overloads for these functions at all.
In my own experience I've found that the easiest way to use fn:random-number-generator() is with xsl:iterate, though presumable fn:iterate-while() will also work well. Examples that show the interaction of these capabilities would be useful.
Exactly. A generator goes well as argument to any function that has an accumulator-like argument that will contain the latest generator, produced in the latest iteration (or latest recursive call), such as folds, _iterate-XXX_, _while-XXX_
And generally being passed as one of the arguments of a call to a recursive function, as showed in the examples in the proposal.
It would be good to see some examples that show how the concept can be applied to processing of XML and JSON. Perhaps there's some kind of possibility of a function like
stream(uri-of-xml-document, simple-pattern)
that delivers a generator whose delivered items are subtrees of the XML document that satisfy the simple-pattern, where the simple-pattern is some kind of restricted predicate that can test nodes against local conditions like their name, attributes, and ancestry.
Yes, I have a few examples in mind, but based on the feedback (including yours) from my previous/other issues, I wanted to keep this one as simple and short as possible.
Do note that a generator can encapsulate any kind of collection - not only a sequence or an array, but also maps, trees, XML documents, infinite or of unknown size collections, ... , you name it. The same goes to the traversal strategy that a generator can use Breadth-First, Dept-First, left-to-right, right-to-left, oscillating, ... , again you name it.
I guess two more useful functions on generators might be
for-each()
andfilter()
that return new generators.It would seem appealing to make the functions
skip()
,take()
,for-each()
,filter()
etc be accessible as fields of the generator itself, so we can write $my-generator?take(10). But since the generator is user-written, that requires some kind of inheritance mechanism so someone implementing a generator gets these methods "for free". Perhaps this is a bridge too far... If we provide free-standing functions likegenerator:take($generator, $count)
then it's easy enough for the user implementing a generator to wrap these as methods on the generator if they choose.
Yes. These can easily be implemented as standard (universally available) decorators (or just a single decorator adding all of these functions to the result-generator) on a generator, thus the author of a generator doesn't have to implement them at all.
Such a decorator takes a generator and returns another generator that is the result of adding to the original one the missing functions.
Another thought is whether and how one might allow the internal state of a generator to be private.
Right now there is no concept of private in XPath.
I guess two more useful functions on generators might be
for-each()
andfilter()
that return new generators.
Here are the two additional requested functions on generators: for-each and filter, plus two additional: for-each-pair and at.
I am also placing them into the main proposal:
at($ind as xs:integer)
{
subrange($ind, 1)
}
for-each($gen as generator, $fun as function(*))
{
map:put($gen, "getCurrent", function() { $fun($gen?getCurrent()) } )
}
for-each-pair($gen1 as generator, $gen2 as generator, $fun as function(*))
{
let $gen1 := if(not($gen1?initialized)) then $gen1?moveNext()
else $gen1,
$gen2 := if(not($gen2?initialized)) then $gen2?moveNext()
else $gen2,
return
if($gen1?endReached or $gen2?endReached) then map:put($gen1, "endReached", true())
else map:put(map:put($gen1, "getCurrent", function() { $fun($gen1?getCurrent(), $gen2?getCurrent()) } ) ,
"moveNext", function() { for-each-pair(skip($gen1, 1), skip($gen2, 1), $fun)}
)
}
filter($gen as generator, $pred as function(item()*) as xs:boolean)
{
let $gen := if(not($gen?initialized)) then $gen?moveNext()
else $gen,
$moveNext := function() { filter($gen?moveNext(), $pred) }
return if($gen?endReached or $pred($gen?getCurrent()))
then map:put($gen, "moveNext", $moveNext)
else filter($gen?moveNext(), $pred)
}
Update:
Here is a more precise, non-(explicitly)recursive code for filter acting on a generator.
Do note:
The code below is valid and executable XPath 4.0 code.
It demonstrates that fn:iterate-while can work flawlessly with generator-input, without needing any modifications or new overloads
The code creating the filtered-generator doesn't know (and doesn't use) anything about the internal state of the input generator.
The produced generator that performs the filtering and presents its results one at a time, has as its own, internal state the key named "inputGen"
and its value, which can be any suffix of the input-generator, such that the head of this suffix is satisfying the predicate $pred
.
filter($gen as generator, $pred as function(item()*) as xs:boolean)
{
let $getNextGoodValue := function($gen as map(*), $pred as function(item()*) as xs:boolean)
{
let $mapResult := iterate-while(
$gen,
function($gen) { not($pred($gen?getCurrent($gen))) },
function($gen) { $gen?moveNext($gen) }
)
return $mapResult?getCurrent($mapResult)
},
$gen := if($gen?initialized) then $gen
else $gen?moveNext($gen)
return
map {
"initialized": true(),
"endReached": $gen?endReached,
"getCurrent": function($this as map(*)) { $getNextGoodValue($this?inputGen, $pred) },
"moveNext": function($this as map(*))
{ let $nextGoodValue := $getNextGoodValue($this?inputGen?moveNext($this?inputGen), $pred),
$nextGen := iterate-while(
$this?inputGen?moveNext($this?inputGen),
function($gen) { not($pred($gen?getCurrent($gen))) },
function($gen) { $gen?moveNext($gen) }
)
return
map {
"initialized": $nextGen?initialized,
"endReached": $nextGen?endReached,
"getCurrent" : function($x) {$nextGoodValue},
"moveNext" : $this?moveNext,
"inputGen" : $nextGen
}
},
"inputGen" : $gen
}
}
Below is a complete XPath 4.0 expression which, besides the above definitions, produces a specific filtered generator, whose input is the generator of integers from 2 to Infinity, and the predicate "approves" any odd number. The first 999 members of this generator are materialized as the result of this expression:
let $gen2ToInf :=
map{
"initialized": true(),
"endReached": false(),
"getCurrent": function($this as map(*)) { $this?last + 1},
"moveNext": function($this as map(*))
{
if(not($this?initialized)) then map:put($this, "initialized", true())
else map:put($this, "last", $this?last +1)
},
"last" : 1
},
$getNextGoodValue := function($gen as map(*), $pred as function(item()*) as xs:boolean)
{
let $mapResult := iterate-while(
$gen,
function($gen) { not($pred($gen?getCurrent($gen))) },
function($gen) { $gen?moveNext($gen) }
)
return $mapResult?getCurrent($mapResult)
},
$filter := function($gen as map(*), $pred as function(item()*) as xs:boolean)
{
let $gen := if($gen?initialized) then $gen
else $gen?moveNext($gen)
return
map {
"initialized": true(),
"endReached": $gen?endReached,
"getCurrent": function($this as map(*)) { $getNextGoodValue($this?inputGen, $pred) },
"moveNext": function($this as map(*))
{ let $nextGoodValue := $getNextGoodValue($this?inputGen?moveNext($this?inputGen), $pred),
$nextGen := iterate-while(
$this?inputGen?moveNext($this?inputGen),
function($gen) { not($pred($gen?getCurrent($gen))) },
function($gen) { $gen?moveNext($gen) }
)
return
map {
"initialized": $nextGen?initialized,
"endReached": $nextGen?endReached,
"getCurrent" : function($x) {$nextGoodValue},
"moveNext" : $this?moveNext,
"inputGen" : $nextGen
}
},
"inputGen" : $gen
}
}
return
fold-left(1 to 999, [$filter($gen2ToInf, fn($x) {$x mod 2 eq 1} ), []],
fn($accum, $mem)
{
let $nextGen := $accum(1)?moveNext($accum(1))
return [$nextGen, array:append($accum(2), $nextGen?getCurrent($nextGen))]
}
)
(2)
Anyone can execute this code with BaseX ver. 10.8 and produce the result:
This is an open question to the knowledgeable reader:
Can we extend the type of arguments of any currently existing standard XPath functions (and operators) that are (the argument types) sequence, array or map, to a more generalized type that is the union type of these types and the generator type?
This will result in immediately having these standard functions become much more powerful and encompassing, without impacting backwards compatibility in any existing pre- XPath 4.0 code, and no need for additional overloads.
Can we extend...
Not if the argument type is sequence, because there is no type that i more general than sequence, so the union of sequence with any other type is sequence.
When we introduced arrays, we spent at least a year on trying to devise a solution in which sequences and arrays were subtypes of something more general. Sadly, it proved too difficult.
One way forward might be to use annotations as a sort of interface/prototype mechanism. We could say, for example, that if $X is annotated %prototype=map{"count" : array:size#1, "head" : array:head#1, ...)
then '%count($X)returns
array:size($X)and
%head($X)returns
array:head($X)`. But working out all the details would be challenging, and the final result might be quite confusing to users.
One way forward might be to use annotations as a sort of interface/prototype mechanism. We could say, for example, that if $X is annotated
%prototype=map{"count" : array:size#1, "head" : array:head#1, ...)
then '%count($X)returns
array:size($X)and
%head($X)returns
array:head($X)`. But working out all the details would be challenging, and the final result might be quite confusing to users.
I find annotations rather messy.
A better and well-tested in practice approach is to say that everything is an object. An object has a value and some other properties, like collection-type whose values specify what type of collection the object?value is, and this can be one of : map, array, generator, sequence .
This is in full accordance with https://github.com/qt4cg/qtspecs/issues/105 following which every value can be represented as a map that has no regular keys and only a default key whose value is the value. We can add a special key named object-properties to this map, which could be something that is not allowed as a regular key until XPath 4, and have all object properties (among them collection-type ) as properties of the map that is the value of the key object-properties.
Any object has a set of standard properties, and can have its own-properties.
Any value (item(*)
) can give us its containing object as the result of evaluating a new function:
object($value as item()*)
I know this sounds rather complicated.
For the purpose of the current discussion, we could have a simpler solution, where we have type-hints such as:
no-sequence(ExprProducingSingleItem)
telling the processor that the value of ExprProducingSingleItem
must not be regarded as a sequence.
In this way, legacy code that isn't aware of the type-hints will continue to treat ExprProducingSingleItem
generally as a sequence of one item.
But if an argument is passed with the no-sequence
hint, its collection-type would be one of the remaining known collection-types (array, map, generator)
For the purpose of the current discussion, we could have a simpler solution, where we have type-hints such as:
no-sequence(ExprProducingSingleItem)
telling the processor that the value of
ExprProducingSingleItem
must not be regarded as a sequence.In this way, legacy code that isn't aware of the type-hints will continue to treat
ExprProducingSingleItem
generally as a sequence of one item.But if an argument is passed with the
no-sequence
hint, its collection-type would be one of the remaining known collection-types (array, map, generator)
We could simply use as a hint surrounding an expression (that evaluates to a single, non-sequence-collectionItem (array, map or generator) by back-slashes:
\$myGenerator\
This means:
Dear processor, use $myGenerator
not as a sequence of one item, but as the generator of the input collection on which we want to apply the function.
Thus,
tail(\[1, 2, 3]\)
will produce:
[2, 3]
and will not produce
()
In this way we can have and use only one single set of collection-processing functions, regardless of the type of the collection - be it a sequence, an array, a map or a generator.
It seems that the proposed backslash-notation is not defining \[1, 2, 3]\
as an expression, but rather as a modifier of the syntax of a static function call which somehow changes the semantics of the function call, in effect causing the function name to bind to a different function with a different signature? This all seems very lacking in orthogonality.
If annotations are messy, this is downright disheveled.
It seems that the proposed backslash-notation is not defining
\[1, 2, 3]\
as an expression, but rather as a modifier of the syntax of a static function call which somehow changes the semantics of the function call, in effect causing the function name to bind to a different function with a different signature? This all seems very lacking in orthogonality.If annotations are messy, this is downright disheveled.
No, not different functions that have different signature but just one function.
Exactly the opposite: this is a single function that accepts a generator. If using \
is visually displeasing, let us have:
sequence-generator([1, 2, 3])
This presents the item [1, 2, 3]
as a generator, calling whose getCurrent()
method produces just one item (after moveNext()
is called) - the whole array.array-generator([1, 2, 3])
This presents the item [1, 2, 3]
as a generator, calling whose getCurrent()
method produces 1, 2, and 3 on each call (after moveNext()
is called).map-generator(map{"a": 1, "b": 2})
This presents the item map{"a": 1, "b": 2}
as a generator, calling whose getCurrent()
method produces each of the map's two key-value-pairs (after moveNext()
is called).$someGenerator
- this must be a generator.We can think of the three types: sequence-generator
, array-generator
and map-generator
as distinct (non-overlapping) subtypes of the generator
type.
Updated with Conclusion:
Thus, we can have just a single set of collection-processing functions, that can take as input any type of collection: be it a sequence, a singleton-array, a singleton-map, or any other collection represented by a generator.
Closely related (both issues should be discussed together): #708.
Closely related (both issues should be discussed together): #708.
These are not "both issues". This is the same issue, which in #708 is just mentioned and touched on the surface, while here we have done a deep analysis and a full design and proposal.
So, to be clear: there is one issue and there is one complete proposal.
So, to be clear: there is one issue and there is one complete proposal.
This wasn’t meant to be offensive, Dimitre.
Thanks to all for today's discussion.
To address @rhdunn 's statement that the boolean initialized
member is not necessary, and that no such property is there in the IEnumerator interface in .NET, here is a quote of the Microsoft's documentaion:
"Current is undefined under any of the following conditions:
Thus, it is clearly stated that immediately after the enumerator is created an (in fact initialization) action is needed (calling the MoveNext method) so that the enumerator will be able to expose a correct value of its Current property.
In the current proposal the initialized
member serves exactly the purpose of indicating whether or not a call to moveNext is necessary in order to ensure that a call to getCurrent can produce a correct result.
Thus, rather than mentioning this important fact as something minor in the Remarks, here we choose to have a property that explicitly shows the state of the generator -- being either initialized or non-initialized. Operating on a uninitialized generator can no longer "go unnoticed" or be blamed on the caller. Instead, initialization (calling moveNext) can always be done whenever initialized
is false.
That just means that in order to traverse an IEnumerator, you need to write:
IEnumerator<T> enumerator = createGenerator();
while (enumerator.MoveNext()) {
T current = enumerator.Current;
// ...
}
This is similar to how Java Iterator works, where you write:
Iterator<T> iterator = createGenerator();
while (iterator.hasNext()) {
T next = iterator.next();
// ...
}
Using Kotlin, you can implement a NodeList
Java Iterator as:
class NodeListIterator(val nodes: NodeList) : Iterator<Node> {
private var current: Int = 0
// Returns true if there are more elements. Does not advance the iterator.
override fun hasNext(): Boolean = current != nodes.length
// Returns the next element. Advances the iterator.
override fun next(): Node = nodes.item(current++)
}
In C# -- because of IEnumerable semantics -- this would be something like:
class NodeListIterator : IEnumerable {
NodeList nodes;
int current = -1; // Positioned before the first element.
public NodeListIterator(NodeList nodes) {
this.nodes = nodes;
}
// Move to the next item. Adances the iterator.
public bool MoveNext() {
return position++ < nodes.Length;
}
// Get the current item. Does not advance the iterator.
public Node Current {
get { return nodes.Item(current); }
}
}
C# has different semantics to Java because the Current
item is a property and is named Current
. This allows a user to access Current
multiple times in the function without moving the iterator. In Java you would need to place the result of next()
in a temporary variable to avoid issues with multiple access.
C# has different semantics to Java because the Current item is a property and is named Current. This allows a user to access Current multiple times in the function without moving the iterator. In Java you would need to place the result of next() in a temporary variable to avoid issues with multiple access.
The differences run deeper than this (which makes transpiling Java to C# painful!). In C#, MoveNext changes the state, while in Java, hasNext() must be stateless - you can call it multiple times before a call on next().
Thanks Mike and Reece!
Anyway, having an explicit initialization-state property is something helpful for all developers here and this is its intended goal.
I think it would be useful to try and write different generator functions that cover the different expected uses/patterns so we can work out what properties and associated semantics are needed. That way, we would know if an initialized property is useful or if it is not needed.
In short, a generator looks like a function but behaves like an iterator.”
Looking at this thread again, what's being described here looks much more like an iterator than a generator. To me the essential characteristic of a generator is that it's written like a coroutine; a function that yields result items one at a time.
In short, a generator looks like a function but behaves like an iterator.”
Looking at this thread again, what's being described here looks much more like an iterator than a generator. To me the essential characteristic of a generator is that it's written like a coroutine; a function that yields result items one at a time.
Yes, but not every kind of iterator.
We could think of a better name, still the main characteristics of the proposed record type is that it only produces a current value one at a time and doesn't need any materialized backing collection.
This is essentially a generator without requiring the traditional language support for yield return
and/or coroutines.
If a specific language exposed its class implementing a generator, this class would greatly resemble what is proposed here.
@michaelhkay Yes. The record interface defined here is an iterator. What we (Dimitre and I) are proposing is in effect lazy or dynamic sequences where the (possibly infinite) values are computed on demand.
If we just define the iterator-style record then these won't be immediately useful -- they would be something that could be implemented as an external library. For them to be useful, they need to be usable within XPath, XSLT, and XQuery expressions as sequences.
Thus, with my proposed fn:sequence
function in #708, they work like they also have a Java Iterable or Kotlin Sequence (which is used as the basis of Kotlin's Sequences machinery which is similar to XPath sequences).
But I wonder also whether we could find a way of writing iterators in generator/coroutine programming style, as a function which "yields" items one at a time? There's a relationship here with the work we are doing on asynchronous functions (for SaxonJS, primarily).
Of course you can write a function that just returns a sequence, and rely on lazy evaluation to deliver it incrementally. But there's a difference because we're talking here about sequences that only support incremental forwards processing.
We've got a lot of internal machinery for this kind of thing in Saxon. For example it's a convention that system functions only read supplied sequences once, in a forwards direction, which means the caller can supply an iterator; user-written functions however can do anything they like with supplied sequences, which means they have to be re-readable. (Except when we can see that's not needed, which only happens if a function is inlined.)
We also have the ability to execute functions in either pull or push mode, which means they are very like coroutines already.
I assume the usual way to work with generators will be to feed them into recursive functions, folds or iterate-while loops:
declare function local:iterate($gen, $n) {
if($n > 0) then (
local:iterate($gen?next(), $n - 1)
) else (
$gen
)
};
local:iterate($generator, $n)?value()
fold-left(
1 to 5,
$generator,
fn($gen) { $gen?next() }
)?value()
iterate-while(
$generator,
fn($gen, $pos) { $pos < 5 },
fn($gen) { $gen?next() }
)?value()
If we don’t expect the generator to collect the result, or if we are not only interested in the value that has been generated last, it could be:
declare function local:iterate($gen, $n) {
if($n > 0) then {
$gen?value(),
let $next-gen = $gen?next()
where exists($next-gen)
return local:iterate($next-gen, $n - 1)
}
};
local:iterate($generator, $n)
…which doesn’t look very intuitive, though.
Due to the immutable nature of generators (creating new instances with every next()
invocation), I can’t imagine how they could be applicable to classical FLWOR expressions, or user-defined functions accepting arbitrary sequences, even if an implementation processes its arguments lazily.
I think it would make sense to keep this issue about the record-based interface. The discussion on coroutines, yield, etc. should be a separate issue.
With my fn:sequence
proposal, you could write, e.g.:
let $gen := fn:random-number-generator() (: or other lazy sequence :)
for $value in fn:sequence($gen)[1 to 10]
return math:sin($value)
To add to my last comment:
If we don't like the name "generator" then a more precise name would be enumerator.
It has exactly the same functionality as a .NET enumerator, and doesn't need to have a storage-backed collection, it just produces the next value of the enumeration on each call.
Do note that "enumerable" is applicable not only to finite collections, but also to infinite ones such as 1 to Infinity - a well-known fact from Math that the set of natural numbers (or integers) is "enumerable".
This is why in .NET an IEnumerable
has a method ToList
and ToArray
)) (and many other ToXXX methods) if we want to get the entire collection in-memory.
This is why Visual Studio warns us when we use a method like Count
on an IEnumerable
that "Calling this method will enumerate the entire collection".
It must be obvious by now that the concept of Enumeration has an immense value and is at the heart of .NET and LINQ.
We can have the same Enumeration here in XPath, as specified in the current proposal.
Let's just do our job.
Certainly Java and C# get great value from the Iterator/Enumerator concept.
However, they are procedural stateful languages, and therefore have limited scope for lazy evaluation. Many of the benefits of iterators/enumerators (specifically, not having to materialise a collection in memory) can be achieved in XPath by using lazy evaluation. That doesn't mean there are no benefits in adding iterators or enumerators to XPath, it just means that they are needed much less often.
Certainly Java and C# get great value from the Iterator/Enumerator concept.
However, they are procedural stateful languages, and therefore have limited scope for lazy evaluation. Many of the benefits of iterators/enumerators (specifically, not having to materialise a collection in memory) can be achieved in XPath by using lazy evaluation. That doesn't mean there are no benefits in adding iterators or enumerators to XPath, it just means that they are needed much less often.
Definitely!
Thus we do agree that there are certain cases (though not so numerous as in C#/Java) where the Enumerator is needed in XPath.
Especially due to the fact that "using lazy evaluation in XPath" cannot be turned on or off by the language and is entirely dependent on a particular implementation.
As for whether "the benefits of iterators/enumerators (specifically, not having to materialise a collection in memory) " ... "are needed much less often", this is a very subjective judgement and can vary from one XPath user to the next.
With the major use-cases identified in this proposal, let's do our job in designing and providing in the Spec the feature that millions of programmers in other languages have and enjoy in their everyday experience.
@dnovatchev Generators are an interesting concept, no doubt. I still need to understand now how you believe they could be applied in practice. Would it be similar to what I drafted in my comment above, https://github.com/qt4cg/qtspecs/issues/716#issuecomment-1812137554 ?
@dnovatchev Generators are an interesting concept, no doubt. I still need to understand now how you believe they could be applied in practice. Would it be similar to what I drafted in my comment above, #716 (comment) ?
@ChristianGruen Yes, generators will be used with folds, iterate-while, recursive functions and other functions on generators.
As for the code supplied in your comment, I (as often happens with such code) don't understand its purpose and meaning. so a verbal explanation would be helpful.
Included in the opening comment of this proposal there are a big number of examples of useful functions on generators, so we could focus on them as needed and well understood.
Any additional questions are welcome 😄
As for the code supplied in your comment, I (as often happens with such code) don't understand its purpose and meaning. so a verbal explanation would be helpful. Any additional questions are welcome 😄
The 3 code snippets in my comment were supposed to return the first $n
values of a generator.
But maybe I’ll ask differently to get closer to your proposal: How would the following code need to be rewritten if $seq
was to be replaced by a generator?
for $item in $seq[position() < 5]
return '• ' || $item
But maybe I’ll ask differently to get closer to your proposal: How would the following code need to be rewritten if
$seq
was to be replaced by a generator?for $item in $seq[position() < 5] return '• ' || $item
The take
function on generators as specified in the proposal, returns an array. thus:
gen:take($gen, 4) => array:for-each($x -> '* ' || $x)
But maybe I’ll ask differently to get closer to your proposal: How would the following code need to be rewritten if
$seq
was to be replaced by a generator?for $item in $seq[position() < 5] return '• ' || $item
The
take
function on generators as specified in the proposal, returns an array. thus:gen:take($gen, 4) => array:for-each($x -> '* ' || $x)
If we define the take
function on generators to return another generator (as done in .NET), instead of an array, then the code will be:
gen:take($gen, 4) => gen:for-each($x -> '* ' || $x)
We can do any operation within generators only, if we want.
Very smooth, short and intuitive, same as with .NET LINQ.
Thanks, that was helpful. So I believe I understand what you're proposing as an addition to the spec would be:
generator
record type andgen
-prefixed namespace) that could basically be used as an alternative to sequences, arrays and maps.How would a generator be constructed?
let $gen := (:
what would need to be placed here,
e.g. for a generator that returns odd numbers?
:)
return gen:take($gen, 4)
=> gen:for-each(fn($x) { '* ' || $x })
@ChristianGruen I wrote a generator/enumerator/lazy list for a random number generator in https://github.com/qt4cg/qtspecs/issues/708#issuecomment-1727179443 using value/next properties. Adapting to Dimitre's record properties, it could be something like (which I haven't tested):
xquery version "3.1";
(: https://en.wikipedia.org/wiki/Linear_congruential_generator :)
declare function local:lcg($seed, $a, $b, $m) {
let $value := ($a * $seed + $b) mod $m
return map {
"initialized": true(),
"endReached": false(),
"getCurrent": function () { $value },
"moveNext": function () {
local:lcg($value, $a, $b, $m)
}
}
};
(: ZX81 random number generator LCG values :)
let $gen := local:lcg(100, 75, 74, math:pow(2, 16) + 1)
return gen:take($gen, 4)
=> gen:for-each(fn($x) { '* ' || $x })
Thanks, that was helpful. So I believe I understand what you're proposing as an addition to the spec would be:
- a new
generator
record type and- a set of functions (possibly in a
gen
-prefixed namespace) that could basically be used as an alternative to sequences, arrays and maps.How would a generator be constructed?
let $gen := (: what would need to be placed here, e.g. for a generator that returns odd numbers? :) return gen:take($gen, 4) => gen:for-each(fn($x) { '* ' || $x })
The wanted code:
let $gen := generator{
initialized: true(),
endReached: false(),
getCurrent: fn() {$this.currentValue},
moveNext: fn($this) {map:put($this, "currentValue", $this.currentValue + 2)},
currentValue: 1
}
return
gen:for-each(fn($x) { '* ' || $x) => gen:subrange(1, 4)
We have a good choice for the return expression:
`gen:for-each(fn($x) { '* ' || $x) => gen:subrange(1, 4)`
or
`gen:for-each(fn($x) { '* ' || $x) => gen:take(4)`
or
`gen:take(4) => gen:for-each(fn($x) { "* " || $x)`
or
`gen:subrange(1, 4) => gen:for-each(fn($x) { '* ' || $x)`
ZX81
Nice, the one I started with… @rhdunn Thanks for the adapted generator example.
And thanks @dnovatchev. So in addition the proposal would include…
Which of the generator record keys are mandatory, and which are optional?
ZX81
Nice, the one I started with… @rhdunn Thanks for the adapted generator example.
And thanks @dnovatchev. So in addition the proposal would include…
- the definition of a generator constructor, which contains an initial value, and
The proposal will not include any additional information - users are free to provide whatever generators they need. We may have useful examples, though
- a way to treat records as objects, as discussed in From Records to Objects #720
Yes, I think the current record definition allows accessing other members from the record
Which of the generator record keys are mandatory, and which are optional?
The 4 key values that define a map as a generator-record are clearly specified in the proposal. One can just read:
let $generator as record (initialized as xs:boolean, endReached as xs:boolean, getCurrent as function(..) as item()*, moveNext as function(..) as .. , * )
The proposal will not include any additional information
I got it. So instead of generator { ... }
one would probably use the existing map constructor?
map {
initialized: true(),
endReached: false(),
getCurrent: fn() { $this.currentValue },
moveNext: fn($this) { map:put($this, "currentValue", $this.currentValue + 2) },
currentValue: 1
}
Yes, I think the current record definition allows accessing other members from the record
I may have missed that. Do you have a link?
The 4 key values that define a map as a record are clearly specified in the proposal. One can just read:
I saw that indeed, but I was e.g. surprised by this:
initialized is a boolean. When a generator $gen is initially instantiated, $gen?initialized is false()
In your example, it was true()
, so isn't it up to the user after all (as you said that “users are free to provide whatever generators they need”)?
Sorry for all the questions, it's not to criticize the proposal, I just want understand more of the nifty details.
In your example, it was
true()
, so in't it up to the user after all (as you said that “users are free to provide whatever generators they need”)?
Yes, but when a function is passed a generator as argument, it shouldn't rely on it already being initialized.
In certain cases the user may create and initialize a generator, but in other cases they may decide not to do the initialization immediately.
Yes, I think the current record definition allows accessing other members from the record
I may have missed that. Do you have a link?
I cannot find this immediately. I think this was in an issue proposed by @michaelhkay in which members of a record can call each other recursively.
@ChristianGruen @dnovatchev The https://github.com/qt4cg/qtspecs/issues/720 proposal suggests a $this
implicit variable bound to the record instance in a function call on the record instance.
@ChristianGruen @dnovatchev The #720 proposal suggests a
$this
implicit variable bound to the record instance in a function call on the record instance.
I think that we need to finalize #720, as it would provide the user with great power and convenience.
Update:
The main change in the examples is that now the take
function produces a generator and not an array as before.
Also introducing for convenience an emptyGenerator:
let $emptyGenerator := map{
initialized : true(),
endReached: true(),
getCurrent: function($this as map(*)) {error()},
moveNext: function($this as map(*)) {error()}
}
Here is the take function:
take($gen as generator, $n as xs:integer) as generator
{
let $gen := if(not($gen?initialized)) then $gen?moveNext()
else $gen,
return
if( $gen?endReached or $n eq 0) then emptyGenerator()
else map:put($gen, "moveNext", take($gen?moveNext($gen) ), $n -1 )
}
Also updated the opening proposal with the above two functions.
Update:
Here are some other useful functions on generators -- also added to the originating proposal -- with just their signature and summary:
concat($gen1 as generator , $gen2 as generator ) - produces a generator that behaves as $gen1
until $gen1.endReached
becomes true()
, and then behaves as $gen2
*append($gen as generator, $value as item())** - produces a generator that behaves as $gen
until $gen.endReached
becomes true()
, and then as a generator that has only the single value value
.
*prepend($gen as generator, $value as item())** - produces a generator whose first value is value
and then behaves as $gen
.
some($gen as generator) as xs:boolean - Produces true()
if $gen
has at least one value, and false()
otherwise.
*some($gen as generator, $pred as function(item()) as xs:boolean) as xs:boolean** - Produces true()
if $gen
has at least one value for which $pred($thisValue) is true(), and false()
otherwise.
ofType($gen as generator, $type as type) - Produces a new generator from $gen
that contains all values from $gen
of type type
-- for this we need to have added to the language the type object.
*skipWhile($gen as generator, $pred as function(item()) as xs:boolean)** - Produces a new generator from $gen
by skipping all starting values for which $pred($theValue)
is true()
.
*takeWhile($gen as generator, $pred as function(item()) as xs:boolean)** - Produces a new generator from $gen
which contains all starting values of $gen
for which $pred($theValue)
is true()
.
toArray($gen as generator) - Produces an array that contains all values that are contained in $gen
.
toSequence($gen as generator) - Produces a sequence that contains all values that are contained in $gen
. Values of $gen
that are sequences themselves are flattened.
toMap($gen as generator) - If the values in $gen are all key-value pairs, produces a map that contains exactly all the key-value pairs from $gen
.
These and many other useful functions on generators can and should be added to every generator upon construction.
Thus, it would be good to have an explicit constructor function for a generator:
construct-generator($record as
record( initialized as xs:boolean,
endReached as xs:boolean,
getCurrent as function(..) as item()*,
moveNext as function(..) as .. ,
)
) as generator
I like this proposal, but I'm not sure if the interface needs to be as complex as it is (as others have mentioned).
I prefer interfaces to be total, without errors, if they can be avoided.
My suspicion in the IEnumerable may be the way it is for historical reasons when the world was imperative and may even go back to VB6.
My knowledge of writing anything but trivial XPath is a curse, but also a liberation.
But doesnt effectively the list functional "pattern match" like function suffice?
in psuedo code.
let (maybeHead,tail) = headTail(iterator)
i.e. the (haskell/ML/OCaml/F#) functional type of the method (you could in fact implement it as a single partial function)
headTail : iterator a -> (maybe a, iterator a)
or in C#
(Option<A>,Iterator<A>) HeadTail<A>(Iterator<A> iterator)
it has the advantage of
whether that sits well in an XPath context, I can't comment.
To compare this construct directly with IEnumerable
So this is the object interface I'm suggesting may be good enough,
type Iterator<'a> =
abstract member headTail : unit -> ('a option * Iterator<'a>)
this code makes an Iterator out of a List
module List =
let rec makeIterator : List<'a> -> Iterator<'a> =
fun xs ->
{
new Iterator<'a> with
member _.headTail () =
match xs with
| [] ->
(None, makeIterator [])
| head :: tail ->
(Some head, makeIterator tail)
}
this makes an iterator out of an IEnumerable (known as 'seq' in F#)
module Seq =
open System.Collections.Generic
let makeIterator : 'a seq -> Iterator<'a> =
let rec makeIteratorFromIterator : IEnumerator<'a> -> Iterator<'a> =
fun xs ->
{
new Iterator<'a> with
member _.headTail () =
match xs.MoveNext() with
| true ->
(Some xs.Current,makeIteratorFromIterator xs)
| _ ->
(None,makeIteratorFromIterator xs)
}
fun xs ->
makeIteratorFromIterator (xs.GetEnumerator())
this demonstrates how to write a loop (i.e. fold on it)
module Iterator =
let rec fold : ('state -> 'a -> 'state) -> 'state -> Iterator<'a> -> 'state =
fun folder seed iterator ->
let (headMaybe,tail) = iterator.headTail ()
match headMaybe with
| Some a ->
fold folder (folder seed a) tail
| None ->
seed
this uses an iterator to fold a list
let 6 = Iterator.fold (+) 0 (List.makeIterator [1;2;3])
this uses an iterator to fold an IEnumerable
let 6 = Iterator.fold (+) 0 (Seq.makeIterator (seq { 1;2;3 }))
so we can see that basically I can turn an IEnumerable
but I can also change it back (but note the awful pain...nasty mutable vars doing things I barely understand, basically a bit of a mess).
let toEnumerator : Iterator<'a> -> IEnumerator<'a> =
fun iterator ->
let (maybeHeadI,tailI) = iterator.headTail ()
let mutable maybeHead = ref maybeHeadI
let mutable tail = ref tailI
{
new IEnumerator<'a> with
member _.Dispose() =
()
member _.Current
with get () =
(!maybeHead).Value : obj
member _.Current
with get () =
(!maybeHead).Value
member _.MoveNext () =
let (maybeHeadI,tailI) = iterator.headTail ()
maybeHead := maybeHeadI
tail := tailI
not(!maybeHead = None)
member _.Reset () =
let (maybeHeadI,tailI) = iterator.headTail ()
maybeHead := maybeHeadI
tail := tailI
()
}
and here is the IEnumerable
let toSeq : Iterator<'a> -> IEnumerable<'a> =
fun iterator ->
{
new IEnumerable<'a> with
member _.GetEnumerator () : IEnumerator<'a> =
toEnumerator iterator
member _.GetEnumerator () : System.Collections.IEnumerator =
toEnumerator iterator
}
I now we can use the implementation of fold in F# defined on IEnumerable
let iterator = List.toSeq [1;2;3]
let 6 = Seq.fold (+) 0 iterator
So at least in F# these two constructs seem to be isomorphic, but the Iterator<'a> construct seems much much simpler and easy to use...in an F# context
whether that simplicity translates in XPath, I can't comment, bit to me it seems simpler.
even simpler it can be defined as a single function, but I think it it may obscure the correlation to the Map constructs in the XPath code.
type IteratorFunction<'a> = IteratorFunction of (unit -> ('a option * IteratorFunction<'a>)) with
member this.run () =
let (IteratorFunction f) = this
f()
module IteratorFunction =
let rec fromList : List<'a> -> IteratorFunction<'a> =
fun xs ->
let next () =
match xs with
| [] ->
(None, fromList [])
| head :: tail ->
(Some head, fromList tail)
IteratorFunction next
let rec fold : ('state -> 'a -> 'state) -> 'state -> IteratorFunction<'a> -> 'state =
fun folder seed iterator ->
let (headMaybe,tail) = iterator.run ()
match headMaybe with
| Some a ->
fold folder (folder seed a) tail
| None ->
seed
let 6 = IteratorFunction.fold (+) 0 (IteratorFunction.fromList [1;2;3])
If you wish to give programmers an interface like IEnumerable (i.e. the one proposed) you CAN do so, you simply include these functions in the set of "useful" Iterator functions (rather than methods) and use the code laid out to turn an Iterator
If you wish to give programmers an interface like IEnumerable (i.e. the one proposed) you CAN do so, you simply include these functions in the set of "useful" Iterator functions (rather than methods) and use the code laid out to turn an Iterator into an IEnumerable, you then only need to implement a single function per collection, to use this and any other function defined on it.
Yes this is the idea.
Sorry, I don't know F# and cannot understand what the provided F# code does.
Also, we don't have methods in XPath - just functions. But we can have a set of functions all of which belong to the same (XML) namespace.
What is a generator?
Generators are well known and provided out of the box in many programming languages. Per Wikipedia:
“In computer science, a generator is a routine that can be used to control the iteration behaviour of a loop. All generators are also iterators.[1] A generator is very similar to a function that returns an array, in that a generator has parameters, can be called, and generates a sequence of values. However, instead of building an array containing all the values and returning them all at once, a generator yields the values one at a time, which requires less memory and allows the caller to get started processing the first few values immediately. In short, a generator looks like a function but behaves like an iterator.”
The goal of this proposal (major use-cases)
A generator in XPath should be a tool to easily implement the solutions to the following use-cases:
Processing a huge collection whose members may not all be needed.
A generator will produce only the next member of the collection and only on demand basis.
Handling a collection containing unknown or infinite number of members. When requested the next member of the collection the generator will always produce it, if the collection still contains any members. It is the responsibility of the caller to issue only the necessary number of requests for the really needed next members.
What is achieved in both cases:
A good problem that is based on these use-cases is to generate a collection of the first N members that have some wanted properties, and are generated from other collection(s), when it is not known what the size of the original input collections would be in order for the desired number of N members to be discovered.
For example: Produce the first 1 000 000 (1M) prime numbers.
Sometimes we may not even know if N such wanted members actually exist, for example: _Produce the first 2 sequences of 28 prime numbers where the primes in each of the sequences form an arithmetic progression._
The Proposal
A generator is defined as (and synonym for):
A generator is an extensible record .
It has four fixed-named keys, and any other map-keys, as required to hold the internal state of that specific generator.
Here is the meaning of the four fixed/named keys:
initialized is a boolean. When a generator
$gen
is initially instantiated,$gen?initialized
isfalse()
. Any call to$gen?getCurrent()
raises an error. In order to get the first value of the represented collection, the caller must call$gen?moveNext()
endReached is a boolean. If after a call to
moveNext()
the value of the returned generator'sendReached
key istrue()
then callingmoveNext()
and/orgetCurrent()
on this generator raises an error.getCurrent is a function of zero arguments. It must only be called if the values of
initialized
istrue()
and the value ofendReached
isfalse()
, otherwise an error must be raised. This function produces the current member of the collection after the last call tomoveNext
, if this call didn't return a generator whoseendReached
value wastrue()
moveNext is a function of zero arguments. When called on a generator whose
endReached
value isfalse()
then it produces the next (state of the) generator. including a possiblytrue()
value ofendReached
and if this value is stillfalse()
, then callinggetCurrent()
produces the value of the next member of the collection.Examples of operations on generators
The following examples are written in pseudo-code as at the time of writing there was no available implementation of records. And also, the code for recursion in pure XPath makes any such example longer than necessary for grasping its meaning.
The Empty Generator
Take the first N members of the collection
Skip the first N members from the collection
Subrange of size N starting at the M-th member
Head of a generator
Tail of a generator
At index N
For Each
For Each Pair
Filter
Here are some other useful functions on generators -- with just their signature and summary:
concat($gen1 as generator , $gen2 as generator ) - produces a generator that behaves as
$gen1
until$gen1.endReached
becomestrue()
, and then behaves as$gen2
*append($gen as generator, $value as item())** - produces a generator that behaves as
$gen
until$gen.endReached
becomestrue()
, and then as a generator that has only the single valuevalue
.*prepend($gen as generator, $value as item())** - produces a generator whose first value is
value
and then behaves as$gen
.some($gen as generator) as xs:boolean - Produces
true()
if$gen
has at least one value, andfalse()
otherwise.*some($gen as generator, $pred as function(item()) as xs:boolean) as xs:boolean** - Produces
true()
if$gen
has at least one value for which $pred($thisValue) is true(), andfalse()
otherwise.ofType($gen as generator, $type as type) - Produces a new generator from
$gen
that contains all values from$gen
of typetype
-- for this we need to have added to the language the type object.*skipWhile($gen as generator, $pred as function(item()) as xs:boolean)** - Produces a new generator from
$gen
by skipping all starting values for which$pred($theValue)
istrue()
.*takeWhile($gen as generator, $pred as function(item()) as xs:boolean)** - Produces a new generator from
$gen
which contains all starting values of$gen
for which$pred($theValue)
istrue()
.toArray($gen as generator) - Produces an array that contains all values that are contained in
$gen
.toSequence($gen as generator) - Produces a sequence that contains all values that are contained in
$gen
. Values of$gen
that are sequences themselves are flattened.toMap($gen as generator) - If the values in $gen are all key-value pairs, produces a map that contains exactly all the key-value pairs from
$gen
.These and many other useful functions on generators can and should be added to every generator upon construction.
Thus, it would be good to have an explicit constructor function for a generator: