tc39 / proposal-iterator-helpers

Methods for working with iterators in ECMAScript
https://tc39.es/proposal-iterator-helpers
1.33k stars 34 forks source link

Surprising results for "forked" consecutive operations #79

Closed markusjohnsson closed 4 years ago

markusjohnsson commented 4 years ago

I have used a lot of different tools to achieve this functionality and I was very happy to see this proposal to include such functionality by default.

However, I am worried that porting projects to use this functionality will cause a lot of bugs, since there is a fundamental difference from how other libraries behave.

The main issue I see is with simple code like this:

    const animals = items.filter(i => i.type == "animal");
    const dogs = animals.filter(a => a.subtype == "dog");
    const cats = animals.filter(a => a.subtype == "cat");

    for (let d of dogs)
        console.log(d.name);

    for (let d of cats)
        console.log(d.name);

It will behave differently whether items is an Array or an Iterator. If Array, then both loops will log results; we would have successfully created one sequence of dogs and one of cats. However, if Iterator, then only dogs will get listed, because the animals Iterator will be consumed during the iteration of dogs and no cats will be listed.

Moreover, it is not only Array semantics that this diverges from. The semantics of Array are the same as those for a number of popular js libraries:

I will post a demonstrative script for those libraries as a follow up.

Workarounds

Some workarounds without external libraries exist:

  1. Use Array. Cons: not lazy, allocating arrays for each operation, does not have all operators.
  2. Use toArray() on intermediate sequences (animals above). Cons: not lazy, causing allocations that should not be needed.
  3. Use a function to wrap creation of intermediate sequences. This needs to be "at the top" of the chain, you have no chance of doing this after you have entered the Iterator world.

I think this is a much needed feature for JavaScript, however I am worried that the current spec will cause bugs and headaches and prevent a proper solution further on.

markusjohnsson commented 4 years ago

As promised, here is a script which demonstrates that similar code for different libraries behaves the same, and only iterator-helpers differ:


require('core-js/proposals/iterator-helpers');

const items = [
    { type: "animal", subtype: "dog", name: "Rufus" },
    { type: "animal", subtype: "cat", name: "Whiskers" },
    { type: "automobile", subtype: "car", name: "Volvo" },
    { type: "automobile", subtype: "car", name: "Saab" },
];

(function () {
    console.log("=== Array ===");
    const animals = items.filter(i => i.type == "animal");
    const dogs = animals.filter(a => a.subtype == "dog");
    const cats = animals.filter(a => a.subtype == "cat");

    for (let d of dogs)
        console.log(d.name);

    for (let d of cats)
        console.log(d.name);
})();

(function () {
    console.log("=== immutable js ===");
    const { List } = require('immutable');

    const animals = List(items).filter(i => i.type == "animal");
    const dogs = animals.filter(a => a.subtype == "dog");
    const cats = animals.filter(a => a.subtype == "cat");

    for (let d of dogs)
        console.log(d.name);

    for (let d of cats)
        console.log(d.name);
})();

(function () {
    console.log("=== underscore ===");
    const _ = require('underscore');

    const animals = _.chain(items).filter(i => i.type == "animal");
    const dogs = animals.filter(a => a.subtype == "dog");
    const cats = animals.filter(a => a.subtype == "cat");

    for (let d of dogs.value())
        console.log(d.name);

    for (let d of cats.value())
        console.log(d.name);
})();

(function () {
    console.log("=== lodash ===");
    const _ = require('lodash');

    const animals = _.chain(items).filter(i => i.type == "animal");
    const dogs = animals.filter(a => a.subtype == "dog");
    const cats = animals.filter(a => a.subtype == "cat");

    for (let d of dogs.value())
        console.log(d.name);

    for (let d of cats.value())
        console.log(d.name);
})();

(function () {
    console.log("=== linq ===");
    const Enumerable = require('linq');

    const animals = Enumerable.from(items).where(i => i.type == "animal");
    const dogs = animals.where(a => a.subtype == "dog");
    const cats = animals.where(a => a.subtype == "cat");

    for (let d of dogs)
        console.log(d.name);

    for (let d of cats)
        console.log(d.name);
})();

(function () {
    console.log("=== ix ===");
    const { from } = require('ix/iterable');
    const { filter } = require('ix/iterable/operators');

    const animals = from(items).pipe(filter(i => i.type == "animal"));
    const dogs = animals.pipe(filter(a => a.subtype == "dog"));
    const cats = animals.pipe(filter(a => a.subtype == "cat"));

    for (let d of dogs)
        console.log(d.name);

    for (let d of cats)
        console.log(d.name);
})();

(function () {
    console.log("=== iterator-helpers ===");

    const animals = Iterator.from(items).filter(i => i.type == "animal");
    const dogs = animals.filter(a => a.subtype == "dog");
    const cats = animals.filter(a => a.subtype == "cat");

    for (let d of dogs)
        console.log(d.name);

    for (let d of cats)
        console.log(d.name);
})();

Output:

=== Array ===
Rufus
Whiskers
=== immutable js ===
Rufus
Whiskers
=== underscore ===
Rufus
Whiskers
=== lodash ===
Rufus
Whiskers
=== linq ===
Rufus
Whiskers
=== ix ===
Rufus
Whiskers
=== iterator-helpers ===
Rufus
devsnek commented 4 years ago

this example only works because you know you're using an array. if you had received an iterator from some library or something you would not be able to create a fork here at all, regardless of iterator vs iterable. In the future a tee() method might be added to address forking but that is not part of the current proposal.

markusjohnsson commented 4 years ago

Well, it also works with the other libraries I tried. As I see it, it is the expected way and I think developers would be surprised with the proposed semantics.

How would a tee method work? As far as I understand there is no way to clone an iterator or create a (reusable) iterable from an iterator?

devsnek commented 4 years ago

@markusjohnsson tee would use an internal buffer. If you don't mind can you recreate your demos using items.values() instead of items?

markusjohnsson commented 4 years ago

@devsnek yes, but that uses the iterator-helpers API and not the Array API, so there is no point in comparing the result to to iterator-helpers, right?

(function () {
    console.log("=== Array values() (iterator-helpers API) ===");
    const animals = items.values().filter(i => i.type == "animal");
    const dogs = animals.filter(a => a.subtype == "dog");
    const cats = animals.filter(a => a.subtype == "cat");

    for (let d of dogs)
        console.log(d.name);

    for (let d of cats)
        console.log(d.name);
})();
=== Array values() (iterator-helpers API) ===
Rufus
devsnek commented 4 years ago

Well it seems my point was lost... In JavaScript the general pattern is that you're given an iterator directly (see: generators) from a function call, so you'd use function calls to represent reusability (const animals = () => array.values().filter(...)).

markusjohnsson commented 4 years ago

No, that point did not make it across. Creating a function to wrap the sequence is however what I proposed as a potential workaround in point 3 in my original post. IMHO it should not be needed.

My point however is that existing libraries (in JavaScript) that are used to accomplish what this API is providing, do not have that behavior.

devsnek commented 4 years ago

your existing libraries have the same problem. how do you propose one magically restarts an iterator without calling the generator again.

senocular commented 4 years ago

Just wanted to chime in to provide some concrete examples that show what devsnek was talking about using the libraries from the previous example which I believe are both lazy and work with iterators.

(function () {
    console.log("=== linq ===");
    const Enumerable = require('linq');

    const animals = Enumerable.from(items.values()).where(i => i.type == "animal");
    const dogs = animals.where(a => a.subtype == "dog");
    const cats = animals.where(a => a.subtype == "cat");

    for (let d of dogs)
        console.log(d.name);

    for (let d of cats)
        console.log(d.name);
})();

(function () {
    console.log("=== ix ===");
    const { from } = require('ix/iterable');
    const { filter } = require('ix/iterable/operators');

    const animals = from(items.values()).pipe(filter(i => i.type == "animal"));
    const dogs = animals.pipe(filter(a => a.subtype == "dog"));
    const cats = animals.pipe(filter(a => a.subtype == "cat"));

    for (let d of dogs)
        console.log(d.name);

    for (let d of cats)
        console.log(d.name);
})();

Output:

=== linq ===
Rufus
=== ix ===
Rufus

The only changes made were changing items to items.values() which provides these libraries with the iterator for the array rather than the array itself (an iterable).

The results are the same seen with the iterator helpers. This is a consequence of the behavior of iterators, not necessarily the implementations of the libraries. The problem is that iterators are usable only once and then become fully consumed, no longer able to produce more values. Once the first iteration through the values is complete (dogs), there are no more values to pull for the second (cats). And once you have a fully consumed iterator, you have no way of resetting it.

This works for arrays - or most other iterables - because they produce new iterators each time they're iterated through as defined by their @@iterator method. When iterating through the items in the cats loop, even though using the same array used with dogs, it will produce a new iterator and a new set of values (assuming they're even going through iterators for that case which would be implementation dependent) allowing the cat name to be filtered out.

Note that generator objects are a little different in that, though they are iterables, they are also iterators, and their @@iterator just returns themselves rather than a new iterator.

Also note that greedy iteration can overcome this by producing an iterable from original iterator right away and using that for the basis of all additional forks/iterations.

markusjohnsson commented 4 years ago

As I've demonstrated, if I use them as designed, they do not have that problem, because the operators are based on iterables rather than iterator. I'm not suggesting "rewinding" an iterator, which is impossible. I'm suggesting building an api based on iterable and not iterator.

devsnek commented 4 years ago

In that case I'm going to close this as a duplicate of #8, #68, and #18

markusjohnsson commented 4 years ago

The issue still stands even if you reject my proposed solution.

ljharb commented 4 years ago

It seems like the issue is one with iterators themselves, namely that they’re not reusable. Even if some of them are via their iterables, all iterators aren’t, and an api based on reusable iterables wouldn’t support all iteration use cases.

In other words, i don’t think the existing design of the language considers this issue a problem.

markusjohnsson commented 4 years ago

@ljharb no, the issue is that most libraries (including the built-in Array methods) that developers are familiar with does not have these semantics and therefor it is a risky design as developers might think that you can port existing code to this very similar API.

ljharb commented 4 years ago

@markusjohnsson not String.prototype.matchAll, which when called produces an iterator you can only use once; similarly, not .values()/.keys()/.entries() on arrays/Maps/Sets - certainly many things take iterables, but that's what Iterator.from is for. Once you have an iterator, it is not reusable, and this is true of all builtin types. array[Symbol.iterator]() is not a reusable iterator either.

peter-leonov commented 4 years ago

Hi here! Great thank you for the proposal!

Being also a multi-seasoned JS developer remembering the prototype.js era I really like seeing the new language and library constructs coming into the JS ecosystem. I was always feeling as being limited to using only arrays while dealing with lists and streams. Iterators and tools for them are opening the (pandora?) toolbox of doing things like RxJS or transducers with the native language support.

Yes, iterators and helpers for them have (and IMO should have) a different usage semantics than arrays. I'm myself fine learning that filter() is a destructive operation on iterators compared to arrays to get the power of laziness and (almost) allocation free stream processing.

As for all the new things like generators and promises the user would need to learn the new style, the advantages and the limitations, and as far as the proposal does not break the existing code I'm personally fine having this investment required.