modularml / mojo

The Mojo Programming Language
https://docs.modular.com/mojo
Other
22.31k stars 2.55k forks source link

[Feature Request] Introduce Iterator Trait #2629

Open jayzhan211 opened 2 months ago

jayzhan211 commented 2 months ago

Review Mojo's priorities

What is your request?

I would like to have iterator trait like

trait Iterator {
   type Item

   fn next(self) -> Optional<Item>
}

or

trait Iterator[T: AnyType] {
   fn next(self) -> Optional<T>
}

What is your motivation for this change?

Support syntax that expect iterator

  1. tuple([iterable])
  2. set([iterable])
  3. fromkeys(iterable[, value])
  4. list([iterable])
  5. str.join(iterable)

Any kind of struct that implement iterator trait can be built with function that expect iterator

struct MyStruct(Iterator):
   type Item = Int

   fn next() -> Item

fn set(Iterator):
   // Create set for MyStruct

Any other details?

The first example is from rust https://doc.rust-lang.org/book/ch19-03-advanced-traits.html

python doc https://docs.python.org/3/library/stdtypes.html

martinvuyk commented 1 month ago

Currently this is what's in /stdlib/src/builtin/_stubs.mojo


trait _IntNext(Copyable):
    fn __next__(inout self) -> Int:
        ...

trait _IntIter(_IntNext):
    fn __len__(self) -> Int:
        ...

trait _IntIterable(_IntIter):
    fn __iter__(self) -> Self:
        ...

trait _StridedIterable(_IntIter):
    fn __iter__(self) -> _StridedRangeIterator:
        ...

@value
struct _NextEval[T: _IntNext]:
    var iter: T
    var i: Int

fn _next_eval[T: _IntNext](iter: T) -> _NextEval[T]:
    var copy = iter
    var next = copy.__next__()
    return _NextEval(copy, next)

@always_inline
fn _gen_next[
    inferred T: _IntIter, iter: T, f: fn[i: Int] () capturing -> None
]():
    @parameter
    if iter.__len__() == 0:
        return
    else:
        alias next = _next_eval(iter)
        f[next.i]()
        _gen_next[next.iter, f]()

@always_inline
fn parameter_for[
    inferred T: _IntIterable, iter: T, f: fn[i: Int] () capturing -> None
]():
    _gen_next[iter.__iter__(), f]()

@always_inline
fn parameter_for[
    inferred T: _StridedIterable, iter: T, f: fn[i: Int] () capturing -> None
]():
    _gen_next[iter.__iter__(), f]()

And this would be the Reversible trait in /stdlib/src/builtin/reversed.mojo

trait ReversibleRange:
    """
    The `ReversibleRange` trait describes a range that can be reversed.

    Any type that conforms to `ReversibleRange` works with the builtin
    [`reversed()`](/mojo/stdlib/builtin/reversed.html) functions.

    The `ReversibleRange` trait requires the type to define the `__reversed__()`
    method.

    **Note**: iterators are currently non-raising.
    """

    # TODO: general `Reversible` trait that returns an iterator.
    # iterators currently check __len__() instead of raising an exception
    # so there is no ReversibleRaising trait yet.

    fn __reversed__(self) -> _StridedRange:
        """Get a reversed iterator for the type.

        **Note**: iterators are currently non-raising.

        Returns:
            The reversed iterator of the type.
        """
        ...

I do like the Idea you mentioned of having __next__ be an iterator Trait, but I think __iter__ should be there as the main player. I don't know if there is a more generic way to do this.

trait _IterableNext[T: CollectionElement]:
    fn __next__(self) -> Optional[T]:
        ...

struct Iterator[T: CollectionElement, A: _IterableNext[T]]:
    _iterable: A

    fn __init__(inout self, iterable: A):
        self._iterable = iterable

    fn __next__(self) -> Optional[T]
        return next(self._iterable)

trait Iterable[T: CollectionElement]:
    fn __iter__(self) -> Iterator[T]:
        ...

But I think such an interface will be useful because when we have Generators it could be something like

struct Generator[T: CollectionElement, *A: AnyType, **B: AnyType]:
    _func: Coroutine[fn(*A, **B) yields -> Optional[T]]

    fn __init__(inout self, func: fn(*A, **B) yields -> Optional[T], *args: A, **kwargs: B):
       self._func = Coroutine(func, args, kwargs)

    fn __next__(self) -> Optional[T]:
        return self._func.get()

That way people could do

fn some_yielding_func() yields -> Optional[Int]
    ...

fn main():
    var iterator = Iterator(Generator(some_yielding_func))
    # StringLiteral.join() could even take a generic yielding function and build the iterator itself
    var concatenated = ", ".join(iterator)
JoeLoser commented 1 month ago

Thanks for the feature request! It's not quite as easy as just defining the trait you have proposed. There are several things to consider such as Python compatibility, code-gen of the iterator support, and so on. It's likely we will drift from Python (and not just raise StopIteration error in our iterator APIs) and instead return Optional[T] like you said. But it's a bit nuanced and we should consider how our Mojo -> Python compat layer in runtime can project an Optional[T] onto that named error type nicely.

If you'd like to write up a one pager exploring the iterator API design space, why Python's raising approach isn't great for performance (and won't be good for Mojo anyways in the absence of parametric raises in the short-term), I'd encourage you (or others) to do so.

martinvuyk commented 1 month ago

@jayzhan211 do you wanna do the pull request with the proposal? I wrote something rn you can use if you want

Iterator trait

As started by @jayzhan211 in issue #2629, an Iterable trait and Iterator implementation would be very useful. Especially since currently every stldlib type has its own iterator and support for each has to be added independently.

What is proposed?

trait Nextable[T: CollectionElement]:
    fn __next__(self) -> Optional[T]:
        ...

struct Iterator[T: CollectionElement, A: Nextable[T]]:
    var _nextable: A

    fn __init__(inout self, nextable: A):
        self._nextable = nextable

    fn __next__(self) -> Optional[T]
        return next(self._nextable)

    fn __next__[pythonland: Bool](self) raises -> T
        var item = next(self._nextable)
        if not item:
            raise Error("StopIteration")
        return item
    ...

trait Iterable[T: CollectionElement]:
    fn __iter__(self) -> Iterator[T]:
        ...

What would be needed?

The for loop codegen implementation would need to check for None and break so that pythonic syntax is preserved

for i in List("something", "something"):
    print(i)

Other details

To allow for functional patterns, the Iterator struct could have wrappers for itertools

struct Iterator[T: CollectionElement, A: Nextable[T]]:
    ...
    fn map(owned self, func: fn(value: T) -> T) -> Self:
        return map(func, self)

    fn filter(owned self, func: fn(value: T) -> Optional[T]) -> Self:
        return filter(func, self)

    fn batched(owned self, amnt: Int) -> Self:
        return itertools.batched(self, amnt)

zip implementation would be something like

struct _zip[*Ts: CollectionElement, *A: Iterable[Ts]]:
    var _iters: Tuple[*A]
    var _len: Int

    fn __init__(inout self, *values: *A):
        self._len = len(values)
        @unroll
        for i in range(self._len):
            self._iters[i] = iter(values[i])

    fn __next__(self) -> Optional[Tuple[*Ts]]:
        var items: Optional[Tuple[*Ts]]
        @unroll
        for i in range(self._len):
            item[i] = next(self._iters[i])
        return items

fn zip[*Ts: CollectionElement, *A: Iterable[Ts]](*values: *A) -> Iterator[*Ts]:
    return Iterator(_zip(values))

And once we have capable enough generics for this code to be feasible, implementing a generator would be as simple as taking a yielding function

struct Generator[T: CollectionElement, *A: AnyType, **B: AnyType]:
    var _func: Coroutine[fn(*A, **B) -> Optional[T]]
    var _args: A
    var _kwargs: B

    fn __init__(inout self, func: fn(*A, **B) -> Optional[T], *args: A, **kwargs: B):
        self._func = Coroutine(func)
        self._args = args
        self._kwargs = kwargs

    fn __next__(self) -> Optional[T]:
        return self._func(self._args, self._kwargs).get()

That way:

fn some_yielding_func() -> Optional[Int]
    ...

fn main():
    var iterator = Iterator(Generator(some_yielding_func))
    var concatenated = ", ".join(iterator)
jayzhan211 commented 1 month ago

@jayzhan211 do you wanna do the pull request with the proposal? I wrote something rn you can use if you want

Probably not recently, you can go ahead! Also, I'm not familiar with the code-gen of the iterator support

martinvuyk commented 1 month ago

Also, I'm not familiar with the code-gen of the iterator support

I also have no clue. I'm not sure if @unroll can even be implemented with the None check branch.

I'm also not privy to how yield will be implemented. And we are still missing some core trait and Variadic Args & Parameters features.