Expanding the internal iteration API

tcbrindle commented 1 year ago

Currently internal iteration is achieved by a sequence customisation point for_each_while(predicate) -> cursor_t, which instructs the sequence to iterate over all of its elements until such time as the predicate returns false.

Unfortunately there are some cases where we can't use internal iteration even though we'd like to. Algorithms like fold_first (#98), min and max need to inspect the initial element before iterating over the rest of the sequence; more generally, any time we take a slice of a sequence (such as with the elements of chunk, chunk_by, slide etc) then we can't implement for_each_while() on the slice in terms of an equivalent call on the base sequence, and so have to fall back to external iteration.

A solution to this would be to replace the existing for_each_while() customisation point with a new pair of functions taking extra cursor arguments to indicate where iteration should begin and end:

auto iterate_while(auto& seq, auto&& pred, cursor_t from) -> cursor_t;
auto iterate_while(auto& seq, auto&& pred, cursor_t from, cursor_t to) -> cursor_t;

The slightly odd parameter order is to allow some sequence impls to provide just a single overload, with the to parameter defaulted to flux::last(). Implementations wouldn't need to provide both of these functions: in particular, most single-pass sequences aren't going to be able to provide the second one.

With this approach, algorithms like fold_first could examine the first element before proceeding with a call to iterate_while(); and iterate_while() on slices could be implemented in terms of calls to the equivalent function on their parent sequence. The existing flux::for_each_while(seq, pred) wrapper function could be implemented as a call to iterate_while(seq, pred, first(seq)).

This would add (yet) another customisation point that authors would have to worry about when writing a sequence implementation, which I'm somewhat reluctant to do, but I think the codegen benefits of internal iteration are great enough that it's justified in this case.

@brycelelbach @brevzin @jnytra any thoughts?

brevzin commented 1 year ago

Yeah I ran into the same problem when I was trying to do this, and instead came up with a janky semi-stateful internal iteration solution? Not amazing.

So I guess the implementation of min would look something like... this?

template <sequence Seq>
auto min(Seq& seq) -> Optional<value_t<Seq>> {
    auto cursor = first(seq);
    if (is_last(seq, cursor)) {
        return {};
    }

    Optional<value_t<Seq>> best = read_at(seq, cursor);
    inc(seq, cursor);
    iterate_while(seq,
        [&](auto&& elem){ *best = std::min(*best, elem); return true; },
        cursor);
    return best;
}

tcbrindle commented 1 year ago

I think min() itself wouldn't need to be changed, as it just calls fold_first(). But we could implement that as something like:

template <sequence Seq, typename Func, typename V = value_t<Seq>>
auto fold_first(Seq&& seq, Func func) -> flux::optional<V>
{
    auto cur = first(seq);

    if (is_last(seq, cur)) {
        return std::nullopt;
    }

    V init(read_at(seq, cur));
    inc(seq, cur);

    iterate_while(seq, [&](auto&& elem) {
        init = std::invoke(func, std::move(init), FLUX_FWD(elem));
        return true;
    }, std::move(cur));

    return optional<V>(std::in_place, std::move(init));
}

(Which is pretty much exactly what you just wrote....)

brevzin commented 1 year ago

I think this seems like a pretty cool idea. Little awkward having the lambda not last I guess, so maybe do iterate_while(seq, cur, f) and iterate_while(seq, cur, last, f)? Definitely curious to see how this pans out!

jnytra commented 1 year ago

I think too many customization points can confuse users. I prefer a simpler API, even if it means a small performance loss. But if the compiler can optimize much better, it would be a pity not to use it. Maybe we can do some micro benchmarks to prove the speedup. I also agree with @brevzin that we should use lambda as the last argument.

tcbrindle commented 1 year ago

I think too many customization points can confuse users.

I agree in general. In this case though I think the argument in favour of adding one more -- in particular, that it would enable internal iteration of slices -- it persuasive enough that it's worth investigating.

I also agree with @brevzin that we should use lambda as the last argument.

As noted above, the odd parameter order is to allow some implementations to combine both overloads into a single function by defaulting the last parameter. For example, the implementation for C arrays could be

template <typename T, index_t N>
struct sequence_traits<T[N]> {
     // ...

    static constexpr auto iterate_while(auto& self, auto&& pred, index_t from, index_t to = N) -> index_t
    {
         for (; from < to; ++from) {
            if (!std::invoke(pred, self[from])) {
                break;
            }
         }

        return from;
    }
};

tcbrindle / flux

Expanding the internal iteration API #99