tcbrindle / flux

A C++20 library for sequence-orientated programming
https://tristanbrindle.com/flux/
Boost Software License 1.0
472 stars 29 forks source link

Complexity Requirements? #127

Open brevzin opened 11 months ago

brevzin commented 11 months ago

Thought I'd follow up my last question with another question.

I see that in flux, filter's implementation of first just does a find_if: https://github.com/tcbrindle/flux/blob/1c128b50af95fc39b6683d437f9210239e219836/include/flux/op/filter.hpp#L50-L53

So is the intent that there is no complexity requirement on first (and potentially last and is_last)?

My go-to example for the complexity requirements in Ranges is r.drop_last_while(p).stride(2), but in flux actually I think it's enough to ask about r.drop_last_while(p) even by itself (which flux doesn't have yet, but don't treat this as a request to go rush to add it). If the flux::is_last check is going to be implemented as a find_if (or, whatever, find_last_if_not) then iterating over this sequence becomes quadratic time: https://github.com/tcbrindle/flux/blob/1c128b50af95fc39b6683d437f9210239e219836/include/flux/op/for_each_while.hpp#L24-L28

tcbrindle commented 11 months ago

Another good question that I really ought to document somewhere! I actually changed this recently -- previously filter, drop_while etc did the same sort of caching as Ranges.

The new policy is as follows:

For sequences weaker than random access, first() may be at worst O(N). (This exception exists pretty much solely to allow filter to be const iterable.)

I believe the constant-time is_last requirement avoids quadratic behaviour in drop_last_while, unless I'm misunderstanding? That was the intention, anyway.

Does that sound reasonable, or are there problematic cases I haven't considered?

brevzin commented 11 months ago
  • is_last() implementations must be O(1)

Which I guess means either flux::drop_last_while must be non-const-iterable (so it can cache the last cursor, and would I guess be the one non-const-iterable sequence in flux? Or maybe you already have generators and I haven't looked?) or there just can't be a flux::drop_last_while? The internal iteration version doesn't have this problem, so maybe it could be only provided in that context? Just throwing out more questions.

tcbrindle commented 11 months ago

To explain the seemingly asymmetric requirement that last() should be O(1), it's because I suspect that people coming from ranges will write things like

while (cur != seq.last()) { ... }

and I don't want that to be needlessly slow.

tcbrindle commented 11 months ago
  • is_last() implementations must be O(1)

Which I guess means either flux::drop_last_while must be non-const-iterable (so it can cache the last cursor, and would I guess be the one non-const-iterable sequence in flux? Or maybe you already have generators and I haven't looked?) or there just can't be a flux::drop_last_while? The internal iteration version doesn't have this problem, so maybe it could be only provided in that context? Just throwing out more questions.

Yeah, drop_last_while, when I come to write it, will probably need to do internal caching and therefore not be const iterable -- but since it's likely to be much less commonly used than filter or (front) drop_while, I'm less worried about it.

We already have a cache_last() adaptor that is non-const iterable (and does exactly what the name suggests) as well as several single-pass-only sequences, so it wouldn't be the only one.

tcbrindle commented 11 months ago

So the theory that is_last() must be O(1) but first() can be linear is all very well, but not when I do things like this:

https://github.com/tcbrindle/flux/blob/1c128b50af95fc39b6683d437f9210239e219836/include/flux/op/reverse.hpp#L61-L64

This is... really not great :(

tcbrindle commented 11 months ago

Thanks Barry for asking tricky questions and making me think deeply about this!

I think that silent, unexpected quadratic behaviour is absolutely unacceptable. That means that is_last(seq, cur) must be constant time (since we call it for every element when iterating), and so we need to think carefully about what happens with things like reverse_adaptor::is_last above.

There are a few different options:

  1. We could go back to requiring that first() is O(1) for all sequences. This would mean re-adding caching to filter, drop_while and various others, in turn making them not const-iterable. My main objections to this are:

    • It means we need to use flux::mut_ref rather than flux::ref to pass them to e.g. a zip adaptor, and as a general rule we try to discourage using mut_ref. It's pretty unfortunate if one of the two most common adaptors requires it.
    • The fact that it's not const iterable, and/or you need to use mut_ref is not very discoverable unless you already know that it does internal caching

    This basically just applies to filter because it's so commonly used: unlike certain other people, I'm not opposed to non-const iterable sequences in principle. We already have several of them in Flux.

    1. We could add caching to reverse_adaptor instead. But this is actually worse, because in most cases the underlying sequence does have a cheap O(1) first() operation (e.g. vector), and so caching is going to add overhead and probably worse codegen (although that's already pretty terrible for reverse_adaptor outside of the internal iteration fast path).
    2. We could have a concept that says "I have constant-time first()", and make reverse_adaptor require it. So then vector would model the concept but filter_adaptor would not, and the user would need to manually add some sort of caching adaptor if they wanted to reverse a filtered sequence.

It turns out that for option 3, we don't actually need a new concept at all: instead, we can add a semantic requirement to the bounded_sequence concept that both first() and last() are O(1). This makes the bounded sequence concept pretty easy to explain ("the sequence can produce cursors to the beginning and end in constant time") and is actually a lot nicer than having uneven requirements on first and last as mentioned above.

So the trade off is that in order to have a const-iterable filter, it isn't reversible without adding a caching adaptor. That actually seems pretty reasonable to me.

Any thoughts @brevzin?

tcbrindle commented 9 months ago

Returning to this after a while... my latest thinking is as follows:

I think this would cover everything.... the only downside is that just about every adaptor would need to do something like disable_bounded_sequence = !bounded_sequence<Base> which is the sort of tedious, easy-to-forget boilerplate that I really wanted to avoid in Flux :(