mcriveramancha / google-collections

Automatically exported from code.google.com/p/google-collections
Apache License 2.0

Exponential behavior in 'Iterators.concat'. #151

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
When 'concat'ed iterators are themselves 'concat'ed, performance seems to scale exponentially with the depth of the 'concat'ing.

Example code:

===========================================

import java.util.Arrays;
import java.util.Collections;

import com.google.common.collect.Iterables;

public class ConcatDepthDemo {

    public static void main(String[] args) {
        for (int depth = 1; depth < 100; depth++) {
            atDepth(depth);
        }
    }

    private static void atDepth(int depth) {
        System.err.println("depth: " + depth);
        // Build a left-nested chain of concats, 'depth' levels deep.
        Iterable<String> strings = Collections.emptyList();
        for (int i = 0; i < depth; i++) {
            strings = Iterables.concat(strings, Arrays.asList("" + i));
        }
        // Iterating the nested chain is where the slowdown appears.
        for (String string : strings) {
            System.err.print(".");
        }
        System.err.println();
    }
}

===========================================

Run this, and you will notice that performance falls off quickly as 'depth' increases.

The fix, in 'Iterators.concat', is fairly simple:

===========================================

public boolean hasNext() {
    // Cache the result of current.hasNext() so it is evaluated only
    // once, rather than once in the loop condition and again in the
    // return statement (the double evaluation is what compounds across
    // nested concats).
    boolean currentHasNext;
    while (!(currentHasNext = current.hasNext()) && inputs.hasNext()) {
        current = inputs.next();
    }
    return currentHasNext;
}

===========================================
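
For contrast, the pre-fix hasNext() presumably looked something like the following (a reconstruction inferred from the fix above, not the actual google-collections source). Evaluating current.hasNext() twice per call, once in the loop condition and once in the return, is what lets nested concats multiply the work at each level:

===========================================

public boolean hasNext() {
    // Presumed original: current.hasNext() is evaluated again in the
    // return statement, so each call on a nested concat fans out into
    // multiple calls on the level below it.
    while (!current.hasNext() && inputs.hasNext()) {
        current = inputs.next();
    }
    return current.hasNext();
}

===========================================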

Perhaps the real solution is "don't use concat recursively!", but this fix seems trivial enough and would have saved me quite a headache.  :)

Original issue reported on code.google.com by pline...@gmail.com on 23 Apr 2009 at 11:50

GoogleCodeExporter commented 9 years ago
Wow. Exponential indeed. This slip could make it into school books.

Original comment by jim.andreou on 24 Apr 2009 at 12:26

GoogleCodeExporter commented 9 years ago
With some ugly hacks and some O(log n) space (n: depth of iterators), hasNext() of a concatenated iterator could be O(1) instead of the O(log n) of the proposed fix (which probably matches the original intention of the concat() author), so the complete iteration would be O(E) (E: number of elements) instead of O(E log n).

I find the latter issue highly unlikely to ever be fixed, so yeah, I agree that too many recursive concat() calls should be avoided. Users should strive to place as many iterators as possible in a single concat(), thus reducing the depth, or eliminate the depth entirely by placing all iterators in a list and calling concat() only once on that, as in the sketch below.
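
For illustration, a minimal sketch of that flat alternative (variable names are illustrative, not from the original report; it assumes java.util.List and java.util.ArrayList are imported):

===========================================

// Same elements as the report's deeply nested loop, but built with a
// single concat() call, so iteration does not pay a per-depth penalty.
List<Iterable<String>> parts = new ArrayList<Iterable<String>>();
for (int i = 0; i < depth; i++) {
    parts.add(Arrays.asList("" + i));
}
Iterable<String> strings = Iterables.concat(parts);

===========================================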

Original comment by jim.andreou on 24 Apr 2009 at 12:50

GoogleCodeExporter commented 9 years ago
This is incredibly timely!

I'm developing an implementation of Version Space Algebra [1], making extensive use of Iterators to represent axiomatic hypothesis spaces (such as "some integer"). In practice, the iterators on the leaves are small; however, these are combined in a tree to generate composite hypothesis spaces that represent the cross-product of all the components, and the trees get arbitrarily deep. (The hypothesis spaces are narrowed quickly, and the full cross-product is almost never actually used, but there will frequently be a lot of deeply concat'd iterators involved.)

Cutting to the chase:  I will almost certainly need fast iterator concatenation.

Thanks for pointing this out!

[1] http://tlau.org/research/papers/mlj01-draft.pdf

Original comment by cresw...@gmail.com on 24 Apr 2009 at 1:04

GoogleCodeExporter commented 9 years ago
Thanks for this fix. It will be in RC2 (May).

Thankfully it was only 2 lines of code, so I'm not making you sign a contributor agreement for it. :)

Original comment by kevin...@gmail.com on 24 Apr 2009 at 4:29

GoogleCodeExporter commented 9 years ago
I had a look at this again. Some points:

- The current implementation is cubic (!). In particular, in the worst case, iterating N elements from a recursive concat() can cause roughly

(N-1)*N*(N+1)/6 + N^2 + N - 2, i.e. O(N^3),

invocations of hasNext() in the Iterators#concat implementation (let alone the necessary invocations of hasNext() on the actual iterator inputs). This grows fast.

- In the perfectly balanced case, this is O(N*(log N)^2), which is not terrible, though it's unlikely that anyone would ever construct one of those (the usual case is "push, via a concat(), an element to the front or to the back", repeatedly).

- This can be reduced to a linear number of hasNext() invocations: 2N + 1, i.e. O(N). But concat() iterators become a little fatter (an extra, constant amount of memory is needed).

So, there is a trade-off:
- complexity reduced from O(N^3) down to O(N)
- O(1) extra memory per concat()

Should I pursue this? Is it a trade-off we're willing to make? To me, it looks far too one-sided to dismiss as not worthwhile, but it will take some effort to reduce the extra footprint (a stack is needed); a rough sketch follows.
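
For what it's worth, here is one possible shape for that stack-based approach (an illustrative sketch, not the actual patch; the class and method names are made up):

===========================================

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Sketch only: a concat iterator backed by an explicit stack of inputs.
// Nested FlatConcatIterators are absorbed at construction time, so the
// structure stays flat no matter how deeply concat() calls are composed.
final class FlatConcatIterator<T> implements Iterator<T> {

    // Top of the stack is the input currently being consumed.
    private final Deque<Iterator<? extends T>> stack =
        new ArrayDeque<Iterator<? extends T>>();

    FlatConcatIterator(Iterator<? extends T> a, Iterator<? extends T> b) {
        push(b);  // push in reverse so 'a' is consumed first
        push(a);
    }

    private void push(Iterator<? extends T> input) {
        if (input instanceof FlatConcatIterator) {
            // Splice the nested concat's remaining inputs into our own
            // stack. (Assumes the caller hands over ownership and won't
            // keep using the nested iterator independently.)
            @SuppressWarnings("unchecked")
            FlatConcatIterator<? extends T> nested =
                (FlatConcatIterator<? extends T>) input;
            while (!nested.stack.isEmpty()) {
                stack.push(nested.stack.removeLast());
            }
        } else {
            stack.push(input);
        }
    }

    public boolean hasNext() {
        // Each input is pushed and popped exactly once over the whole
        // iteration, so this loop does O(1) amortized work per element.
        while (!stack.isEmpty() && !stack.peek().hasNext()) {
            stack.pop();
        }
        return !stack.isEmpty();
    }

    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return stack.peek().next();
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }
}

===========================================

Each two-argument concat() contributes a constant number of stack entries, since nested iterators are absorbed rather than stacked, which is where the O(1)-per-concat() memory figure comes from.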

Original comment by jim.andreou on 13 Apr 2012 at 1:17