google / guava

Google core libraries for Java
Apache License 2.0
50.2k stars 10.91k forks source link

Possible optimization of FluentIterable.transform #1413

Open gissuebot opened 10 years ago

gissuebot commented 10 years ago

Original issue created by Eric6iese on 2013-05-14 at 06:53 PM


FluentIterable.transform delegates its internal iterable to the Iterables.transform method. This method has the disadvantage that it wraps any kind of collection as an iterable. So when the following is invoked:

FluentIterable.from(someList).transform(someFunction).toList()

ImmutableList.copyOf(iterable) falls back to less efficient creation by iteration - instead of creating a fixed-size array as it would do if the transformed internal iterable was still a collection.

This would easily be changed if FluentIterable.transform just checks if its a Collection first and then invokes FluentIterable.from(Collections2.transform(iterable, function).

This optimization applies only to FluentIterables which are just transformed and not filtered. But its a minimalistic change which should not have any real disadvantages except one more line of code and allows for more efficient creation of the resulting datastructures.

gissuebot commented 10 years ago

Original comment posted by Eric6iese on 2013-05-14 at 07:31 PM


It might also be worthwhile to check for List and then use Lists.transform. In the common case of a RandomAccess List, toArray will skip using the iterator.

gissuebot commented 10 years ago

Original comment posted by Eric6iese on 2013-05-15 at 09:01 AM


"In the common case of a RandomAccess List, toArray will skip using the iterator."

Well, that one does not work out, since AbstractList does not optimize toArray in the case of RandomAcessLists. Therefore, just checking for collection should be sufficient.

gissuebot commented 10 years ago

Original comment posted by schlosna on 2013-10-12 at 06:13 AM


I know there is API documentation for Iterables#transform and Iterables#partition, but I have run into several places where this optimization would be useful to avoid copying where I have an API that accepts Iterable and internally apply Iterables.transform, but is frequently given a collection (often an Immutable*).

It might also be beneficial to have Iterables.transform dynamically dispatch transform to the most appropriate and optimized implementation for the specified iterable:

public static <F, T> Iterable<T> transform(final Iterable<F> fromIterable,
                                           final Function<? super F, ? extends T> function) {
    checkNotNull(fromIterable);
    checkNotNull(function);
    if (fromIterable instanceof List) {
        return Lists.transform((List<F>) fromIterable, function);
    } else if (fromIterable instanceof Collection) {
        return Iterables2.cast(Collections2.transform((Collection<F>) fromIterable, function));
    } else {
        return Iterables.transform(fromIterable, function);
    }
};
gissuebot commented 10 years ago

Original comment posted by cpovirk@google.com on 2013-10-16 at 07:05 PM


There was some previous discussion of this in http://code.google.com/p/google-collections/issues/detail?id=207 ... but that focused on Iterables.size, not ImmutableList.copyOf and FluentIterable.toList, more commonly used methods that I find more compelling.

Note that this request is (I think) distinct from changing Iterables.transform itself (as described in comment #3), as the FluentIterable would remain a "plain" Iterable and not a Collection or List, so there is no change to its equals() behavior. Here's the (irrelevant here, again, I think) comment that Kevin once made about changing Iterables.transform (not FluentIterable.transform) to delegate to Lists.transform when appropriate:

""" So, right now every method of Iterables that returns an Iterable consistently returns a "plain Iterable", and every method of Collections2 returning a Collection returns a "plain Collection." I would prefer not to step away from that without having a clearer picture of where the ultimate destination we want to end up at is... and clear evidence that this is all definitely worthwhile. """

Also, I've split the AbstractList.toArray issue off into https://github.com/google/guava/issues/1556


Labels: Type-Performance, Package-Collect

gissuebot commented 10 years ago

Original comment posted by Eric6iese on 2013-10-17 at 10:41 AM


I'd strongly agree to change only FluentIterable and NOT Iterables, as the latter clearly suggests to return an Iterable and nothing else. Type-Dispatch as a means of internal optimization should not pollute the expected result of a method.

FluentIterable, however, keeps the base collection on the inside (brilliant design choice!), so here a full dispatch for Collection and List can be done without changing anything on the outside.

Obviously I'm also in favor for the other ticket.

gissuebot commented 10 years ago

Original comment posted by Maaartinus on 2013-10-17 at 03:33 PM


I'd say that the semantics should not change at all; the only user-visible change should be the speed in case the underlying Iterable happens to be a Collection. This seems to be doable for many methods including size, toArray, and toList. As Iterables uses FluentIterable internally, it should profit from this too. I've got some proof of concept already and will post it when tested well.

gissuebot commented 10 years ago

Original comment posted by Maaartinus on 2013-10-20 at 01:36 AM


I added a method using the size of the underlying iterable to the result of Iterables.unmodifiable, .transform, .skip, and .limit. In case they're based on a Collection (or another such size-aware Iterable), Iterables.size is nearly as fast as size of the underlying collection. This also allows to avoid resizing in toArray and toList.

The speed up is nice:

https://microbenchmarks.appspot.com/runs/b0538106-18fa-46ea-a8b4-18424a4707b8,4849c48d-4403-46e0-b895-2b000d194f74#r:scenario.benchmarkSpec.parameters.iterableSetup,scenario.benchmarkSpec.parameters.baseSetup,scenario.benchmarkSpec.parameters.state&c:scenario.benchmarkSpec.methodName

gissuebot commented 10 years ago

Original comment posted by Eric6iese on 2013-10-20 at 04:32 AM


Wouldn't it be simpler to put the size implementation of Iterables into the FluentIterable and implement Iterables.size by FluentIterable.from(iterable).size()?

That could be done with most of the iterables methods. The advantage of re-routing most implementations through fluentiterable is that it automatically checks for fluentiterable and "unpacks" the target collection whenever needed by the next method. However, the additional class-checks might make this optimization slower in many usecases: All optimizations are worthless as soon as the first filtering Predicate is used in the chain, which enforces the use of an iterable. And a filtering might be the most common operation of them all.

PhilippWendler commented 9 years ago

Is there a plan on implementing this? I think changing FluentIterable to do the Collection optimization is a safe and worthwhile change, and does not need to be held up by discussions about further reorganizations.

Would a pull request be useful to get this forward? I am not sure because it's such a small change but I could probably create one if wanted.