Add a fold method for Iterables

GoogleCodeExporter commented 9 years ago

I find myself always wanting to iterate over a collection to aggregate the
data and wishing Google Collections had a fold operation.

An implementation can be found on StackOverflow,
<http://stackoverflow.com/questions/950751/how-to-implement-a-list-fold-in-java/
951337#951337>.

Original issue reported on code.google.com by rwallace...@gmail.com on 12 Aug 2009 at 11:37

GoogleCodeExporter commented 9 years ago

A possibly easier implementation is to create a new FoldFunction type.  And 
implement
foldLeft as 

    static <A, B> A foldLeft(Iterable<B> xs, A z, FoldFunction<A, B> f)
    {
        A p = z;
        for (B x : xs)
        {
            p = f.apply(p, x);
        }
        return p;
    }

    interface FoldFunction<A, B>
    {
        A apply(A a, B b);
    }

Original comment by rwallace...@gmail.com on 13 Aug 2009 at 12:07

GoogleCodeExporter commented 9 years ago

What about a compress() function? (Maybe there's a better name.) compress() 
would
have to return null in the case of failure. Like this:

    static <T> Iterable<T> compress(Iterable<T> iterable, CompressFunction<T> f) {
        Iterator<T> i = iterable.iterator();

        if (!i.hasNext())
            return Collections.emptyList();

        T first = i.next();
        if (!i.hasNext())
            return ImmutableList.of(first);

        Stack<T> compressed = new Stack<T>();
        compressed.push(first);

        while (i.hasNext()) {
            T t = i.next();
            T c = f.compress(compressed.peek(), t);

            if (c != null) {
                compressed.pop();
                compressed.push(c);
            } else {
                compressed.push(t);
            }
        }

        return ImmutableList.copyOf(compressed);
    }

    interface CompressFunction<T> {
        T compress(T t1, T t2);
    }

    class Range {
        int from, to;
    }

    class CompressRanges implements CompressFunction<Range> {

        public Range compress(Range r1, Range r2) {
            if (r1.from > r2.to || r1.to < r2.from)
                return null;

            Range r = new Range();
            r.from = Math.min(r1.from, r2.from);
            r.to = Math.min(r1.to, r2.to);
            return r;
        }

    }

I'd like to hear your input on this. Do my random thoughts make sense to 
anybody?

Original comment by andre.ru...@gmail.com on 13 Aug 2009 at 8:52

GoogleCodeExporter commented 9 years ago

I prefer the term reduce, and i'd also like to see it in the library.

{{{
/** (a, b, c, d) -> f(f(f(a, b), c), d) */
public static <T> T reduce(final Iterable<? extends T> gen, final Function2<? 
super 
T, ? super T, ? extends T> f) {
    final Iterator<? extends T> it = gen.iterator();
    if (!it.hasNext()) {
        return null;
    }
    T result = it.next();
    while (it.hasNext()) {
        result = f.apply(result, it.next());
    }
    return result;
}

/** (a, b, c, d), initial -> f(f(f(f(initial,a), b), c), d) */
public static <X,Y> X reduce(final Iterable<? extends Y> gen, final X initial, 
final 
Function2<? super X, ? super Y, ? extends X> function) {
    final Iterator<? extends Y> it = gen.iterator();
    if (!it.hasNext()) {
        return initial;
    }
    X acc = initial;
    while (it.hasNext()) {
        acc = function.apply(acc, it.next());
    }
    return acc;
}
}}}

Original comment by jvdne...@gmail.com on 14 Aug 2009 at 9:42

GoogleCodeExporter commented 9 years ago

This requires the inclusion of a two-argument functor interface. I've seen 
negative 
reactions on previous requests so I'm thinking there's little chance of this 
one 
getting accepted. The fact that it keeps coming back accounts for something 
though.

Original comment by jvdne...@gmail.com on 18 Aug 2009 at 8:42

GoogleCodeExporter commented 9 years ago

I am not specifically declaring this to be out of scope, but it's important to 
realize 
that this kind of functional programming stuff has never been our core focus.  
The 
main reason we have Predicate at all is just that it's hard to filter an 
iterator by 
hand, and a filter() library method needs something like that, so there it is. 
The 
main reason we have Function is that MapMaker needs it.

Original comment by kevin...@gmail.com on 18 Aug 2009 at 9:51

GoogleCodeExporter commented 9 years ago

(remember that "Accepted" != "promise that we will do it")

Original comment by kevin...@gmail.com on 17 Sep 2009 at 4:59

Changed state: Accepted
Added labels: post-1.0

GoogleCodeExporter commented 9 years ago

Original comment by kevin...@gmail.com on 17 Sep 2009 at 5:45

Added labels: Milestone-Post1.0
Removed labels: post-1.0

GoogleCodeExporter commented 9 years ago

Original comment by kevin...@gmail.com on 17 Sep 2009 at 5:57

Added labels: Type-Enhancement

GoogleCodeExporter commented 9 years ago

Original comment by kevinb@google.com on 30 Jul 2010 at 3:53

Removed labels: Milestone-Post1.0

GoogleCodeExporter commented 9 years ago

A two arg version of Function isn't necessary.  Just some sort of tuple type, 
let's say Pair<A, B>.  So Rich's implementation reduces to:

static <A, B> A foldLeft(Iterable<B> xs, A z, Function<Pair<A, B>, A> f)
    {
        A p = z;
        for (B x : xs)
        {
            p = f.apply(new Pair(p, x));
        }
        return p;
    }

Original comment by dan.ro...@gmail.com on 9 Oct 2010 at 12:17

GoogleCodeExporter commented 9 years ago

Issue 446 has been merged into this issue.

Original comment by cgdec...@gmail.com on 8 Apr 2011 at 3:30

GoogleCodeExporter commented 9 years ago

Creating a Pair type and using it like that just makes the semantics of the 
function less clear and forces lots of Pair objects to be created when they 
needn't be.

Original comment by cgdec...@gmail.com on 8 Apr 2011 at 4:48

GoogleCodeExporter commented 9 years ago

I have written Iterable around AbstractLinkedIterator with static factory 
method called unfold which tkes first element and a Function to compute the 
next element. 
Then I added static factory methods first() and rest() for Functions 
decomposing Iterables using Iterables.getFirst() and Iterables.skip().
Finally fold look as simply as this:

    public static final <E, A> A fold(final Iterable<E> iterable,
            final Function<E, A> accumulator) {
        final Function<Iterable<E>, Iterable<E>> rest = rest();
        final Function<Iterable<E>, E> first = first();
        final Iterable<E> generator = Iterables.transform(unfold(iterable, rest),
                first);
        return Iterables.getLast(Iterables.transform(generator, accumulator));
    }

The accumulator is simply a function - guava javadocs allow for functions with 
side effects, so it is possible to store the state of folding in the function 
itself. This is a trade-off between purity and problems with tuples in Java, 
resp. creating custom classes just for the accumulated value.

Example:

    @Test
    public final void sum() {
        final Function<Integer, Integer> sum = new Function<Integer, Integer>() {
            private Integer sum = 0;

            @Override
            public Integer apply(final Integer input) {
                sum += input;
                return sum;
            }
        };
        assertThat(fold(limit(unfold(1, successor), 100), sum), is(5050));
    }

If you find this approach useful, just let me know, I will polish the source 
and I will publish it.

Original comment by gscerbak@gmail.com on 10 May 2011 at 11:39

GoogleCodeExporter commented 9 years ago

Original comment by kevinb@google.com on 13 Jul 2011 at 6:18

Changed state: Triaged

GoogleCodeExporter commented 9 years ago

Original comment by fry@google.com on 10 Dec 2011 at 3:40

Added labels: Package-Collect

GoogleCodeExporter commented 9 years ago

Original comment by fry@google.com on 16 Feb 2012 at 7:17

Changed state: Acknowledged

GoogleCodeExporter commented 9 years ago

In my own project I implemented folding as follows:

  interface Reducer<A, B> {
    A reduce(A a, B b);
  }

  class Iterables {
    static <A, B> A reduce(Iterable<B> xs, A z, Reducer<A, B> reducer);
  }

  class BigDecimals {
    static Reducer<BigDecimal, BigDecimal> sum();
  }

With these in place I could reduce a list to a sum as follows:

  Iterable<BigDecimal> lineTotals = ...
  BigDecimal grandTotal = Iterables.reduce(lineTotals, BigDecimals.sum());

I hope you guys will consider adding this feature to Guava; with this one 
exception, Guava has always seemed to have exactly what I need for data 
processing.

If there is interest from the project maintainers in adding this feature, I'd 
be delighted to do the work and submit a patchset.

Original comment by qualidaf...@gmail.com on 9 Apr 2012 at 5:33

GoogleCodeExporter commented 9 years ago

FYI, I think we're astronomically more concerned about the API design, the 
potential for misuse, and our already existing concern over the readability of 
the functional idioms we provide (pending Java 8), than the difficulty of 
adding the feature once an API is endorsed, but we'll definitely update this 
issue if we decide to pursue folds and the like.

Original comment by wasserman.louis on 9 Apr 2012 at 7:17

GoogleCodeExporter commented 9 years ago

Original comment by kevinb@google.com on 30 May 2012 at 7:43

Added labels: Type-Addition
Removed labels: Type-Enhancement

GoogleCodeExporter commented 9 years ago

Original comment by kevinb@google.com on 22 Jun 2012 at 6:16

Changed state: Research

GoogleCodeExporter commented 9 years ago

I recently found myself needing the same type of functionality for calculating 
basic statistical functions on lists of numbers.  Since the interest within the 
project seems limited, has anyone given any thought to making this a separate 
project?

Original comment by phidia...@gmail.com on 9 Jul 2012 at 8:21

GoogleCodeExporter commented 9 years ago

This certainly exists in other projects (functional-java is an example), 
but...yeah.

(I have to admit, I still don't see what advantage this sort of thing would 
gain over a traditional for-loop.)

Original comment by wasserman.louis on 9 Jul 2012 at 8:46

GoogleCodeExporter commented 9 years ago

In my case I want to be able to do something like this:

Double sum = List.reduce(List<Double> values, SumReduction summation)

Additional classes for StandardDeviation, Mean, Median, Max, Min would also be 
useful.  Having the interfaces and a couple of simple implementations would 
make it easier for people to be able to implement more complex mathematical 
functions.

Original comment by phidia...@gmail.com on 9 Jul 2012 at 9:39

GoogleCodeExporter commented 9 years ago

...I'm still not seeing what possible advantage this would have over methods 
named e.g. sum(List<Double> values), mean(List<Double> values), 
median(List<Double> values), etc.  If you write those methods out, they're 
almost certainly *shorter* than "functional-style" reducers based on anonymous 
classes.  If you want to be able to plug different ones in, you're still 
probably better off with just a Function<List<Double>, Double>.

Furthermore, some of those operations *can't* be implemented as simple folds.

Original comment by wasserman.louis on 10 Jul 2012 at 1:04

GoogleCodeExporter commented 9 years ago

The problem with creating specific methods like sum, mean, etc is that it's 
always incomplete in someone's mind.  By creating an interface and a couple of 
simple implementations, it demonstrates how implementations will be used.  If 
someone wants to then spinoff an SPI project to implement a more complete set 
of functions, or to create libraries of different types of functions, they 
could do that.

Although the Function<List<Double>, Double> complies with the generic signature 
of the Function interface, it doesn't make sense from a practical sense.  As 
someone mentioned earlier, the intent of the Function interface is to perform 
an operation on a each object in a collection with the intent being that you 
could end up with a transformed copy of the list, or a transformed instance of 
the original list.  

If you accidentally tried to use one of these implementations (of 
Function<List<Double>, Double>) with Lists.transform, the code would compile, 
but you would have runtime errors when you ran it.  Therefore, I think a new 
interface with a different method signature would better communicate the intent 
of these types of aggregate functions. 

Could you give me some examples of  "operations [that] *can't* be implemented 
as simple folds"?  I'm not claiming that it's a panacea, nor doubting what you 
say, merely interested in what you mean.

Original comment by phidia...@gmail.com on 10 Jul 2012 at 3:53

GoogleCodeExporter commented 9 years ago

Standard deviation cannot be implemented as a simple fold, since you must first 
do one pass to determine the mean, then a second pass to accumulate the square 
of the difference from the mean.

Original comment by qualidaf...@gmail.com on 10 Jul 2012 at 8:46

GoogleCodeExporter commented 9 years ago

And median can't be implemented as a simple fold, since all the implementations 
I know of require either multiple passes, or at least O(n) extra storage, which 
kind of defeats the point.

I'm still not really getting the point, though.  If you want to write a project 
to implement more functions...you write a library with a bunch of static 
methods: foo(List<Double>), or whatever.  (I also vaguely recall that Guava's 
been working on some statistics utilities, which we might see sometime in the 
future...)

In any event, the only advantage I can think of to making these objects, 
instead of e.g. static methods, is being able to pass them around 
polymorphically and then apply them...which is a need satisfied by 
Function<List<Double>, Double>.  I'm not suggesting anyone use Lists.transform 
with such a function, but I am saying that you can pass them around 
polymorphically and call their apply() method, which seems to me like it covers 
the one advantage objects would have over functions.

Original comment by wasserman.louis on 10 Jul 2012 at 10:44

GoogleCodeExporter commented 9 years ago

<pedantry>Standard deviation could be implemented with a fold, although the 
accumulator would need to be accumulating both sum(x_i) and 
sum(x_i)^2.</pedantry>
That said, I agree with Louis - for loops aren't that bad, and they perform 
well.

Original comment by ian.b.ro...@gmail.com on 24 Jul 2012 at 5:09

GoogleCodeExporter commented 9 years ago

>That said, I agree with Louis - for loops aren't that bad, and they perform 
well.
Assembler code isn't that bad, and it performs well. Does it mean that we don't 
need C and Java? 'fold' is a standard idiom for processing lists, after all.

Original comment by byt...@gmail.com on 30 Aug 2013 at 8:29

GoogleCodeExporter commented 9 years ago

+1 for #29 above.

The nicest thing about anything that takes closures is that they translate well 
to the function composition world, that we're already seeing partly with Guava, 
and will be even more when Java 8 Lambda comes along (syntax sugar to write the 
FoldFunction). Project Sumatra and proposed Java 9 parallelism also uses this.

Original comment by ceefour666@gmail.com on 30 Aug 2013 at 8:45

GoogleCodeExporter commented 9 years ago

Guys, introducing fold without adding other high order functions like 
skipWhile, takeWhile, reduce, groupBy, flatMap, indexWhere and so on is not the 
best idea, I think. Guava is not an FP library. If you want fold, use 
totallylazy. Or Scala.

Original comment by orionllm...@gmail.com on 30 Aug 2013 at 8:57

GoogleCodeExporter commented 9 years ago

-> #31
So, you see these as FP only, despite their clear correlation to list 
processing? Guava has few tools for processing collections, why not add some 
more if people need them? Or it's like: "It's our library, we don't want this 
stuff. People use it and request some features? Fork 'em!"

Guava already has filter() and transform(). They're FP, they don't belong 
there. Or do they? I don't see any real reasons not to include some "FP" 
methods for collection processing. They reduce :) code, they are faster to 
write than beloved for-loops, and they look nicer in IntelliJ Idea even today, 
without Java 8 with it's nya-looking lambdas.

You suggest to use some other library or even Scala just for one or two missing 
functions? I don't want my pom.xml to become separate project, just because I 
use a library every time I need a method.

Finally, do Guava developers have any reasons to not to add fold() and stuff, 
other than ideological (because they think almost every list processing idiom 
is FP only)? "Open source" not only means that sources are open (pretty 
obvious, huh), but also means some cultural things - like sharing ideas, 
receiving requests and implementing some useful stuff, you know. Position like 
"I don't need that, so nobody does" seems kind of strange here.

Original comment by byt...@gmail.com on 31 Aug 2013 at 8:45

GoogleCodeExporter commented 9 years ago

I want only say that fold is definitely not enough for me. Why fold? Why not 
flatMap, maxBy or zip? You either add all of these methods, or you add nothing. 
Adding only fold looks ridiculous.

Original comment by orionllm...@gmail.com on 31 Aug 2013 at 4:24

GoogleCodeExporter commented 9 years ago

> Why fold? Why not flatMap, maxBy or zip?
 - flatMap - there's [FluentIterable#transformAndConcat][1],
 - maxBy - use [Ordering#onResultOf][2] + [Ordering#max][3],
 - zip - not in Guava - [requires Pair / Tuple][4] + see [this issue][5].

> I don't see any real reasons not to include some "FP" methods for collection 
processing.
They would require adding many things to Guava API (two-arg functions, pair, 
etc.) - see new [java.util.function package from Java 8][6] - but contrary to 
Java 8, functional idioms in Java 6/7 are horribly verbose and ugly. 

Although Guava could adopt some names / API from Project Lambda to "backport" 
FP goodies to Java 6/7, it would be painful to use fold without language 
support (i.e. without lambdas). And bringing FP idioms to Java as such is 
separate discussion, for example my collegue believes that average "enterprise" 
Java programmer will have troubles with lambdas and they are not necessary in 
the language where for-loop is a common and well known idiom. (FYI: I know / 
have been using higher order functions in Lisp / Perl and it's not the same 
experience for both code writer and reader in Java, at least without lambdas in 
core for which I am waiting impatiently.)

[1]: 
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect
/FluentIterable.html#transformAndConcat(com.google.common.base.Function)
[2]: 
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect
/Ordering.html#onResultOf(com.google.common.base.Function)
[3]: 
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect
/Ordering.html#max(java.lang.Iterable)
[4]: 
http://code.google.com/p/guava-libraries/wiki/IdeaGraveyard#Tuples_for_n_>=_2
[5]: http://code.google.com/p/guava-libraries/issues/detail?id=677
[6]: 
http://download.java.net/jdk8/docs/api/java/util/function/package-summary.html

Original comment by xaerx...@gmail.com on 31 Aug 2013 at 9:17

GoogleCodeExporter commented 9 years ago

Issue 1601 has been merged into this issue.

Original comment by lowas...@google.com on 3 Dec 2013 at 7:56

GoogleCodeExporter commented 9 years ago

Issue 1601 has been merged into this issue.

Original comment by lowas...@google.com on 3 Dec 2013 at 7:56

GoogleCodeExporter commented 9 years ago

The following assumes sequential operation:

interface Collector<T,A,R> {
   A newContainer();
   void accumulate(A container, T element);
   R finish(A container);
}
Like Java 8 Collector, though functions of supplying and accumulating is pushed 
to the interface.

<T,A,R> R collect(Iterable<? extends T> elements, Collector<? super T, A, ? 
extends R> collector) {
   final A container = collector.newContainer();
   for (T each : elements)
      collector.accumulate(container,each);
   return collector.finish(container);
}
Typical mutable reduction implemention

class Collectors {

   public static <T, C extends Collection<T>> Collector<T,?,C> toCollection(Supplier<? extends C> factory) {
      // ...
   }

   // ...

}
Utility for collectors.

Original comment by jysjys1...@gmail.com on 4 Feb 2014 at 8:24

GoogleCodeExporter commented 9 years ago

I just use this arrangement:
public interface Foldable<A, R> {
    R fold(A left, R right);
    /**
     * reduce or fold the {@link Iterable} with an initial value and folding function from left to right
     * @param list
     * @param acc
     * @param f the Foldable (interface override)
     * @return
     */
    static <A, R> R foldRight(final Iterable<A> list, final R acc, final Foldable<A, R> f) {
        if (Iterables.isEmpty(list))
            return acc;
        return foldRight(Iterables.skip(list, 1), f.fold(Iterables.getFirst(list, (A)null), acc), f);
    }
    /**
     * reduce or fold the {@link Iterable} with an initial value and folding function from right to left
     *
     * @param list the iterable list or set etc.
     * @param acc initial value
     * @param f the Foldable (interface override)
     * @return the result of folding from right to left
     */
    static <A, R> R foldLeft(final Iterable<A> list, final R acc, final Foldable<A, R> f) {
        if (Iterables.isEmpty(list))
            return acc;
        return f.fold(Iterables.getFirst(list, (A)null), foldLeft(Iterables.skip(list, 1), acc, f));
    }
}

Original comment by Lord.Lar...@gmail.com on 26 Mar 2014 at 11:42

GoogleCodeExporter commented 9 years ago

The nested Iterables.skips will give extremely terrible performance, certainly 
quadratic if not worse.  You should really just be folding over the Iterator 
instead.

Original comment by wasserman.louis on 27 Mar 2014 at 1:45

GoogleCodeExporter commented 9 years ago

This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:16

Added labels: MigratedToGitHub

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:19

Changed state: Migrated

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:10

Added labels: Restrict-AddIssueComment-Commit

tailorlala / guava-libraries

Add a fold method for Iterables #218