Fold/Unfold revisited - Githubissues

danieldietrich commented 7 years ago

Folds (catamorphisms) are dual to unfolds (anamorphisms).

I want to perform the following changes:

I want to improve the Javadoc of fold, foldLeft and foldRight. Fold needs a neutral element as zero and an associative combine function. Also examples would be helpful. Folds are the most important operations on collections in functional programmin.
We should swap the generics T, U in unfoldLeft and unfoldRight in order to reflect the dual nature to foldLeft and foldRight. A dual function is derived by inverting the arrow directions.
Currently unfold relies on Option, which eats mem. I want to add unfold operations that use Predicates and PartialFunction.
We should add unfold0 as special case of unfold. I think introducing a fold0 makes no sense because it it would reduce a non-empty collection to the last element or return zero in the empty case.

Details:

// ad 2) swapping generics
static <T, U> List<T> unfoldLeft(U seed, Function<? super U, Option<Tuple2<? extends U, ? extends T>>> f) { ... }

static <T, U> List<T> unfoldRight(U seed, Function<? super U, Option<Tuple2<? extends T, ? extends U>>> f) { ... }

// ad 3) new unfold variants (shown exemplary for unfoldLeft only)
// I think we do not need unfoldLeft(Object, Predicate, Function),
// instead we add a PartialFunction.of(Predicate, Function)
static <T, U> List<T> unfoldLeft(U seed, PartialFunction<? super U, Tuple2<? extends U, ? extends T>> f) { ... }

// ad 4) unfold0
static <T> List<T> unfold0(T seed, PartialFunction<? super T, ? extends T> f) {
    List<T> result = empty();
    for(T t = seed; f.isDefinedAt(t); t = f.apply(t)) {
        result = result.prepend(t);
    }
    return result; // .reverse() ?
}

smillies commented 7 years ago

I see an additional problem with unfoldRight/Left that you might fix in this context: The documentation says:

The function should return None when it's done generating the Vector, otherwise Some Tuple of the value to add to the resulting Vector and the element for the next call.

However, the examples use the tuple exactly the other way around: it is the first element that controls the recursion and the second that gets added. I believe that is a bug, the documentation says how it should be - at least when you take established standards from other languages such as Haskell as the benchmark.

(I do understand why it is more convenient in Java to have the seed first and the function second - although other than in Haskell - but I can see no reason for the different interpretation of the tuple.)

smillies commented 7 years ago

There are a couple of points to keep in mind with that idea of having a predicate-based version of unfoldRight/Left.

It totally destroys the duality of the type signature to fold (which is already somewhat obfuscated by fold being an instance method and unfold being static, as well as relying on some type equivalences). This is more of an aesthetic concern, I think a note in the Javadocs would suffice.
You must fix a meaning for the predicate: either as a termination condition, or a condition to enter recursion. In your partial function formulation, the natural extension of the predicate would be the domain of the function, i. e. it would mean "go on". You should then perhaps change the examples for the existing methods, which use a "stop" condition to return Option.none(), so as to avoid confusion.
I think you would have to test the predicate on the last pair that is returned. If you don't have some "undefined" value as one of the Tuple elements, it might be difficult to put the last desired value into the list, or perhaps the predicates may have to be different (b > 0 with Optional, b >= 0 with Predicate). This can probably be handled, but will complicate the implementation. The nice thing about Option is that it can stand-in when there is no natural way to construct a "terminating tuple".

smillies commented 7 years ago

Here's another thought that makes a predicate-based version of unfoldRight questionable, in particular one where the predicate is hidden in a partial function. The point has to do with reusing existing functions in an unfold operation. Such pre-existing functions usually will not return a tuple. The code that constructs the pair for unfold will somehow be extra. In addition, many legitimate use cases for unfold decide how to proceed by looking at the result of the function, not just its previous input. In a predicate-based unfoldRight, this may require double computations of the function.

Consider the following (rather contrived) example. It's not Vavr, but my own toy example, but it should serve to illustrate the point. The output of unfoldRight in this example is supposed to be the one-element list ["even"].

Function<Integer,String> expensive = i -> i % 2 == 0 ? "even" : "odd"; // long-running pre-existing function
Function<String,Pair<String,Integer>> nextTuple = s -> "odd".equals(s) ? pair(s,2) : pair(s,1);  // cons the string in the result, use 2 or 1 as next input
Predicate<String> continueWhile = s -> "even".equals(s); // examine result of long-running function
// repeated computation of expensive function with predicate-based version
unfoldrP(nextTuple.compose(expensive), i -> continueWhile.test(expensive.apply(i)), 10);

With a utility function, you can convert the Predicate to an Optional, so that the result of the function can be shared:

Function<Integer, Optional<Pair<String, Integer>>> predicateToOption(Predicate<String> pred, Function<Integer, Pair<String, Integer>> f);

// single computation of expensive function with option-based version
unfoldr(predicateToOption(continueWhile, nextTuple.compose(expensive)), 10);

I think instead of having the predicate-based unfoldr in Vavr, it would be more useful to have such a utility function, along the lines of

static <T, R> Function<T, Optional<Pair<R, T>>> predicateToOption(Predicate<R> pred, Function<T, Pair<R, T>> f) {
        return x -> {
            Pair<R, T> p = f.apply(x);
            return pred.test(p.first()) ? Optional.of(p) : Optional.empty();
        };
    }

johnw42 commented 7 years ago

Who wants commentary from random strangers? You do! I hope so, anyway.

Defining an unfold operation in terms of the PartialFunction type seems questionable to me, because at the very least, it replaces one virtual function call with two. I guess it's all right as an alternative, but I suspect it wouldn't see a lot of use, because it will be hard to implement PartialFunction without either duplicating a bunch of logic in both apply() and isDefined(), or doing something gross like memoizing.

I think the real fix (which I hope it's on Oracle's roadmap) would be to optimize the JVM so returning an Optional instance doesn't allocate any memory. (Or better yet, make it so returning a new instance of any final class with a small number of members doesn't allocate memory, but I won't hold my breath on that).

Have you done any benchmarking to see what the actual impact is of returning Optional instances? It seems like a textbook example of the kind of problem generational garbage collectors were invented to solve.

danieldietrich commented 7 years ago

@johnw42 thank you, that‘s valuable input - I will take it into account!

ATM I currently have not benchmarked that specific case but value types (flattening mem refs) will definitely make a difference on the JVM.

vavr-io / vavr

Fold/Unfold revisited #2062