hrldcpr / pcollections

A Persistent Java Collections Library
Other
765 stars 78 forks source link

Provide Stream collectors for persistent collections. #88

Open bowbahdoe opened 4 years ago

bowbahdoe commented 4 years ago

One of the major differences between pcollections and other persistent collections libraries like vavr is the desire to remain interoperable with the standard java collections framework. However, because persistent collections do not allow for mutation, we miss out on a good chunk of the standard library's methods for working with collections.

As an example, PVector implements java.util.List but does not support any of the mutating methods on that interface. For some there are natural replacements - add, addAll, remove and removeAll are supplanted by plus, plusAll, minus, and minusAll. For others though, there is no natural replacement: sort, retainAll, and clear have no direct analogue.

This contrasts with classes like vavr's Vector which are fairly full featured in terms of the methods they support on account of not needing to maintain this compatibility.

io.vavr.Vector.of(1, -1, 2, 3).filter(x -> x <= 2).sorted();

The analogue with regular java collections pre-java 8 would be

List<Integer> xs = new ArrayList<>();
xs.add(1);
xs.add(-1);
xs.add(2);
xs.add(3);
xs.retainAll(x -> x <= 2);
xs.sort();

But with java 8, we get the streams api, which lets us do those larger functional transformations in the same manner as vavr albeit via explicitly targeting a collection type as the end result..

List.of(1, -1, 2, 3)
  .stream()
  .filter(x -> x <= 2)
  .sorted()
  .collect(Collectors.toList());

Which is all a long winded way of saying, it would make a lot of sense to provide collector implementations for targeting persistent collections.

It matches up with the standard java way of doing things and would be a decent usability win. The main downside is that to do this would require a major version bump to make Java 8 the minimum and maybe working out some of the other issues with regards to performance. Specifically, if there is ever added a notion of transients or some other kind of mutable intermediate representation it would be transparent to swap the implementation of a collector provided by the library, perhaps without exposing that functionality until it is fully designed or maybe never.

public interface PVector<E> extends PSequence<E> {
    public static <T> Collector<T, ?, PVector<T>> collector() {
        return Collector.of(Empty::pvector, PVector::plus, PVector::plusAll);
    }
    // ...
}

TreePVector.from(List.of(1, -1, 2, 3))
  .stream()
  .filter(x -> x <= 2)
  .sorted()
  .collect(PVector.collector());

There would be other benefits to moving to Java 8 as well - chaos and major version bumps are ladders. Perhaps a PVector.empty() could supplant Empty.pvector() or similar api improvements and deprecations.

hrldcpr commented 2 years ago

We've upgraded to Java 8 now 🚀 e08ad47

mwilliamson commented 1 year ago

I knocked something together for my own use that I think does the trick for PVector. Specifically, it creates a mutable builder class for vectors. I had a couple of questions (although other thoughts are welcome!):

  1. Are there any existing mutable builder classes available? I had a quick look through the code and issues, which would appear to suggest not, hence defining my own.

  2. The implementation below uses PVector as the intermediate representation. Would it be more efficient to use a normal Java list as the intermediate representation, and then only to convert that into a PVector as the finishing step?

import org.pcollections.Empty;
import org.pcollections.PVector;

import java.util.stream.Collector;

public class PCollectors {
    private PCollectors() {
    }

    public static <T> Collector<T, PVectorCollector<T>, PVector<T>> toVector() {
        return Collector.of(
            PVectorCollector::new,
            PVectorCollector::add,
            PVectorCollector::addAll,
            PVectorCollector::toVector
        );
    }

    private static class PVectorCollector<T> {
        private PVector<T> vector = Empty.vector();

        void add(T other) {
            vector = vector.plus(other);
        }

        PVectorCollector<T> addAll(PVectorCollector<T> other) {
            vector = vector.plusAll(other.vector);
            return this;
        }

        PVector<T> toVector() {
            return vector;
        }
    }
}
bowbahdoe commented 1 year ago

In the generics, exposing the intermediate as ? would be appropriate - especially when that class is private.

public static <T> Collector<T, ?, PVector<T>> toVector() {

Which lets us separate this sort of question

Would it be more efficient to use a normal Java list as the intermediate representation

From API questions like - should this live in a PCollectors sort of thing or on the interfaces themselves?