google / guava

Google core libraries for Java
Apache License 2.0
50.03k stars 10.86k forks source link

Add Lists.partitionBy to split a list into a list of lists #3136

Open pascalgn opened 6 years ago

pascalgn commented 6 years ago

Sometimes you wish to iterate over a list of items in a grouped fashion:

List<Project> trending = Arrays.asList(new Project("guava", "java"),
        new Project("spring", "java"), new Project("node", "js"));

Project current = null;
for (Project p : trending) {
    if (current == null) {
        current = p;
    } else if (!current.getLang().equals(p.getLang())) {
        System.out.println("---");
        current = p;
    }
    System.out.println(p.getName());
}

There already is Lists.partition, but in this case the sub lists might have different sizes (and we don't know these sizes).

It would be nice to have a standard way to split the list, with only needing to specify the difference between the items:

List<List<Project>> grouped = Lists.partitionBy(trending,
        (p1, p2) -> p1.getLang().equals(p2.getLang()));
for (int i = 0; i < grouped.size(); i++) {
    if (i > 0) {
        System.out.println("---");
    }
    for (Project p : grouped.get(i)) {
        System.out.println(p.getName());
    }
}

Naming

There already is Lists.partition and as that also returns List<List<T>>, I would suggest a similar name. Possible alternatives would be splitBy or groupBy. My suggestion would be partitionBy.

Related

Summary

Add new method List<List<T>> Lists.partitionBy(List<T>, BiPredicate<T, T>).

I would be happy to implement it and if there is agreement on this, I can create a PR. Thanks for reading, and if you have any questions, please ask!

jbduncan commented 6 years ago

Hi @pascalgn, I can't speak for anyone else, but I'm not totally clear as to what you expect the output of this new method to be.

I understand that you want to take a list and for something to happen if two adjacent elements are different. Am I right to understand that this something you want to happen is to partition a list into sublists, where each sublist is a consecutive, iteration-ordered sequence of elements from the original list that are "equal" by some measure?

As in, if we had Lists::partitionBy as you described it above, and we had the following method call:

List<String> strings = ImmutableList.of("hello", "HELLO", "goodbye", "hello");
List<List<String>> subLists =
    Lists.partitionBy(strings, (a, b) -> a.equalsIgnoreCase(b));

would you expect the output to look like this?

System.out.println(subLists); // [["hello", "HELLO"], ["goodbye"], ["hello"]]

Or do elements not need to be consecutive to be in the same sublist, in which case the output would maybe look like this?

System.out.println(subLists); // [["hello", "HELLO", "hello"], ["goodbye"]]
pascalgn commented 6 years ago

Yeah, good question, I forgot writing about that. In my opinion, it should be like your first output, so

List<String> strings = ImmutableList.of("hello", "HELLO", "goodbye", "hello");
List<List<String>> subLists =
    Lists.partitionBy(strings, (a, b) -> a.equalsIgnoreCase(b));
System.out.println(subLists); // [["hello", "HELLO"], ["goodbye"], ["hello"]]

That way, any existing order of the input list will not be changed, so it really is just a partitioning: Iterating over the original list would be the same as iterating over each returned sub-list (in order).

The main rationales here are that (1) it kind of mimics the existing Lists.partition and (2) if you wanted to have a "true" grouping, you could use a stream together with Collectors.groupingBy.

jbduncan commented 6 years ago

Cool, thanks for the clarification. :)

I don't know if this proposal will be accepted, as I'm not on the Guava team myself (or a Googler, for that matter).

But if it were to be considered by the Guava team, I wonder if there would be a better name than List.partitionBy, due to the confusion it causes (in my mind, at least) with Collectors.partitioningBy, which as you described above does something a bit different.

swankjesse commented 6 years ago

Is this what you want? http://google.github.io/guava/releases/snapshot/api/docs/com/google/common/collect/Multimaps.html#index-java.lang.Iterable-com.google.common.base.Function-

pascalgn commented 6 years ago

Thanks for the pointer, I have not considered that! However, Multimaps.index returns a map, so it's very similar to Collectors.groupingBy:

var result = Multimaps.index(Arrays.asList("Inky", "Blinky", "Pinky", "Pinky", "Clyde"),
        String::length);
System.out.println(result);  // {4=[Inky], 6=[Blinky], 5=[Pinky, Pinky, Clyde]}

Sometimes it might be easier to determine where a list should be split ("no, these two items are different!") than to assign a suitable key to each item. Additionally, the proposed solution would support cases where you would not want items matched to an "old" group when there are groups in between:

List<List<String>> grouped = Lists.partitionBy(
        Arrays.asList("Pinky", "Blinky", "Pinky", "Pinky"),
        (a, b) -> a.length() == b.length());
System.out.println(grouped);  // [[Pinky], [Blinky], [Pinky, Clyde]]
ben-manes commented 6 years ago

You might consider jOOL as a better fit, since there was already an issue to support Guava's partition and make it more flexible. That utility library is going towards broad coverage of functional streams, whereas Guava usually has a high bar and a wider scope.

pascalgn commented 6 years ago

Thanks for the pointer, I'll also have a look at the issue there!

Regarding the name I'm not so sure anymore. Lists.partition returns a view of the list, which makes sense in a way, because modifying elements has no effect on which group they belong to:

List<String> items = Arrays.asList("a", "b", "c");
List<List<String>> partition = Lists.partition(items, 1);
items.set(1, "X");
System.out.println(partition); // [["a"], ["X"], ["c"]]

The same handling would be weird for the proposed new function, because changing elements of the original collection could lead to modified elements being in the "wrong" partition/sublist, so I think the new partitionBy should maybe not return a view but instead create a completely new list with new sublists?

This difference from partition should maybe also be indicated by a different name, for example splitBy or splitOn.

Maaartinus commented 6 years ago

FWIW it's like string splitting on a zero-width match like \\b (the corresponding BiPredicate would be something like (a, b) -> Character.isAlphanumerical(a) != Character.isAlphanumerical(b)).

pascalgn commented 4 years ago

I received some cautious 👍and no 👎, so I created a PR for it. If you have some time, please have a look (it's small!) and tell me what you think!