jtablesaw / tablesaw

Java dataframe and visualization library
https://jtablesaw.github.io/tablesaw/
Apache License 2.0

GroupBy over tech.tablesaw.api.Table#stream returns wrong results #1220

Open pavel-hp opened 1 year ago

pavel-hp commented 1 year ago

I ran into an issue when applying a groupBy operation on a specific column through the tech.tablesaw.api.Table#stream API: https://javadoc.io/doc/tech.tablesaw/tablesaw-core/latest/tech/tablesaw/api/Table.html#stream-- (Returns the rows in table as a Stream)

Basically, grouping does not work when I use the stream returned by the Table method. (I know that I can do "groupByColumn" differently, but this is just an example to demonstrate a bug in the Tablesaw Stream API.)

Tablesaw version: 0.43.1

Requires JDK 11

Here is a test that demonstrates the issue:

import lombok.AllArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import tech.tablesaw.api.IntColumn;
import tech.tablesaw.api.Row;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

@Slf4j
class TableSawGroupTest {
    static List<Holder> testData = List.of(
            new Holder("a", 1),
            new Holder("b", 2),
            new Holder("c", 3),
            new Holder("a", -1)
    );

    @Test
    void shouldGroupBy() {
        Map<String, Integer> tableSawRes = tableSawVersion();
        Map<String, Integer> javaRes = javaStreamVersion();
        Assertions.assertEquals(javaRes, tableSawRes);
    }

    Map<String, Integer> tableSawVersion() {
        StringColumn strColumn = StringColumn.create("A-str", testData.stream().map(p -> p.strValue).collect(Collectors.toList()));
        IntColumn intColumn = IntColumn.create("B-int", testData.stream().map(p -> p.intValue).toArray(Integer[]::new));
        Table table = Table.create(strColumn, intColumn);
        log.info("Table: {}", table.printAll());
        // Group the streamed rows by "A-str", then sum "B-int" within each group.
        return table.stream()
            .collect(Collectors.groupingBy(p -> p.getString("A-str"), LinkedHashMap::new,
                Collectors.collectingAndThen(Collectors.toList(), rows -> {
                    int sum = 0;
                    for (Row row : rows) {
                        int bValue = row.getInt("B-int");
                        sum = sum + bValue;
                    }
                    return sum;
                })));
    }

    Map<String, Integer> javaStreamVersion() {
        // Same grouping and summing on the plain Java objects, used as the expected result.
        return testData.stream()
            .collect(Collectors.groupingBy(p -> p.strValue, LinkedHashMap::new,
                Collectors.collectingAndThen(Collectors.toList(), rows -> {
                    int sum = 0;
                    for (Holder row : rows) {
                        int bValue = row.intValue;
                        sum = sum + bValue;
                    }
                    return sum;
                })));
    }

    @AllArgsConstructor
    private static class Holder {
        String strValue;
        int intValue;
    }
}

The test fails with the following output:

2023-06-17 08:50:09 INFO  TableSawGroupTest:35 - Table:  A-str  |  B-int  |
-------------------
     a  |      1  |
     b  |      2  |
     c  |      3  |
     a  |     -1  |

org.opentest4j.AssertionFailedError: 
Expected :{a=0, b=2, c=3}
Actual   :{a=-2, b=-1, c=-1}

See tablesaw-test.zip
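
The actual result {a=-2, b=-1, c=-1} is what you would get if every collected Row reported the value of the last table row (-1). One plausible explanation, which this report does not confirm, is that the stream hands out a single reusable Row cursor rather than one Row object per table row. The sketch below reproduces that pattern in plain Java, without Tablesaw. The Cursor class, cursorStream() and both method names are hypothetical stand-ins, not Tablesaw API.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class ReusedCursorDemo {
    // Same data as the test in this issue.
    static final String[] KEYS = {"a", "b", "c", "a"};
    static final int[] VALUES = {1, 2, 3, -1};

    /** Hypothetical stand-in for a row cursor: one mutable object that moves over the data. */
    static final class Cursor {
        private int index = -1;
        String key() { return KEYS[index]; }
        int value() { return VALUES[index]; }
    }

    /** Assumption: the stream yields the SAME Cursor instance for every row, advanced in place. */
    static Stream<Cursor> cursorStream() {
        final Cursor cursor = new Cursor();
        Iterator<Cursor> it = new Iterator<Cursor>() {
            @Override public boolean hasNext() { return cursor.index < KEYS.length - 1; }
            @Override public Cursor next() {
                if (!hasNext()) throw new NoSuchElementException();
                cursor.index++;
                return cursor;
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }

    /** Mirrors the failing test: rows are buffered in a list and only read after the stream ends. */
    static Map<String, Integer> collectThenSum() {
        return cursorStream().collect(Collectors.groupingBy(
                Cursor::key, LinkedHashMap::new,
                Collectors.collectingAndThen(Collectors.toList(), rows -> {
                    int sum = 0;
                    for (Cursor row : rows) {
                        sum += row.value(); // every list entry is the same cursor, now on the last row
                    }
                    return sum;
                })));
    }

    /** Reads each value while the cursor is still positioned on that row. */
    static Map<String, Integer> sumWhileStreaming() {
        return cursorStream().collect(Collectors.groupingBy(
                Cursor::key, LinkedHashMap::new,
                Collectors.summingInt(Cursor::value)));
    }

    public static void main(String[] args) {
        System.out.println(collectThenSum());    // prints {a=-2, b=-1, c=-1}
        System.out.println(sumWhileStreaming()); // prints {a=0, b=2, c=3}
    }
}
```

If the same cause applies to Tablesaw, a possible workaround in the test would be to read the value inside the downstream collector, for example Collectors.summingInt(row -> row.getInt("B-int")), instead of collecting Row objects into a list first. This relies on the stream being sequential.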