jtablesaw / tablesaw

Java dataframe and visualization library
https://jtablesaw.github.io/tablesaw/
Apache License 2.0
3.55k stars 643 forks source link

Combine multiple columns when aggregating #1214

Open JulienCarrau opened 1 year ago

JulienCarrau commented 1 year ago

I didn't find a way to do it while aggregating. Instead I first combine the columns I want and then I aggregate which is quite longer to code.

For instance if I have a table with a column containing items with prices and reductions in different countries and I want to group by item id. The aggregation I want is for instance a string join with the reduced price and the country.

| Item Id | Price | Reduction | Country | | 1 | 10 | 0.25 | India | | 1 | 20 | 0.1 | UK |

And I want to have the result as: | Item Id | Aggregated | 1 | 7.5 [India], 18 [UK]

So I'm looking for an aggregation method about to take multiple columns with different types and to combine all of them in one column.

Thanks in advance for your help, the library is really well done and very fast!

lwhite1 commented 1 year ago

There is nothing that can do all that in one step. You should probably

  1. do the calculation first, to create a reduction column.
  2. Then look at the melt and cast operations. Melt can convert rows to columns for some use cases.
  3. Then convert the reduction column to a string column using asStringColumn (or something like that), and finally
  4. Create a new column that combines that with the country name text using a string column operation. This may take an extra step if you want the country name in square brackets

On Wed, May 24, 2023 at 9:47 AM Julien Carrau @.***> wrote:

I didn't find a way to do it while aggregating. Instead I first combine the columns I want and then I aggregate which is quite longer to code.

For instance if I have a table with a column containing items with prices and reductions in different countries and I want to group by item id. The aggregation I want is for instance a string join with the reduced price and the country.

| Item Id | Price | Reduction | Country | | 1 | 10 | 0.25 | India | | 1 | 20 | 0.1 | UK |

And I want to have the result as: | Item Id | Aggregated | 1 | 7.5 [India], 18 [UK]

So I'm looking for an aggregation method about to take multiple columns with different types and to combine all of them in one column.

Thanks in advance for your help, the library is really well done and very fast!

— Reply to this email directly, view it on GitHub https://github.com/jtablesaw/tablesaw/issues/1214, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA2FPAWRBNMYK4LDWEJ7JBTXHYGORANCNFSM6AAAAAAYNNIFQ4 . You are receiving this because you are subscribed to this thread.Message ID: @.***>

JulienCarrau commented 1 year ago

Ok! Exactly like I did. Thank you for your answer, I was just wondering if I was doing it the right way :)