scicloj / tablecloth

Dataset manipulation library built on top of tech.ml.dataset
https://scicloj.github.io/tablecloth
MIT License

group-by result with the same group names #30

Open genmeblog opened 3 years ago

genmeblog commented 3 years ago

A grouped dataset usually has unique group names, but names can also be duplicated (what is guaranteed to be unique is the group-id). This makes it possible to build such a grouping to provide partition-by functionality, as described here: https://clojurians.zulipchat.com/#narrow/stream/264992-ml-study/topic/session.2013.2E1/near/228939222
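To make the partition-by idea concrete, here is a language-neutral sketch in Python. It is not tablecloth's API: `partition_by` and the dict layout of each group are hypothetical. Consecutive runs of equal values become separate groups, so a name can repeat while each group-id stays unique.

```python
from itertools import groupby

def partition_by(rows, key):
    """Hypothetical helper: split rows into consecutive runs sharing the same key value.

    Each group records its name (the key value), a unique group_id,
    and the indices of the rows in that run.
    """
    groups = []
    runs = groupby(enumerate(rows), key=lambda pair: pair[1][key])
    for group_id, (name, run) in enumerate(runs):
        groups.append({"name": name, "group_id": group_id, "rows": [i for i, _ in run]})
    return groups

rows = [{"x": "a"}, {"x": "a"}, {"x": "b"}, {"x": "a"}]
groups = partition_by(rows, "x")
# The name "a" appears twice; group_id 0 and 2 tell the two groups apart.
```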

One possible direction is to also accept sequences of row indices when grouping by a map. Currently, a grouped dataset can be built from a map whose keys are group names and whose values are lists of row indices. Since map keys are unique, this form cannot express duplicated group names. Allowing a sequence of row indices as well could solve this issue.
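The following Python sketch shows why the map form loses information. It assumes, as a hypothetical input shape, a sequence of `(name, row_indices)` pairs. This is not tablecloth's actual signature. `groups_from_spec` is also a hypothetical helper.

```python
def groups_from_spec(spec):
    """Hypothetical helper: build (name, group_id, row_indices) triples.

    spec is either a dict {name: [row indices]} (names necessarily unique)
    or a sequence of (name, [row indices]) pairs (names may repeat).
    """
    pairs = spec.items() if isinstance(spec, dict) else spec
    return [(name, group_id, list(idxs)) for group_id, (name, idxs) in enumerate(pairs)]

pairs = [("a", [0, 1]), ("b", [2]), ("a", [3])]

# Building a dict keeps only the last value for the repeated key "a".
from_map = groups_from_spec(dict(pairs))
# A sequence of pairs keeps every group, with a distinct group_id each.
from_seq = groups_from_spec(pairs)
```

With the map, rows 0 and 1 silently disappear. The sequence form preserves all three groups.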

Contributors should check all functions that rely on group-name for regressions, especially ungroup and aggregate, as well as the pivot operations.
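As an illustration of the regression risk, the Python sketch below contrasts an aggregation keyed by group name with one keyed by group-id. Both functions are hypothetical and do not reproduce tablecloth's aggregate implementation. When names repeat, keying by name silently merges distinct groups.

```python
from collections import defaultdict

groups = [
    {"name": "a", "group_id": 0, "values": [1, 2]},
    {"name": "b", "group_id": 1, "values": [3]},
    {"name": "a", "group_id": 2, "values": [4]},
]

def sum_by_name(groups):
    # Keyed by name: the two "a" groups collide and their sums are merged.
    out = defaultdict(int)
    for g in groups:
        out[g["name"]] += sum(g["values"])
    return dict(out)

def sum_by_group_id(groups):
    # Keyed by group_id: each group keeps its own result; the name is carried along.
    return [(g["group_id"], g["name"], sum(g["values"])) for g in groups]

merged = sum_by_name(groups)
separate = sum_by_group_id(groups)
```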