scicloj / tablecloth

Dataset manipulation library built on top of tech.ml.dataset
https://scicloj.github.io/tablecloth
MIT License
277 stars, 18 forks

Support for grouped ds in bind #44

Open ladymeyy opened 2 years ago

ladymeyy commented 2 years ago

bind function: https://github.com/scicloj/tablecloth/blob/411419f3edfd647c930a64a2a56888925eb4122f/src/tablecloth/api/join_concat_ds.clj#L121

bind doesn't support grouped datasets, as mentioned here: https://clojurians.zulipchat.com/#narrow/stream/151763-beginners/topic/tablecloth.3A.20adding.20sum.20to.20bottom.20of.20column/near/244847145

I would be happy to provide a PR.

ladymeyy commented 2 years ago

Implementation plan: https://docs.google.com/spreadsheets/d/1u6BMwTJLwRSiJyuEeWqMsGOdLqPoc9ce5UVcc1DDi2Y/edit?usp=sharing

genmeblog commented 2 years ago

Great. I'll take a look at this at the end of the week.

genmeblog commented 2 years ago

Ok! I'm ready for a discussion (we can move it to Zulip).

bind (along with all the joins and other functions that combine two or more datasets) doesn't currently work on grouped datasets. I left this out of the initial implementation because there are some deeper issues to solve first. Here is a summary.

We have to consider 4 different input combinations (X - an operation, e.g. bind; ds - a regular dataset; gds - a grouped dataset):

  1. (X left-ds right-ds)
  2. (X gds ds)
  3. (X ds gds)
  4. (X left-gds right-gds)

Ad.1. - This is already implemented; the function returns a new dataset.

Ad.2. - This can be implemented the naive way: bind separately for each group, using the same ds every time. Returns a grouped dataset.

Ad.3. - I have no idea here :) The safest option would be to raise an exception. The second option is to ungroup gds and bind the two regular datasets.

Ad.4. - The most complicated part. One possible option: bind only matching groups, using left-gds as the main dataset. If a group has no match in right-gds, leave that group unchanged.
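
To make the four cases easier to discuss, here is a minimal sketch of the dispatch in plain Clojure. It does not use tablecloth at all. The data model is an assumption made for illustration only: a regular dataset is a vector of row maps, and a grouped dataset is a map of the form {:grouped? true, :groups {group-name rows}}. The names bind-rows, map-groups and the :on-ds-gds option are hypothetical and not part of the tablecloth API.

```clojure
;; Hypothetical model, NOT tablecloth's internal representation:
;;   regular dataset = vector of row maps
;;   grouped dataset = {:grouped? true, :groups {group-name rows}}

(defn grouped? [ds]
  (and (map? ds) (true? (:grouped? ds))))

(defn bind-rows
  "Case 1: append the rows of `right` to the rows of `left`."
  [left right]
  (into (vec left) right))

(defn ungroup
  "Flatten a grouped dataset into one regular dataset."
  [gds]
  (vec (mapcat val (:groups gds))))

(defn map-groups
  "Apply (f group-name rows) to every group, keeping the grouping."
  [gds f]
  (update gds :groups
          (fn [groups]
            (reduce-kv (fn [acc k rows] (assoc acc k (f k rows)))
                       {}
                       groups))))

(defn bind
  "Dispatch over the four input combinations.
  The :on-ds-gds option (an assumption of this sketch) selects the
  behaviour for case 3: :throw (default) or :ungroup."
  [left right & [{:keys [on-ds-gds] :or {on-ds-gds :throw}}]]
  (let [lg? (grouped? left)
        rg? (grouped? right)]
    (cond
      ;; 1. ds + ds: plain bind
      (and (not lg?) (not rg?))
      (bind-rows left right)

      ;; 2. gds + ds: bind the same ds to every group
      (and lg? (not rg?))
      (map-groups left (fn [_ rows] (bind-rows rows right)))

      ;; 3. ds + gds: raise, or ungroup the right side first
      (and (not lg?) rg?)
      (if (= on-ds-gds :ungroup)
        (bind-rows left (ungroup right))
        (throw (ex-info "bind: regular dataset on the left, grouped dataset on the right"
                        {:left :ds :right :gds})))

      ;; 4. gds + gds: bind matching groups, leave unmatched left groups as-is
      :else
      (map-groups left (fn [k rows]
                         (if-let [r (get (:groups right) k)]
                           (bind-rows rows r)
                           rows))))))

(def ds   [{:a 9}])
(def gds  {:grouped? true :groups {:x [{:a 1}] :y [{:a 2}]}})
(def gds2 {:grouped? true :groups {:x [{:a 3}]}})

(bind gds ds)
;; => {:grouped? true, :groups {:x [{:a 1} {:a 9}], :y [{:a 2} {:a 9}]}}

(bind gds gds2)
;; => {:grouped? true, :groups {:x [{:a 1} {:a 3}], :y [{:a 2}]}}
```

In this sketch, groups that exist only in the right-hand grouped dataset are silently dropped in case 4. Whether that is the desired behaviour is one of the open questions.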

What do you think?

Note: this applies to every function in the join-concat namespace.