Closed: dom-devel closed this issue 3 years ago.
That's true; currently it only groups by two columns.
@steveoni Did you fix this in the recent PR, or should we move this to future plans?
@risenW Not fixed. It will require some refactoring of the groupby class, so we can add it to the future plans.
I had a look at it, and I think I can have a patch in a few days.
@sponsfreixes That would be great.
I plan to refactor and restructure the groupby operation properly, but time is a factor.
You can go ahead and add your changes.
I started working on it last weekend but forgot to post here. It's a pretty fun refactoring: to support any number of columns, you need recursion to traverse nested dictionaries as trees. I'll make a PR once I'm done.
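A minimal sketch of the recursive idea described above, written in plain JavaScript rather than danfojs internals. `buildTree` and `leaves` are hypothetical helper names: each grouping column adds one level of nested `Map`s, and a depth-first traversal recovers one key path per group, so the same code handles two, three, or more columns.

```javascript
// Hypothetical helpers (not danfojs API): nest rows one level per grouping column.
function buildTree(rows, keys) {
  if (keys.length === 0) return rows; // leaf: the rows belonging to one group
  const [first, ...rest] = keys;
  const buckets = new Map(); // first-seen order, raw (non-stringified) keys
  for (const row of rows) {
    const k = row[first];
    if (!buckets.has(k)) buckets.set(k, []);
    buckets.get(k).push(row);
  }
  const tree = new Map();
  for (const [k, bucket] of buckets) {
    tree.set(k, buildTree(bucket, rest)); // recurse on the remaining columns
  }
  return tree;
}

// Depth-first traversal: yields [keyPath, rows] for every leaf group.
function* leaves(tree, depth, path = []) {
  if (depth === 0) {
    yield [path, tree];
    return;
  }
  for (const [k, child] of tree) {
    yield* leaves(child, depth - 1, [...path, k]);
  }
}

// Usage on a small hypothetical dataset, grouped by two columns.
const rows = [
  { A: "foo", B: "one", C: 1 },
  { A: "bar", B: "one", C: 3 },
  { A: "foo", B: "two", C: 2 },
  { A: "foo", B: "one", C: 6 },
];
const keys = ["A", "B"];
for (const [path, group] of leaves(buildTree(rows, keys), keys.length)) {
  const sum = group.reduce((acc, r) => acc + r.C, 0);
  console.log(path.join(" / "), sum); // foo / one 7, then foo / two 2, then bar / one 3
}
```

Using `Map` instead of plain objects keeps groups in first-seen order and avoids coercing numeric group values to strings.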
@sponsfreixes That's cool. I'll be expecting your PR.
While working on it, I think I discovered a bug (or I'm misunderstanding something). I get duplicated rows in the following example:
const dfd = require("danfojs-node");
let data = { 'A': [ 'foo', 'bar', 'foo', 'bar',
'foo', 'bar', 'foo', 'foo' ],
'B': [ 'one', 'one', 'two', 'three',
'two', 'two', 'one', 'three' ],
'C': [ 1, 3, 2, 4, 5, 2, 6, 7 ],
'D': [ 3, 2, 4, 1, 5, 6, 7, 8 ] };
let df = new dfd.DataFrame(data);
let group_df = df.groupby([ "A" ]);
group_df.col(['C', 'D']).apply((x) => x.add(2)).print()
I get back:
╔════╤═════╤═════════╤═════════╗
║    │ A   │ C_apply │ D_apply ║
╟────┼─────┼─────────┼─────────╢
║ 0  │ foo │ 3       │ 5       ║
║ 1  │ foo │ 4       │ 6       ║
║ 2  │ foo │ 7       │ 7       ║
║ 3  │ foo │ 8       │ 9       ║
║ 4  │ foo │ 9       │ 10      ║
║ 5  │ foo │ 3       │ 5       ║
║ 6  │ foo │ 4       │ 6       ║
║ 7  │ foo │ 7       │ 7       ║
║ 8  │ foo │ 8       │ 9       ║
║ 9  │ foo │ 9       │ 10      ║
║ 10 │ bar │ 5       │ 4       ║
║ 11 │ bar │ 6       │ 3       ║
║ 12 │ bar │ 4       │ 8       ║
║ 13 │ bar │ 5       │ 4       ║
║ 14 │ bar │ 6       │ 3       ║
║ 15 │ bar │ 4       │ 8       ║
╚════╧═════╧═════════╧═════════╝
Row 0 is duplicated by row 5, row 1 by row 6, and so on. The "bar" group shows the same pattern (rows 10 and 13, 11 and 14, etc.). Is that really the expected behavior?
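For comparison, here is a plain-JavaScript sketch (not the danfojs implementation) of what a per-group `apply` of `x + 2` on columns `C` and `D` would be expected to return for the data in the example, assuming groups are emitted in first-seen order. Column `B` is omitted because the example does not use it, and the `C_apply`/`D_apply` names are taken from the printed output. Each input row appears exactly once, giving 8 rows (5 for `foo`, 3 for `bar`) rather than 16.

```javascript
// Same values as the example's `data` object (column B omitted).
const data = {
  A: ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
  C: [1, 3, 2, 4, 5, 2, 6, 7],
  D: [3, 2, 4, 1, 5, 6, 7, 8],
};

// Collect the original row positions of each group, in first-seen order.
const groups = new Map();
data.A.forEach((key, i) => {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(i);
});

// Apply x + 2 once per row of each group, then concatenate the groups.
const result = [];
for (const [key, idx] of groups) {
  for (const i of idx) {
    result.push({ A: key, C_apply: data.C[i] + 2, D_apply: data.D[i] + 2 });
  }
}
console.table(result); // 8 rows: foo x5, then bar x3
```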
@sponsfreixes That's true. I've fixed that in edd2e0c34d7002d8a38dd987b20f998bd95572b6 and it's already merged, so pull the latest update, or check the commit and apply the change to the version you have.
@dom-devel You can now group by more than two columns in the latest update.
Describe the bug
I'm a little unsure whether this is a bug or some sort of undocumented restriction.
If you group by 3 columns, it only groups by the first one. If you group by 2 columns, it groups by both.
Version
To Reproduce
const dfGroup = df.groupby(["col1", "col2","col3"])
dfGroup.col(["numeric_col"]).sum()
Expected behavior
I expect a DataFrame aggregated by all 3 columns. Instead, I get a DataFrame aggregated by a single column.
Is this currently a restriction or is this a bug?
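To make the expected result concrete, here is a plain-JavaScript sketch (not danfojs code) of a group-by-sum over any list of columns, using a composite key rather than nested trees. `groupSum`, the sample rows, and the `numeric_col_sum` output name are hypothetical, chosen only to mirror the column names in the report. Grouping the sample by all three columns yields three groups, whereas grouping by `col1` alone yields two, which matches the symptom described.

```javascript
// Hypothetical reference (not danfojs): sum `valueCol` per unique combination of `by` columns.
function groupSum(rows, by, valueCol) {
  const outCol = `${valueCol}_sum`; // assumed output naming
  const groupsByKey = new Map(); // composite key -> output row, first-seen order
  for (const row of rows) {
    const keyValues = by.map((col) => row[col]);
    // JSON.stringify keeps values such as 1 and "1" distinct in the key.
    const key = JSON.stringify(keyValues);
    if (!groupsByKey.has(key)) {
      const out = {};
      by.forEach((col, j) => {
        out[col] = keyValues[j];
      });
      out[outCol] = 0;
      groupsByKey.set(key, out);
    }
    groupsByKey.get(key)[outCol] += row[valueCol];
  }
  return [...groupsByKey.values()];
}

// Hypothetical sample data using the column names from the report.
const sample = [
  { col1: "a", col2: "x", col3: 1, numeric_col: 10 },
  { col1: "a", col2: "x", col3: 2, numeric_col: 5 },
  { col1: "a", col2: "x", col3: 1, numeric_col: 1 },
  { col1: "b", col2: "y", col3: 1, numeric_col: 4 },
];
console.table(groupSum(sample, ["col1", "col2", "col3"], "numeric_col")); // 3 groups
console.table(groupSum(sample, ["col1"], "numeric_col")); // 2 groups
```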