Open · kitsunde opened this issue 2 years ago
The current behaviour I think is fine. Pandas does give a dropna option. It could be implemented, so the behaviour could be extended to this:
testDf.groupby(['group'], { dropna: true }).mean().print();
// ╔════════════╤═══════════════════╤═══════════════════╤═══════════════════╗
// ║ │ group │ val1_mean │ val2_mean ║
// ╟────────────┼───────────────────┼───────────────────┼───────────────────╢
// ║ 0 │ A │ 1.0 │ 2.5 ║
// ╚════════════╧═══════════════════╧═══════════════════╧═══════════════════╝
I don't know if something like that would be what you need.
Anyway, why do you say that groupby aggregations behave inconsistently? Can you give another example with an inconsistent result?
Ah sorry, I copy-pasted my code poorly into the expected example; I've updated it. What I mean is this:
testDf.loc({ columns: ['val1', 'val2'] }).mean({ axis: 0 }).print();
will ignore NaN, but
testDf.groupby(['group']).mean().print();
will not. The challenge is that I need to do something like:
agg({
  installs: ['sum'],
  value: ['mean', 'sum']
});
Different columns can have NaN on different rows, and it seems like dropna() will drop rows or columns if there is a NaN in any position, whereas for agg to be usable it would need to consider NaN only in the column being aggregated.
The pandas option seems different from what I'm suggesting. That will drop columns and rows, but I'm specifically talking about the aggregation behaviour of NaN values. If installs is NaN but value is not, I would still want to aggregate on the value of that row. Does this make sense?
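To make the scenario concrete, here is a minimal sketch, assuming danfojs-node; the data and the per-column results in the comments are illustrative, and the agg call only mirrors the shape used above:

const dfd = require("danfojs-node");

// Each column is missing a value on a different row, so dropping any row
// that contains a NaN would also discard data the other column still needs.
const testDf = new dfd.DataFrame({
  group: ["A", "A", "A"],
  installs: [10, NaN, 30], // NaN on row 1
  value: [1.5, 2.5, NaN],  // NaN on row 2
});

// Desired behaviour: each aggregation skips NaN only in its own column,
// e.g. installs sum = 40 and value mean = 2.0 for group A.
testDf.groupby(["group"]).agg({
  installs: ["sum"],
  value: ["mean", "sum"],
}).print();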
In pandas:
import pandas as pd
d = {'group': ['A', 'A'], 'col1': [1, None], 'col2': [2, 3]}
df = pd.DataFrame(data=d)
df.groupby(['group']).mean()
df.groupby(['group']).agg({
    'col1': ['mean'],
    'col2': ['mean']
})
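# For the data above, both calls return col1 = 1.0 and col2 = 2.5 for
# group 'A', since pandas excludes the None/NaN entries from the mean by default.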
Both mean and agg here skip None values by default.
Repl: https://replit.com/@kitsunde/pandas-none-handling-in-groupby
@kitsunde Did you figure out a way around this? I'm facing the same issue.
Hello, maybe I'm missing something, but groupby aggregations behave inconsistently with DataFrame aggregations, which is a challenge since setting NaN values to a default like 0 behaves differently depending on the aggregation. I will explore using .apply to work around this as mentioned in https://github.com/javascriptdata/danfojs/issues/187#issuecomment-827531989

To Reproduce
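A minimal sketch of the kind of frame that reproduces this, assuming danfojs-node; the exact values are illustrative, only the testDf / val1 / val2 names come from the discussion above:

const dfd = require("danfojs-node");

const testDf = new dfd.DataFrame({
  group: ["A", "A"],
  val1: [1, NaN],
  val2: [2, 3],
});

// The DataFrame-level mean skips the NaN in val1 ...
testDf.loc({ columns: ["val1", "val2"] }).mean({ axis: 0 }).print();

// ... but the same aggregation through groupby does not:
testDf.groupby(["group"]).mean().print();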
results in val1_mean being NaN.

Expected behavior
To aggregate like the DataFrame-level mean does, i.e. skipping NaN within the column being aggregated.