kieferk / dfply

dplyr-style piping operations for pandas dataframes
GNU General Public License v3.0

Not taking into account None values in group_by #94

Open pachoning opened 4 years ago

pachoning commented 4 years ago

Hi!

I have a dataframe with three variables: id, category, and age.

df = pd.DataFrame({'id': [1, 2, 3, 4], 'category': ['a', 'a', 'b', None], 'age': [12, 54, 67, 89]})

I am performing a group_by on the category variable, which contains a None value:

df >> group_by(X.category) >> summarize(total=n(X.id))

The result is:

category total
a 2
b 1

Shouldn't the result be the following instead?

category total
a 2
b 1
None 1

Even if I convert the None to np.nan, I get the same result.
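
For context, the behaviour described above matches what plain pandas does by default, so it may simply be inherited from pandas (this assumes dfply's group_by delegates to DataFrame.groupby, which the thread does not confirm). A minimal sketch using only pandas, with the same data as above:

```python
import numpy as np
import pandas as pd

# Same data as in the report, once with None and once with np.nan as the missing key.
df_none = pd.DataFrame({'id': [1, 2, 3, 4],
                        'category': ['a', 'a', 'b', None],
                        'age': [12, 54, 67, 89]})
df_nan = pd.DataFrame({'id': [1, 2, 3, 4],
                       'category': ['a', 'a', 'b', np.nan],
                       'age': [12, 54, 67, 89]})

# By default, pandas drops rows whose group key is missing,
# and it treats None and np.nan alike for this purpose.
counts_none = df_none.groupby('category')['id'].count()
counts_nan = df_nan.groupby('category')['id'].count()

print(counts_none)
print(counts_nan)
```

Both calls yield only the groups a and b, which would explain why swapping None for np.nan makes no difference.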

sundarcf commented 2 years ago

Is there a way to pass a dropna argument to group_by() (for example dropna=False, to keep missing keys as their own group)? (Pandas Ref: https://pandas.pydata.org/pandas-docs/dev/whatsnew/v1.1.0.html#allow-na-in-groupby-key) cc: @kieferk
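
Until something like that exists in dfply, two workarounds in plain pandas may help. This is only a sketch: the first relies on the dropna parameter of DataFrame.groupby (pandas 1.1 or later, per the link above), and the second uses a placeholder label, 'missing', which is an arbitrary name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'category': ['a', 'a', 'b', None],
                   'age': [12, 54, 67, 89]})

# Option 1 (pandas >= 1.1): keep rows with a missing key as their own group.
with_na = (df.groupby('category', dropna=False)
             .size()
             .reset_index(name='total'))

# Option 2 (any pandas version): replace missing keys with a placeholder
# label before grouping. 'missing' is a hypothetical sentinel value.
filled = df.assign(category=df['category'].fillna('missing'))
with_sentinel = (filled.groupby('category')
                       .size()
                       .reset_index(name='total'))

print(with_na)
print(with_sentinel)
```

With option 2, the filled dataframe could presumably also be piped through the original group_by / summarize expression, since the key column no longer contains missing values.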