Open nikhilpatwardhan opened 4 years ago
This looks to be the source of the bug (rows with any missing data are dropped after the pivot table is calculated but before the margins are computed): https://github.com/pandas-dev/pandas/blob/d106b81ce532bc71ec6cced944ddb751a4b0e5a3/pandas/core/reshape/pivot.py#L159
That line was itself added to fix a bug in crosstab: https://github.com/pandas-dev/pandas/pull/12614. Any fix here would therefore have to avoid breaking crosstab.
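To illustrate the suspected mechanism outside of pivot_table, here is a minimal sketch. It assumes that the margin is effectively computed from rows with no missing values in any column; it imitates that behaviour and is not the actual pandas code path.

```python
import pandas as pd

# Small frame with one missing value in column "a" only.
df = pd.DataFrame({"a": [1, 2, None, 4], "b": ["a", "a", "a", "b"], "c": [1] * 4})

# Assumption: imitate the suspected behaviour by discarding every row that
# has a missing value in any column before computing the grand total.
complete_rows = df[df.notna().all(axis=1)]
margin_after_drop = complete_rows[["a", "c"]].sum()

# Column-wise totals, where each column skips only its own missing values.
margin_per_column = df[["a", "c"]].sum()

print(margin_after_drop)
print(margin_per_column)
```

The row with the missing "a" is also removed from the total of "c", even though "c" has no missing values, so the two totals for "c" differ.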
This bug still exists in pandas 1.4.3:
>>> df = pd.DataFrame({"a": [1, 2, None, 4], "b": ["a", "a", "a", "b"], "c": [1] * 4})
>>> df
     a  b  c
0 1.00  a  1
1 2.00  a  1
2  NaN  a  1
3 4.00  b  1
>>> df.pivot_table(index="b", values=["a", "c"], aggfunc="sum", margins=True)
       a  c
b
a   3.00  3
b   4.00  1
All 7.00  3
I would instead expect the 'All' value for c to be 4, since c is 1 in all four rows:
>>> df.pivot_table(index="b", values=["a", "c"], aggfunc="sum", margins=True)
       a  c
b
a   3.00  3
b   4.00  1
All 7.00  4
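Until this is fixed, one possible workaround is to build the 'All' row by hand. The following is only a sketch: it assumes an additive aggregation such as sum, where the grand total can be taken directly from the raw columns, and it does not change pivot_table itself.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, None, 4], "b": ["a", "a", "a", "b"], "c": [1] * 4})

# Aggregate per group without asking pandas for margins.
table = df.pivot_table(index="b", values=["a", "c"], aggfunc="sum")

# Build the "All" row from the raw columns, so that each column
# skips only its own missing values.
table.loc["All"] = df[["a", "c"]].sum()

print(table)
```

For aggregations that are not simple sums (for example mean or min), the 'All' row would need to be computed with the matching function instead.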
[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of pandas.
[x] (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Output:
Problem description
The 'All' summary row shows the min of A as 7.0 and the max of B as 5.0, although the min of A is 6.0 and the max of B is 8.0.
Expected Output
Output of pd.show_versions()