Open pshMsGsAZL2n07ohWBCAGX9cC opened 4 years ago
I'm not sure how the plotting stuff is related, but based on the title this sounds like https://github.com/pandas-dev/pandas/issues/3729. Let me know if not.
No, this issue is different from #3729. This issue isn’t dropping entire NaN rows like #3729, and my issue already uses the dummy variables suggested in that post as a workaround. My issue is dropping a NaN total in the margin row (but not column), while the dummy variables appear fine. I would suggest that you re-open it. Thanks again, and happy holidays.
Forgot to add in first clarification. This issue occurs without doing a heatmap plot. That is, the row margin total shows NaN, despite zeros for the row elements. Column margin total doesn’t have this issue.
I'm not sure how the plotting stuff is related, but based on the title this sounds like #3729. Let me know if not.
Tom, I added a clarification about a week ago, saying that I didn't think it was a duplicate. Can you please take a look? Apologies in advance, as I'm relatively new to github and don't understand exactly how issues are "re-opened". Thanks.
@pshMsGsAZL2n07ohWBCAGX9cC can you make a minimal example? http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports Is seaborn / matplotlib necessary, or does just
In [2]: table_byindustry
Out[2]:
PRODUCT A B C Total
INDUSTRY
Auto 2 0 0 2.0
Manf 0 0 0 NaN
Retail 0 0 0 NaN
Telco 0 1 0 1.0
Total 2 1 0 3.0
display the issue? Is it that the total for Manf and Retail are NaN, rather than 0?
@TomAugspurger Seaborn and matplotlib aren't necessary, although that's how I first saw the problem. The dataframe (table_byindustry) shows the issue, which you correctly summarized. Total for the Manf and Retail rows are NaN, rather than 0. Thanks for re-opening it and sharing how I can better report future bugs with minimal information.
Seems like this behavior is due to margin calculation using the groupby()
method and na
are default dropped by groupby()
.
My workaround is to just use isna()
on the pivot_table output.
@aoot Thanks for the workaround. At the time, I wanted a table that would properly show rows with zero values (e.g., Manf and Retail) rather than NaNs without the need to handle edge cases (e.g., use isna() to identify such rows then replace them with zeros). Looking back, I probably should have address the issue in data prep/cleansing before creating the table. However, my expectation was that pivot_table() and groupby() would be more Excel-like in their behavior with respect to this issue ;) Thanks again for sharing.
Code Sample, a copy-pastable example if possible
Problem description
Pivot Table Margins ("Total") isn't showing 0 values for rows, yet will do so for columns (e.g., Product C). As a result, the heatmap is empty for the "Total" column related to the rows ('Manf' and 'Retail') in my example. It is treating those rows as NaN for Margin, and setting fill_value at 0 doesn't fix it. See attached image. Thank you for your efforts to fix this issue.
Expected Output
Based on the color map, there should be a color for every cell in the pivot table, including the Margins. In addition, if the value is 0, it should be shown when the margins=True. It seems as if the treatment is inconsistent (i.e., view the total for Product C vs. the totals for 'Manf' and 'Retail' in Industry).
Output of
pd.show_versions()