Closed alyssadai closed 3 months ago
Thanks for your comments @surchs!
I've made some additional changes including reverting back to using `pivot` instead of `pivot_table` due to some (documented) issues w/ the latter dropping `NaN`s silently that I didn't notice before. Tests and comments have also been updated to prevent this from coming back, and to capture some of the current peculiarities in the long-to-wide tabular data transform.
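For reference, the behaviour can be reproduced with a small sketch along these lines. The column names (`participant_id`, `assessment`, `score`) and the helper `find_duplicate_observations` are made up for illustration and are not taken from the code in this PR; it assumes `pivot_table` is called with its default `dropna=True`.

```python
import pandas as pd

# Hypothetical long-format table; column names are illustrative only.
long_df = pd.DataFrame(
    {
        "participant_id": ["sub-01", "sub-01", "sub-02", "sub-02", "sub-03"],
        "assessment": ["tool_a", "tool_b", "tool_a", "tool_b", "tool_a"],
        "score": [1.0, 2.0, float("nan"), float("nan"), 3.0],
    }
)
keys = ["participant_id", "assessment"]

# pivot keeps sub-02 even though all of its values are NaN, and leaves an
# empty (NaN) cell for the sub-03/tool_b combination missing from the input.
wide_pivot = long_df.pivot(
    index="participant_id", columns="assessment", values="score"
)

# pivot_table (default dropna=True) discards entries whose values are all NaN,
# so sub-02 silently disappears from the wide table.
wide_pivot_table = long_df.pivot_table(
    index="participant_id", columns="assessment", values="score", aggfunc="first"
)


def find_duplicate_observations(df, subset):
    """Hypothetical helper: return every row whose key columns repeat."""
    return df[df.duplicated(subset=subset, keep=False)]


# Add a duplicate observation for sub-01/tool_a and flag it explicitly.
dup_df = pd.concat([long_df, long_df.iloc[[0]]], ignore_index=True)
duplicates = find_duplicate_observations(dup_df, keys)

# pivot refuses to reshape when index/column pairs are duplicated.
try:
    dup_df.pivot(index="participant_id", columns="assessment", values="score")
    pivot_raised = False
except ValueError:
    pivot_raised = True

print(wide_pivot)
print(wide_pivot_table)
print(duplicates)
```

With `pivot`, duplicates surface as an error instead of being aggregated away, so an explicit check like the one above can be used to tell users which rows to fix.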
Let me know if the changes/comments make sense 🙂
Changes proposed in this pull request:
- When `pivot`ing the long-format input to wide, pandas will simply create empty cells for any combinations that do not exist in the original data, which we think is fine for now
- `pd.pivot` -> `pd.pivot_table`, which has finer control over the aggregation method when there are duplicates, and over the values used to fill missing values in the resulting pivoted table
- Reverted to `pd.pivot` due to known problems with `pd.pivot_table` silently dropping `NaN`s
- Duplicate observations: `pivot_table` could theoretically handle these by keeping only the first occurrence in the resulting pivoted table, but this is not intuitive and may mean relevant data is lost. Rather, we probably want to flag duplicate observations and tell users to fix them.

Checklist
This section is for the PR reviewer
- PR title prefix ([ENH], [FIX], [REF], [TST], [CI], [MNT], [INF], [MODEL], [DOC]) (see our Contributing Guidelines for more info)
- Closes #XXXX
For new features:
For bug fixes: