Open pp611 opened 7 years ago
Thanks for the comment! I think you're right that it's off by a factor of 2, but a clearer way to accomplish that would be to use `aggfunc='sum'` rather than the default `aggfunc='mean'`.
Just changing `aggfunc` doesn't seem right either: it should group by date first, sum the two rows (male and female births) for each day, and then apply `aggfunc='mean'`. I could not figure out how to make a DataFrame out of `births.groupby(['year', 'month', 'day']).sum()` with the same year/month/day index, so multiplying the whole `births_by_date` by 2 seems the easiest way. Mathematically it is justifiable, though conceptually it could be cleaner.
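For reference, the group-then-average approach described above could be sketched roughly as follows. This is a minimal illustration on made-up toy data, not the book's code: the small `births` frame, its numbers, and the names `mean_per_row` and `daily_totals` are assumptions for the example. Calling `reset_index()` on the grouped result is one way to get the year/month/day keys back as ordinary columns.

```python
import pandas as pd

# Toy stand-in for the births data (hypothetical values):
# one row per (date, gender), as in the original dataset layout.
births = pd.DataFrame({
    "year":   [2000, 2000, 2001, 2001, 2000, 2000, 2001, 2001],
    "month":  [1, 1, 1, 1, 1, 1, 1, 1],
    "day":    [1, 1, 1, 1, 2, 2, 2, 2],
    "gender": ["F", "M", "F", "M", "F", "M", "F", "M"],
    "births": [100, 110, 120, 130, 200, 210, 220, 230],
})

# Default aggfunc='mean' averages over every row sharing a (month, day),
# so it averages across genders as well as across years.
mean_per_row = births.pivot_table("births", ["month", "day"])

# Step 1: total births per calendar date (male + female).
# reset_index() turns the year/month/day index back into columns.
daily_totals = (
    births.groupby(["year", "month", "day"])["births"].sum().reset_index()
)

# Step 2: average those daily totals across years.
births_by_date = daily_totals.pivot_table("births", ["month", "day"])

print(mean_per_row)
print(births_by_date)
```

On this toy data the two-step result is exactly twice the default pivot, which matches the "times 2" shortcut. That equality relies on every date having exactly one male and one female row; if some dates are missing a row, the two results would differ and the two-step version is the safer one.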
In Chapter 3's "Pivot Tables" section and Chapter 4's "Text and Annotation" section, when computing the births by date using:
Should each value be multiplied by 2 since male and female births are counted on separate rows? So should it be:
instead?