eepstein opened 11 years ago
Part of the problem is deciding what to do in this situation. How do you sum rows that have NAs in them? Is it still valid to sum the rows that are missing values? We cannot assume that those values should be counted as zeroes. Do you have a use case you can suggest?
An example would be the average systolic blood pressure reading across multiple patient visits. It might not be measured at every visit, but the patient presumably still had a blood pressure that was simply unobserved.
In R, it's handled this way:

```r
mean(c(0, 5, NULL, 10, NULL, 15))
# -> 7.5
sum(c(0, 5, NULL, 10, NULL, 15))
# -> 30
```

R also differentiates `NULL` from `NA` (analogous to `null` and `undefined`):

```r
mean(c(0, 5, NA, 10, NA, 15))
# -> NA
sum(c(0, 5, NA, 10, NA, 15))
# -> NA
```

Surveys can have cases where you might want missing values treated as zero in a mean, e.g. average wait time in a survey with skip patterns.
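For comparison, here is a minimal TypeScript sketch of the R semantics above. It assumes that `null`, `undefined` and `NaN` all play the role of R's `NA` (R drops `NULL` when the vector is built, so it needs no counterpart here), and it returns `NaN` where R would return `NA`. The names `rSum`/`rMean` and the `naRm` option, modelled on R's `na.rm = TRUE`, are hypothetical and not part of any library.

```typescript
// Assumption: null, undefined and NaN all stand in for R's NA.
type Cell = number | null | undefined;

// Type guard keeping only real numbers (NaN counts as missing).
function isPresent(v: Cell): v is number {
  return typeof v === "number" && !Number.isNaN(v);
}

// Like R's sum(): any missing value yields NaN unless naRm is set.
function rSum(values: Cell[], opts: { naRm?: boolean } = {}): number {
  const present = values.filter(isPresent);
  if (!opts.naRm && present.length !== values.length) return NaN;
  return present.reduce((acc, v) => acc + v, 0);
}

// Like R's mean(): same NA rule; the mean of zero values is NaN, as in R.
function rMean(values: Cell[], opts: { naRm?: boolean } = {}): number {
  const present = values.filter(isPresent);
  if (!opts.naRm && present.length !== values.length) return NaN;
  return present.length === 0 ? NaN : rSum(present) / present.length;
}

console.log(rMean([0, 5, null, 10, null, 15]));                 // NaN
console.log(rMean([0, 5, null, 10, null, 15], { naRm: true })); // 7.5
console.log(rSum([0, 5, null, 10, null, 15], { naRm: true }));  // 30
```

Propagating by default keeps a data problem visible; callers opt in to skipping, much as R users do with `na.rm`.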
But then the analyst would be expected to explicitly recode missing values as zero, and would not expect a second kind of parameter for handling NA in the operand.
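The explicit recode the analyst would perform can be sketched as a separate step, so the aggregation itself never has to guess. The helper names and the wait-time numbers below are hypothetical, chosen only to show how far the two policies diverge on the same data.

```typescript
type Cell = number | null | undefined;

// Explicit, analyst-chosen recode: treat every missing answer as 0.
function recodeMissingAsZero(values: Cell[]): number[] {
  return values.map((v) => (v === null || v === undefined || Number.isNaN(v) ? 0 : v));
}

// Default policy: drop missing answers instead of guessing a value.
function dropMissing(values: Cell[]): number[] {
  return values.filter((v): v is number => typeof v === "number" && !Number.isNaN(v));
}

// Plain arithmetic mean; an empty input has no mean, so return NaN.
function mean(values: number[]): number {
  return values.length === 0 ? NaN : values.reduce((a, b) => a + b, 0) / values.length;
}

// Hypothetical wait times in minutes; null = question skipped by a skip pattern.
const waits: Cell[] = [10, null, 20, null];
console.log(mean(recodeMissingAsZero(waits))); // 7.5 (30 / 4)
console.log(mean(dropMissing(waits)));         // 15  (30 / 2)
```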
This seems to be a problem with how `sum()`, and in turn `mean()` and possibly other methods, are implemented. They don't seem to detect non-numeric values except as the very first element of an array.
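To illustrate the suspected failure mode, here is a hypothetical sketch (not the library's actual code) of a `sum` that validates only the first element, next to one that checks every element. It relies on standard JavaScript coercion: `5 + null` is `5` while `5 + undefined` is `NaN`, so a `null` in the middle is silently counted as zero.

```typescript
// Hypothetical buggy shape: validates values[0] only, then trusts the rest.
function sumCheckingFirstOnly(values: unknown[]): number {
  const first = values[0];
  if (typeof first !== "number" || Number.isNaN(first)) return NaN;
  let total = 0;
  for (const v of values) total += v as number; // unchecked: null coerces to 0, undefined to NaN
  return total;
}

// Checks every element and propagates any non-numeric value as NaN (like R's NA).
function sumCheckingEvery(values: unknown[]): number {
  let total = 0;
  for (const v of values) {
    if (typeof v !== "number" || Number.isNaN(v)) return NaN;
    total += v;
  }
  return total;
}

console.log(sumCheckingFirstOnly([null, 5, 10]));    // NaN: caught, it is the first element
console.log(sumCheckingFirstOnly([0, 5, null, 10])); // 15: the null silently became 0
console.log(sumCheckingEvery([0, 5, null, 10]));     // NaN: caught wherever it appears
```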
Use case: grouping rows where some rows have null (or NaN) values in certain columns. The average should be taken over the non-null numeric values.
From the docs, this appears to be a supported feature, but the code suggests otherwise.
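The grouping use case can be sketched as a per-group mean that skips `null` and `NaN` cells, so each group's average is taken over its non-null numeric values only. The `Row` shape and the `groupMean` name are assumptions for illustration; a group with no usable values yields `NaN` rather than `0`.

```typescript
// Hypothetical row shape: a grouping key plus one nullable numeric column.
interface Row {
  group: string;
  value: number | null;
}

// Mean per group over non-null, non-NaN values; groups with none get NaN.
function groupMean(rows: Row[]): Record<string, number> {
  const acc = new Map<string, { sum: number; count: number }>();
  for (const { group, value } of rows) {
    const entry = acc.get(group) ?? { sum: 0, count: 0 };
    if (value !== null && !Number.isNaN(value)) {
      entry.sum += value;
      entry.count += 1;
    }
    acc.set(group, entry); // register the group even if this row was skipped
  }
  const means: Record<string, number> = {};
  acc.forEach(({ sum, count }, group) => {
    means[group] = count === 0 ? NaN : sum / count;
  });
  return means;
}

const rows: Row[] = [
  { group: "a", value: 120 },
  { group: "a", value: null },
  { group: "a", value: 130 },
  { group: "b", value: NaN },
  { group: "b", value: null },
];
console.log(groupMean(rows)); // { a: 125, b: NaN }
```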