noamross / zero-dependency-problems

Real-world code problems in R, without dependencies

means-with-na #13

noamross opened this issue 9 years ago (status: open)

noamross commented 9 years ago

https://github.com/noamross/zero-dependency-problems/blob/master/R/means-with-na.md

Simple data manipulation question:

I have two vectors, say:

c(1,3,NA,7,9)
c(3,NA,7,9,11)

I'd like to return a vector of the same length that holds the mean at each position across the two vectors, ignoring NAs (so where only one value is present, take the "average" of that one number, not two). I don't want NAs in my answer unless both vectors have an NA at the same position.

So, the answer I want is:

c(2,3,7,8,10)

Does anyone have an elegant solution to this problem?
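For reference, one compact base-R sketch (not part of the original question): stack the vectors as rows with rbind() and let colMeans() drop the NAs column by column. The names a and b are just illustrative.

```r
a <- c(1, 3, NA, 7, 9)
b <- c(3, NA, 7, 9, 11)

# rbind() stacks the vectors as rows of a 2 x 5 matrix;
# colMeans() then averages each column after dropping NAs.
res <- colMeans(rbind(a, b), na.rm = TRUE)
print(res)
```

With na.rm = TRUE, a position where every vector is NA comes back as NaN rather than NA, so this one-liner needs an extra step if the "NA only when all are missing" requirement matters.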

zmjones commented 7 years ago

Very closely related to #8.

You could start with the conceptually simplest solution: loop over an index and compute the mean at each position with na.rm = TRUE, or have them write their own mean function with a condition that excludes missing data.
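The loop-based starting point might look like the following sketch. mean_present() is a hypothetical helper standing in for the "own mean function" idea; it returns NA (not NaN) when nothing is left after dropping missing values. The loop assumes both vectors have the same length.

```r
a <- c(1, 3, NA, 7, 9)
b <- c(3, NA, 7, 9, 11)

# Hypothetical helper: mean of the non-missing values, or NA if none remain.
mean_present <- function(x) {
  present <- x[!is.na(x)]
  if (length(present) == 0) NA_real_ else sum(present) / length(present)
}

# Preallocate the result, then fill it one position at a time
# (assumes a and b have the same length).
res_loop <- numeric(length(a))
for (i in seq_along(a)) {
  res_loop[i] <- mean_present(c(a[i], b[i]))
}
print(res_loop)
```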

Then you can have them bind the vectors into a matrix (with rbind or cbind) and use apply, which would also let them learn about its MARGIN argument.
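A sketch of the apply() route, showing how MARGIN depends on how the vectors were bound: with cbind() each position is a row (MARGIN = 1); with rbind() each position is a column (MARGIN = 2). Extra arguments such as na.rm = TRUE are passed through apply()'s ... to mean().

```r
a <- c(1, 3, NA, 7, 9)
b <- c(3, NA, 7, 9, 11)

# cbind() puts each vector in a column, so each row is one position.
m_cols <- cbind(a, b)
by_row <- apply(m_cols, 1, mean, na.rm = TRUE)   # MARGIN = 1: rows

# rbind() puts each vector in a row, so each column is one position.
m_rows <- rbind(a, b)
by_col <- apply(m_rows, 2, mean, na.rm = TRUE)   # MARGIN = 2: columns

print(by_row)
print(by_col)
```

Like colMeans(), this plain version still returns NaN for a position where every vector is NA.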

Again, you could have them learn about anonymous functions here, e.g. function(x) ifelse(!all(is.na(x)), mean(x, na.rm = TRUE), NA).
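Plugging that anonymous function into apply() gives the requested NA for a position where every vector is missing. The sixth element below is an all-NA position added for illustration.

```r
a <- c(1, 3, NA, 7, 9, NA)
b <- c(3, NA, 7, 9, 11, NA)

res_anon <- apply(rbind(a, b), 2, function(x) {
  # ifelse() works because the test is a single TRUE/FALSE;
  # a plain if/else is the more usual choice for a scalar condition.
  ifelse(!all(is.na(x)), mean(x, na.rm = TRUE), NA)
})
print(res_anon)
```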

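Also, since mean(NA, na.rm = TRUE) returns NaN rather than NA, you could discuss the difference between the two, including why is.nan() is needed to test for NaN (comparing with == NaN gives NA, not TRUE).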
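A short sketch of the NaN/NA distinction, plus one way to turn NaN results from colMeans() back into NA:

```r
x <- mean(NA, na.rm = TRUE)   # nothing left after removing the NA
print(x)                      # NaN

# is.na() treats NaN as missing, but an ordinary NA is not NaN.
print(c(is.nan(x), is.na(x), is.nan(NA)))

# Comparing with == propagates missingness instead of returning TRUE.
print(x == NaN)               # NA, not TRUE

# Converting NaN results (e.g. from colMeans with na.rm = TRUE) to NA:
res <- colMeans(rbind(c(1, NA), c(3, NA)), na.rm = TRUE)
res[is.nan(res)] <- NA
print(res)
```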

dirkschumacher commented 5 years ago

Something like this?

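colmean <- function(...) {
  vecs <- list(...)
  # do.call() binds all vectors at once, one row per vector; unlike
  # Reduce(rbind, ...), it still returns a matrix when only one vector
  # is supplied, so apply() does not fail.
  mat <- do.call(rbind, vecs)
  # MARGIN = 2: apply the function to each column, i.e. each position
  apply(mat, 2, function(col) {
    na_col <- is.na(col)
    if (all(na_col)) {
      NA_real_  # every input is missing here: return NA rather than NaN
    } else {
      sum(col, na.rm = TRUE) / sum(!na_col)  # mean of the non-missing values
    }
  })
}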

colmean(
  c(1,3,NA,7,9, NA),
  c(3,NA,7,9,11, NA)
)
#> [1]  2  3  7  8 10 NA

Created on 2018-12-02 by the reprex package (v0.2.1)
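A vectorized variant of the same idea, sketched here as a hypothetical colmean_vec() (not from the thread): it replaces the per-column apply() with colSums() over the values and over the non-missing indicator, then turns the 0/0 positions into NA. It should print the same result as the reprex above.

```r
# Hypothetical name: same logic as colmean(), without apply().
colmean_vec <- function(...) {
  mat <- do.call(rbind, list(...))     # one input vector per row
  n_present <- colSums(!is.na(mat))    # non-missing count per position
  sums <- colSums(mat, na.rm = TRUE)   # sum of the non-missing values
  out <- sums / n_present              # 0 / 0 gives NaN where all missing
  out[n_present == 0] <- NA_real_      # report NA instead of NaN
  out
}

colmean_vec(
  c(1, 3, NA, 7, 9, NA),
  c(3, NA, 7, 9, 11, NA)
)
```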