harrelfe / Hmisc

Harrell Miscellaneous
Other
208 stars 81 forks source link

unexpected Inf in wtd.var #69

Closed davharris closed 7 years ago

davharris commented 7 years ago

On Stack Overflow, it was pointed out that the following returns Inf as the weighted variance; I'm able to replicate this on my machine (package version: 4.0-0 with R 3.3.0).

library(Hmisc)
x <- c(3.7,3.3,3.5,2.8)

wt <- c(5,  5,  4,  1)/15

xm <- wtd.mean(x, wt)

wtd.var(x, wt)

I haven't looked through all the conditionals in the function, but my best guess is a zero in the denominator with the "unbiased" method.

harrelfe commented 7 years ago

I changed the code to use stats:cov.wt for both methods. Thanks for the report.

tyner commented 7 years ago

I don't think this change is desirable. davharris, in your example, you are using the default settings, which is for an unbiased estimator in the presence of frequency weights. However, your weights add up to 1, which essentially means that you only have one effective observation, so the variance cannot be estimated in an unbiased fashion.

Moreover, I believe stats::cov.wt is geared toward normalization weights, not frequency weights (multiply your weights by any positive finite constant and observe that the return value of stats::cov.wt is unaffected).

However, if we really want to have stats::cov.wt do the heavy lifting in the case of unbiased estimation under normalization weights, then I would propose the following (which, apart from the added warning, should be backwards compatible to version 4.0-2),

wtd.var <- function(x, weights=NULL, normwt=FALSE, na.rm=TRUE,
                    method = c('unbiased', 'ML'))
{
  method <- match.arg(method)
  if(!length(weights)) {
    if(na.rm) x <- x[!is.na(x)]
    return(var(x))
  }

  if(na.rm) {
    s       <- !is.na(x + weights)
    x       <- x[s]
    weights <- weights[s]
  }

  if(normwt)
    weights <- weights * length(x) / sum(weights)

  if(normwt || method == 'ML')
    return(as.numeric(stats::cov.wt(cbind(x), weights, method = method)$cov))

  # the remainder is for the special case of unbiased frequency weights
  sw  <- sum(weights)
  if(sw <= 1)
      warning("only one effective observation; variance estimate undefined")

  xbar <- sum(weights * x) / sw
  sum(weights*((x - xbar)^2)) / (sw - 1)
}
davharris commented 6 years ago

@harrelfe, what do you think about @tyner's point?