harrelfe / Hmisc

Harrell Miscellaneous

Difference between Hmisc::wtd.var() and stats::cov.wt() #183

Open dopatendo opened 2 months ago

dopatendo commented 2 months ago

I noticed that with method = 'unbiased', wtd.var() and cov.wt() return different results.

I would like to confirm if this is intended. Here is an example:

xx = c(2, 3, 5, 7, 11, 13, 17, 19, 23)
wt = c(1, 1, 2, 3, 4, 1, 2, 1, 1)

Hmisc::wtd.var(x = xx, weights = wt, method = 'unbiased')
stats::cov.wt(x = cbind(xx), wt = wt, method = 'unbiased')$cov

tyner commented 1 month ago

If you want them to match in this case, you need to set normwt = TRUE when calling Hmisc::wtd.var; see the documentation for how this argument changes the interpretation of the weights. In particular:

normwt: specify ‘normwt=TRUE’ to make ‘weights’ sum to ‘length(x)’ after deletion of ‘NA’s. If ‘weights’ are frequency weights, then ‘normwt’ should be ‘FALSE’, and if ‘weights’ are normalization (aka reliability) weights, then ‘normwt’ should be ‘TRUE’. In the case of the former, no check is made that ‘weights’ are valid frequencies.

Basically, it all boils down to whether your weights are realizations of random variables or not. Only you can answer that question.
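
To make the two interpretations concrete, here is a minimal base-R sketch using the data from the original report. It assumes that wtd.var() with normwt = FALSE and method = 'unbiased' divides by sum(weights) - 1 (the frequency-weight case), which is my reading rather than something confirmed in this thread. The reliability-weight form follows what cov.wt() does for method = 'unbiased'. The names var_freq, var_rel, var_covwt and var_expanded are just illustrative.

```r
# Data from the original report
xx <- c(2, 3, 5, 7, 11, 13, 17, 19, 23)
wt <- c(1, 1, 2, 3, 4, 1, 2, 1, 1)

# The weighted mean is the same under both interpretations
xbar <- sum(wt * xx) / sum(wt)

# Frequency weights: each weight is a count, so the denominator is sum(w) - 1.
# ASSUMPTION: this mirrors wtd.var(..., normwt = FALSE, method = "unbiased").
var_freq <- sum(wt * (xx - xbar)^2) / (sum(wt) - 1)

# Reliability weights: rescale weights to sum to 1, denominator 1 - sum(p^2).
# This is the form cov.wt() uses for method = "unbiased".
p <- wt / sum(wt)
var_rel <- sum(p * (xx - xbar)^2) / (1 - sum(p^2))

# Cross-check the reliability form against cov.wt() itself
var_covwt <- as.numeric(
  stats::cov.wt(cbind(xx), wt = wt, method = "unbiased")$cov
)

# Cross-check the frequency form: repeat each value wt times and call var()
var_expanded <- var(rep(xx, times = wt))

print(c(frequency = var_freq, expanded = var_expanded,
        reliability = var_rel, cov.wt = var_covwt))
```

If the weights really are counts, the frequency form should agree with var() on the expanded data; if they are reliability weights, the cov.wt() form (or normwt = TRUE) is the one to use.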