harrelfe / Hmisc

Harrell Miscellaneous

weighted quantile bug fix #134

Open iii-org-tw opened 4 years ago

iii-org-tw commented 4 years ago

The original code has two problems:

  1. When normwt = FALSE and the weights sum to less than 1, the line shown in the attached image goes wrong: the indexing becomes descending.

  2. It treats the lowest index as 1 (see the attached image).

For example:

> data <- 1:3
> weights <- c(0.1, 0.1, 0.8)
> probs = 0.09
> wtd.quantile(x=data, weights=weights, probs=probs, normwt=TRUE)
9% 
3 

This happens because the cumulative sum of the normalized weights is 0.3, 0.6, 3.0, and the approx function tries to locate the lower index 1 and the upper index 2. Both lookups return the value 3, so the interpolation collapses and the result is 3.
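The lookup described above can be reproduced with base R alone. This is a minimal sketch, not the Hmisc source: it assumes that normwt=TRUE rescales the weights to sum to the number of observations, and that the lookup is a right-continuous step interpolation over the cumulative weights. The `approx` arguments and all variable names below are assumptions chosen for illustration.

```r
# Data from the example above.
x <- 1:3
w <- c(0.1, 0.1, 0.8)

# Assumed normalization: rescale the weights so they sum to length(x).
w_norm <- w * length(x) / sum(w)
cum_w  <- cumsum(w_norm)          # 0.3, 0.6, 3.0

# Lower and upper order-statistic indices named in the comment above.
low  <- 1
high <- 2

# Assumed lookup: a step function over the cumulative weights that takes
# the value to the right of each query point (f = 1).
looked_up <- approx(cum_w, x, xout = c(low, high),
                    method = "constant", f = 1, rule = 2)$y
print(looked_up)
```

Both query points, 1 and 2, fall between the cumulative weights 0.6 and 3.0, so each resolves to the third observation. Any interpolation between two identical values then returns that value.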

I found a formula that defines the weighted quantile: https://stats.stackexchange.com/questions/13169/defining-quantiles-over-a-weighted-sample It is well defined at the boundaries, and its result equals the NumPy quantile when the weights are equal. I implemented the code accordingly, and it does not have the problems mentioned above.
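For readers who want to experiment, here is a minimal sketch of one definition with these properties. It assumes the linked answer defines S_k = (k - 1) w_k + (n - 1) (w_1 + ... + w_{k-1}) on the sorted data and interpolates the values linearly at positions S_k / S_n. The function name `weighted_quantile_sketch` is hypothetical, and this is not the code in the PR.

```r
# Hypothetical helper, not the PR's code. Assumed definition:
#   S_k = (k - 1) * w_k + (n - 1) * sum_{i < k} w_i   (x sorted ascending)
# with x_k placed at probability S_k / S_n and linear interpolation between.
weighted_quantile_sketch <- function(x, w, probs) {
  stopifnot(length(x) == length(w), length(x) >= 2, all(w > 0))
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  n <- length(x)
  k <- seq_len(n)
  cum_before <- cumsum(w) - w            # sum of weights strictly before k
  s <- (k - 1) * w + (n - 1) * cum_before
  p <- s / s[n]                          # runs from 0 (min) to 1 (max)
  approx(p, x, xout = probs, method = "linear", rule = 2)$y
}

# The example from this thread.
weighted_quantile_sketch(1:3, c(0.1, 0.1, 0.8), 0.09)

# Equal weights: compare with the default (type 7) sample quantile.
probs <- c(0, 0.25, 0.5, 1)
weighted_quantile_sketch(c(5, 1, 9, 3), rep(1, 4), probs)
unname(quantile(c(5, 1, 9, 3), probs, type = 7))
```

With equal weights, S_k / S_n reduces to (k - 1) / (n - 1), which is the linear-interpolation rule used by R's type 7 quantile and by NumPy's default. For the example above, the positions are 0, 0.15 and 1, so the 9% quantile interpolates between the first two observations and comes out as 1.6 instead of 3.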

Any review is welcome.

iii-org-tw commented 4 years ago

Hi @harrelfe

Do you have time to review this PR?

harrelfe commented 3 years ago

Would you mind doing some tests on how this changes behavior in the non-breaking case? Also I would need changes in the help file to go along with these changes. Thanks for working on this.