NicolasWoloszko opened this issue 6 years ago
I will need help with a code fix for this. It's best to make the change on GitHub and open a pull request so that it reaches me as a merge request.
I haven't had the chance to look at the code, but the 0% quantile seems to be defined as the value at which the cumulative weight reaches one. For the 100% quantile the rule is unclear, but the result is not the maximum either.
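To make the suspected rule concrete, here is a toy sketch of it. `q0_by_cumweight()` is a hypothetical name, and the rule it implements is the guess stated above, not something checked against the Hmisc source. Whenever the smallest values carry less than one unit of weight in total, they are skipped:

```r
# Hypothetical reconstruction of the suspected rule (not the Hmisc source):
# the 0% quantile is the smallest x whose cumulative weight reaches 1.
q0_by_cumweight <- function(x, w) {
  o <- order(x)              # sort the values, carrying their weights along
  cw <- cumsum(w[o])         # running total of the sorted weights
  x[o][which(cw >= 1)[1]]    # first sorted value where the total reaches 1
}

x <- c(1, 2, 3)
w <- c(0.5, 0.5, 1)
q0_by_cumweight(x, w)  # 2, even though min(x) is 1
```

In the example below, the values 1 and 2 carry a combined weight of only 0.7, which is consistent with the reported 0% quantile of 3.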
Example code:
```r
library(Hmisc)  # provides wtd.quantile()
test_wq <- data.frame(values = c(1,2,2,2,3,3,3,3,3,3,3,4,4,4,5,5,5,5,6,6,8,8,8,8,8,9),
                      wt = c(0.1,.2,.2,.2,.3,.3,.3,.3,.3,.3,.3,.4,.4,.4,.1,.1,.1,.1,.6,.6,.1,.1,.1,.1,.1,.1))
wtd.quantile(test_wq$values, weights = test_wq$wt)
```
0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|
3.0 | 3.3 | 4.0 | 5.8 | 8.2 |
Without weights, by contrast:

```r
wtd.quantile(test_wq$values)
```
0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|
1 | 3 | 4 | 6 | 9 |
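A minimal sketch of one possible fix, written as a standalone function rather than a patch to Hmisc. The name `wtd_quantile_minmax()` is hypothetical, and the position rule (weighted mid-ranks rescaled so the smallest value sits at 0 and the largest at 1, then linear interpolation) is an assumption of this sketch, not the package's algorithm. With equal weights the positions reduce to (i-1)/(n-1), which is R's default quantile type 7 and reproduces the unweighted row above; with any positive weights, 0% and 100% are the observed min and max.

```r
# Hypothetical alternative to wtd.quantile() (not the Hmisc algorithm):
# weighted mid-ranks are rescaled so the smallest value sits at 0 and the
# largest at 1, then the requested probabilities are linearly interpolated.
wtd_quantile_minmax <- function(x, w, probs = c(0, 0.25, 0.5, 0.75, 1)) {
  keep <- !is.na(x) & !is.na(w) & w > 0  # drop missing and zero-weight rows
  x <- x[keep]
  w <- w[keep]
  if (length(x) == 0) return(rep(NA_real_, length(probs)))
  if (length(x) == 1) return(rep(x, length(probs)))
  o <- order(x)
  x <- x[o]
  w <- w[o]
  mid <- cumsum(w) - w / 2                           # weighted mid-rank of each value
  p <- (mid - mid[1]) / (mid[length(mid)] - mid[1])  # min -> 0, max -> 1
  setNames(approx(p, x, xout = probs)$y, paste0(100 * probs, "%"))
}

values <- c(1,2,2,2,3,3,3,3,3,3,3,4,4,4,5,5,5,5,6,6,8,8,8,8,8,9)
wt     <- c(0.1,.2,.2,.2,.3,.3,.3,.3,.3,.3,.3,.4,.4,.4,.1,.1,.1,.1,.6,.6,.1,.1,.1,.1,.1,.1)
wtd_quantile_minmax(values, wt)  # 0% is 1 and 100% is 9
```

On the example data this returns 1 at 0% and 9 at 100%; the interior quantiles are not guaranteed to agree with Hmisc's, since the position rule differs.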
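Using `wtd.quantile` on a weighted column (N = 90,000), we get an inconsistency: the actual minimum is -3993960, whereas the first quantile reported is -3.494166e+06. This causes problems, for instance, when the quantiles are passed as `breaks` to `cut()`: values below the lowest break fall outside every interval and are coded as `NA`.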
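Until the function is fixed, one workaround for the `cut()` problem is to overwrite the outer breaks with the observed extremes. `safe_breaks()` is a hypothetical helper, and the break values below are made up for illustration rather than taken from the 90,000-row data:

```r
# Hypothetical helper: replace the outermost quantile breaks with the observed
# min and max so that, with include.lowest = TRUE, cut() assigns every
# non-missing value an interval.
safe_breaks <- function(x, q) {
  q[1] <- min(x, na.rm = TRUE)
  q[length(q)] <- max(x, na.rm = TRUE)
  unique(q)  # cut() errors on duplicated breaks
}

x <- c(-10, -5, 0, 5, 10)
q <- c(-8, -2, 2, 8)  # made-up quantiles whose ends sit inside range(x)

cut(x, q, include.lowest = TRUE)                  # -10 and 10 become NA
cut(x, safe_breaks(x, q), include.lowest = TRUE)  # no NA
```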