Closed nalimilan closed 6 years ago
It makes sense to me that those with zero or negative weight should be removed from the dataset before doing anything. Let me know if you see a problem with that strategy, otherwise I'll make the change for the next release.
I don't see any problem with removing them beforehand. But I'm surprised the algorithm isn't equivalent to removing them explicitly, since it's equivalent to repeating each value a number of times corresponding to its weight (for non-zero integer weights at least).
BTW, this sounds surprising to me:
> wtd.quantile(c(7, 1, 2, 4, 10, 15), c(1, 1/3, 1/3, 1/3, 1, 1))
0% 25% 50% 75% 100%
4.00 6.25 8.50 11.25 15.00
> wtd.quantile(c(7, 1, 2, 4, 10, 15), c(1, 1/3, 1/3, 1/3, 1, 1), normwt=T)
0% 25% 50% 75% 100%
2.00 7.00 8.50 13.75 15.00
Shouldn't the 0th quantile always be equal to the minimum?
Good question. Sorry I don't have time to diagnosis this. Hope you can.
FWIW we're working on an implementation for Julia at https://github.com/JuliaStats/StatsBase.jl/pull/316/, so you could take inspiration from there (though I'm still not sure what's the best approach but @matthieugomez knows better).
For the upcoming release on CRAN I am excluding zero weighted observations up front in wtd.quantile
.
Thanks! I confirm that the examples above work fine now.
I'm not sure this is actually a bug, but I would naively have expected that frequency weights passed to
wtd.quantile
were equivalent to repeating a value the corresponding number of times. As shown below, this assumption works for non-zero weights, but not for zero weights:This also happens with other values of the
type
argument.Is there a particular reason for this? Would you have a reference about the implemented algorithm?