Closed mjskay closed 2 years ago
I've improved handling of na.rm
in median()
- and found as you indicate that median.distribution()
doesn't use distribution specific median methods :facepalm:.
I'm not so sure about quantile()
methods, at least on an implementation front. Ignoring na.rm
for non-sample distributions would require adding the argument to each method. Currently quantile(<dist_sample>)
has na.rm=TRUE
by default, however I think this should be changed for R consistency and NA awareness.
Any suggestions on how to handle NAs for quantile()
functions?
(also apologies for all the issues: I've been kicking the tires of {distributional} again as I prepare to increase its support and use in ggdist)
Love them! :+1:
I've improved handling of na.rm in median() - and found as you indicate that median.distribution() doesn't use distribution specific median methods 🤦.
Sweet thanks!
I'm not so sure about quantile() methods, at least on an implementation front. Ignoring na.rm for non-sample distributions would require adding the argument to each method. Currently quantile(
) has na.rm=TRUE by default, however I think this should be changed for R consistency and NA awareness. Any suggestions on how to handle NAs for quantile() functions?
Hmm yeah. For me, support for na.rm
in quantile is less important than median because I already have to special-case it a bit with distributional objects anyway, so there isn't a reduction in complexity in my code by supporting it (unlike with median, where having na.rm
support helps a lot). So if the resolution is not to support it in quantile()
I won't be torn up about it :).
One question I did have is: is it is even possible to create dist_sample()
objects that contain NA
s? I wasn't able to directly:
> dist_sample(list(c(1:10, NA)))
<distribution[1]>
[1] sample[10]
Warning message:
Missing sampled values have been removed from the sample distribution.
In which case, maybe na.rm
is a meaningless parameter even for dist_sample? In which case, you could consider having quantile.distribution()
take na.rm
but not do anything with it just for compatibility's sake.
One question I did have is: is it is even possible to create
dist_sample()
objects that containNA
s? I wasn't able to directly
Not at the moment, but I was considering keeping NA values in sample distributions. Do you think preserving NAs in input and output for sample distributions is important, or remove them upfront with a warning as is currently done. Pinging @njtierney for thoughts on missing values in distributions.
One possible argument in favor of keeping them is for cases where samples are correlated (e.g. in output from MCMC). Then you don't want to drop NA
s because this will change the indices, and the index for each draw actually has some meaning (e.g. what MCMC iteration that draw came from). If you plan to support arithmetic on dist_sample
s by applying the operations elementwise to the underlying arrays, dropping NA
s could then cause incorrect results.
My thoughts on this are that it would be awesome to have distributions that can have missing values as valid values that could be drawn. My thoughts went to the idea of inflating a distribution with NA values, in a similar way to how there might be 0 inflated, there could be NA inflated distributions...but I feel like there might be unintended consequences of this?
Sounds convincing to me. I've changed dist_sample()
to allow missing values.
In that case, I'd say quantile
should not error on other distributions when passed na.rm
, otherwise (unless I'm mistaken) something like quantile(c(dist_normal(), dist_sample(list(c(1:10, NA)))), 0.5, na.rm = TRUE)
would fail?
Yes, I agree that na.rm
should be handled. Sadly I think this requires all distribution functions to have a na.rm
argument that is ignored.
Heh. You know it's a mature R API when you're forced to pass annoying parameters around
Thanks for the suggestion - turns out in most situations it doesn't make sense to pass ...
through to p/d/q/r functions (and in most methods I'm not doing it anyway). So there were only a few methods that needed updating for this.
Thanks for the fix! On the latest version (0.3.0) this is working for me except on dist_mixture()
, which sometimes calls down to cdf()
and then fails. E.g.:
library(distributional)
x = dist_mixture(dist_normal(0), dist_normal(1), weights = c(0.5, 0.5))
quantile(x, 0.5, na.rm = TRUE)
#> Error: 1 components of `...` were not used.
#>
#> We detected these problematic arguments:
#> * `na.rm`
#>
#> Did you misspecify an argument?
#> Error: 1 components of `...` were not used.
#>
#> We detected these problematic arguments:
#> * `na.rm`
#>
#> Did you misspecify an argument?
Created on 2022-01-11 by the reprex package (v2.0.1)
Should cdf()
also now support na.rm
, given the support for NA
s in dist_sample()
?
Hmm, this can occur with all uses of cdf()
, and a few other functions like likelihood()
.
Yes cdf()
should support na.rm
, and it does. However it does by ignoring na.rm
in ...
for most methods, so ellipsis
isn't pleased. Simple solution would be to drop ellipsis
from the generics - if people want to check dots for usage, they can wrap their own function around the generic.
Fixed in 646e9df8923feaad1c399f245f16392c94cc76fe
library(distributional)
x = dist_mixture(dist_normal(0), dist_normal(1), weights = c(0.5, 0.5))
quantile(x, 0.5, na.rm = TRUE)
#> [1] 0.5
cdf(dist_normal(), 1, na.rm=FALSE)
#> [1] 0.8413447
Created on 2022-01-12 by the reprex package (v2.0.0)
thanks for the quick fix!
Unlike
mean.distribution()
, currentlymedian.distribution()
andquantile.distribution()
will give errors if explicitly passedna.rm
:This makes the API a bit more cumbersome to use in ggdist, as I will have to add a special case for median when used with numerics versus when used with distributions. Could the distribution implementation of these functions ignore
na.rm
? In the case ofmedian
at least, the base generic includesna.rm
in its signature so it should probably support it.(also apologies for all the issues: I've been kicking the tires of {distributional} again as I prepare to increase its support and use in ggdist)