Odd behavior of is.na() on single elements extracted from a vector of distributions

mitchelloharawild / distributional

Vectorised distributions for R

https://pkg.mitchelloharawild.com/distributional

GNU General Public License v3.0

94 stars 15 forks source link

Odd behavior of is.na() on single elements extracted from a vector of distributions #29

Closed mjskay closed 3 years ago

mjskay commented 4 years ago

Applying is.na over a vector of distributions gives the expected result:

x = c(dist_normal(0,1), NA)
is.na(x)

[1] FALSE  TRUE

But if you extract individual elements, is.na() seems to be applied over the arguments:

is.na(x[[1]])

   mu sigma 
FALSE FALSE

is.na(x[[2]])

logical(0)

I'm running into this in ggdist because (at a high level) I need to do an equivalent of lapply over a vector of distributions, but need different behavior for NA values compared to non-NA values. It appears that the problem is extracting an NA from such a vector yields NULL instead of a missing value:

x[[2]]

NULL

It's easy enough for me to work around, but seems like a possible inconsistency in the API.

mitchelloharawild commented 4 years ago

When #25 is added, most (all?) distributions will be implemented using vctrs::new_rcrd() which should fix this issue.

An NA handler could also be added to dist_default to fix this.

mjskay commented 4 years ago

Sweet, thanks!

mitchelloharawild commented 3 years ago

While #25 should fix this issue, I don't think it'll ever be recommended to extract single elements from the distribution vector. Doing this will cause issues with combining distributions of different classes, and potentially return lower-level programming oriented data structures. The return outputs of sub-distribution classes (such as dist_normal()) is subject to change.

I see that {ggdist} uses indexing which drops the distribution class, likely intentionally to obtain multiple density/cdf/quantile from a single function call. For now this is okay (but will cause issues with NA handling as above), but when this functionality is added to the distribution class it would be better/safer to maintain the distribution meta-class

mjskay commented 3 years ago

Ah yes, I was meaning to ask about the distinction between the distribution class and the other classes. It currently makes programming against the data type a bit weird since normally in R data types there is no distinction between a vector type and 1 element of that vector type (eg to support distribution objects in a generic function, unless I've missed something, seems to require an implementation for distribution and for dist_default?). I wonder if [[ indexing should always return a distribution object to maintain similar semantics to base types.

mitchelloharawild commented 3 years ago

I also wonder if [[ should return a <distribution> class, and for underlying structure to be obtained with vec_data() instead. It's definitely an option to consider.

The <distribution> class is more of a distribution manager, allowing distributions of different classes to be combined/vectorised. The role of the class will be more important when #25 is added, as it will group distributions which have the same attributes to make use of the native vectorised methods. Methods for generics are required because it informs how the function should be dispatched to its members and re-assemble the results appropriately.

The <dist_*> classes implement the methods for a specific distribution.

The <dist_default> methods provide defaults for <dist_*> classes, of which many can be approximated from other methods.

mitchelloharawild commented 3 years ago

Indexing via <dist>[[1]] will no longer enter into the distribution object, and will instead give <dist>[1] to preserve the 'distribution' class structure. To enter into the underlying distribution objects as before, you will need to use vec_data(<dist>)[[1]].

I expect this will be a breaking change for ggdist, let me know if you need me to stagger this change into a later release.

mjskay commented 3 years ago

Ah cool, this is a great change!

Yeah I just tested against the github version and there are breakages. When are you planning to release?

Sidebar: could you bump your package version in DESCRIPTION on github so I can do version checks against the dev version? Thanks!

mitchelloharawild commented 3 years ago

Ah cool, this is a great change!

:sunglasses:

Yeah I just tested against the github version and there are breakages. When are you planning to release?

The urgency on release is to get your PRs on CRAN - so no rush on my end.

Sidebar: could you bump your package version in DESCRIPTION on github so I can do version checks against the dev version? Thanks!

Oops - hadn't realised this wasn't done yet. Fixed in ea3df2f4d7c7e092dcba4e349dc3266fa08bf21c

mjskay commented 3 years ago

The urgency on release is to get your PRs on CRAN - so no rush on my end.

Ah k cool. I just made a new release so I'll probably wait a couple weeks to not make CRAN mad at me for submitting too often. I'll ping back when I do.

Oops - hadn't realised this wasn't done yet. Fixed in ea3df2f

Sweet thanks!