tidyverse / funs

Collection of low-level functions for working with vctrs

sum(is.na()) helpers #2

Closed hadley closed 4 years ago

hadley commented 8 years ago

n_absent() and n_present()

njtierney commented 7 years ago

Are you looking to do something really efficient in C++, or would something like the following suffice?

df <- tibble::tibble(x = c(NA, NA, NA, 1.6, 1.8),
                     y = c(NA, 5, 9, 10, NA))

df
#> # A tibble: 5 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1    NA    NA
#> 2    NA     5
#> 3    NA     9
#> 4   1.6    10
#> 5   1.8    NA
# n_absent ------------------------------------------------------------

sum(is.na(df$x))
#> [1] 3
sum(is.na(df$y))
#> [1] 2

n_absent <- function(x) sum(is.na(x))

n_absent(df$x)
#> [1] 3
n_absent(df$y)
#> [1] 2

# n_present -----------------------------------------------------------

sum(!(is.na(df$x)))
#> [1] 2
sum(!(is.na(df$y)))
#> [1] 3

n_present <- function(x) sum(!(is.na(x)))

n_present(df$x)
#> [1] 2
n_present(df$y)
#> [1] 3

hadley commented 7 years ago

It can be a little more efficient if done in C, but basically that.

njtierney commented 6 years ago

Just a friendly note that some of these helpers are in naniar at the moment. I'm not sure if you plan to implement this in vctrs, but if you do, perhaps let me know so I can reduce overlap and/or help out!

library(naniar)

n_miss(airquality)
#> [1] 44
n_miss(airquality$Ozone)
#> [1] 37
n_complete(airquality)
#> [1] 874
n_complete(airquality$Ozone)
#> [1] 116

prop_miss(airquality)
#> [1] 0.04793028
prop_miss(airquality$Ozone)
#> [1] 0.2418301
prop_complete(airquality)
#> [1] 0.9520697
prop_complete(airquality$Ozone)
#> [1] 0.7581699

pct_miss(airquality)
#> [1] 4.793028
pct_miss(airquality$Ozone)
#> [1] 24.18301
pct_complete(airquality)
#> [1] 95.20697
pct_complete(airquality$Ozone)
#> [1] 75.81699
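
For readers without naniar installed, the counts, proportions, and percentages above can be approximated in base R. This is only a sketch: the `*_sketch` names are hypothetical, and it assumes the proportion is simply the mean of `is.na()` over all cells, which matches the printed numbers but is not taken from naniar's source.

```r
# Hypothetical base-R helpers; not naniar's actual implementation.
n_miss_sketch    <- function(x) sum(is.na(x))        # count of missing cells
prop_miss_sketch <- function(x) mean(is.na(x))       # share of missing cells
pct_miss_sketch  <- function(x) 100 * prop_miss_sketch(x)

# is.na() on a data frame returns a logical matrix, so the same
# helpers work on a whole data frame or on a single column.
n_miss_sketch(airquality$Ozone)
prop_miss_sketch(airquality$Ozone)
pct_miss_sketch(airquality)
```

Note that on a zero-length input `mean(is.na(x))` is `NaN`, so a real helper would need to decide what to return in that case.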

njtierney commented 6 years ago

Wanted to add a note here from https://github.com/r-lib/rlang/issues/558

A verb/function that does always return a data.frame / matrix:

And quoting @hadley :

I think the principle that we could now follow is that are_na(x) has type logical, but shape that matches x. I think that is a succinct description of the behaviour that you desire. (And actually that's a nice description of a vectorised function - the shape of the output matches the shape of the input(s))
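
The "type logical, shape matches x" principle can be checked with base R's `is.na()`. The sketch below is an assumption-laden illustration, not the rlang or vctrs implementation; `are_na_sketch` is a hypothetical name.

```r
# Hypothetical helper for illustration; it simply wraps base::is.na().
are_na_sketch <- function(x) {
  out <- is.na(x)            # element-wise missingness
  stopifnot(is.logical(out)) # the output type is always logical
  out
}

v <- c(NA, 1.6, 1.8)                           # vector
m <- matrix(c(NA, 5, 9, NA), nrow = 2)         # matrix
d <- data.frame(x = c(NA, 1.6), y = c(5, NA))  # data frame

length(are_na_sketch(v)) == length(v)     # same length as the vector
identical(dim(are_na_sketch(m)), dim(m))  # same dimensions as the matrix
identical(dim(are_na_sketch(d)), dim(d))  # same dimensions as the data frame
```

One caveat: for a data frame, base `is.na()` returns a logical matrix rather than a data frame, so the dimensions match but the class does not.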

hadley commented 4 years ago

I think we'll just stick with is_na() for now; I don't want to commit to 4+ helpers yet.

lionel- commented 4 years ago

@hadley Should we try to maintain a different prefix between is_ predicates and vectorised predicates? This way, if you see is_, you know that by design it returns a single non-missing boolean, and it's safe to use in if (), for instance.
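
A minimal sketch of the distinction being drawn here, using hypothetical names (`is_na_scalar_sketch` is not an rlang or vctrs function): a scalar predicate always yields one non-missing `TRUE` or `FALSE`, while a vectorised predicate has to be reduced before it goes into `if ()`.

```r
# Hypothetical scalar predicate: always a single, non-missing logical.
is_na_scalar_sketch <- function(x) {
  # TRUE only for a length-1 atomic value that is.na() reports as missing
  is.atomic(x) && length(x) == 1L && is.na(x)
}

x <- c(NA, 1.6)

# Safe in if (): the condition is always length 1 and never NA.
if (is_na_scalar_sketch(x[1])) message("first element is missing")

is_na_scalar_sketch(x)  # FALSE, because x is not length 1

# The vectorised form returns one value per element, so reduce it
# explicitly (any() or all()) before using it as a condition.
if (any(is.na(x))) message("at least one element is missing")
```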

hadley commented 4 years ago

I suspect that ship has already sailed, but it’s worth thinking about.