njtierney / naniar

Tidy data structures, summaries, and visualisations for missing data
http://naniar.njtierney.com/

Update miss_scan_count to miss_var_scan and miss_case_scan #184

Open njtierney opened 6 years ago

njtierney commented 6 years ago

The current output of miss_scan_count is:

library(naniar)

dat_ms <- tibble::tribble(~x,  ~y,    ~z,
                          1,   "A",   -100,
                          3,   "N/A", -99,
                          NA,  NA,    -98,
                          -99, "E",   -101,
                          -98, "F",   -1)

miss_scan_count(dat_ms, -99)
#> # A tibble: 3 x 2
#>   Variable     n
#>   <chr>    <int>
#> 1 x            1
#> 2 y            0
#> 3 z            1

Created on 2018-06-28 by the reprex package (v0.2.0).

But I think that it should be different, something like this:

variable  total  -99
x             1    1
y             0    0
z             1    1

That would be more useful. The benefit becomes more apparent when searching for several values at once:

miss_var_scan(dat_ms, c(-99, -98))
variable  total  -99  -98
x             2    1    1
y             0    0    0
z             2    1    1

And this:

miss_var_scan(dat_ms, c(-99, -98, "N/A"))
variable  total  -99  -98  N/A
x             2    1    1    0
y             1    0    0    1
z             2    1    1    0

And possibly a summary row at the bottom:

variable  total  -99  -98  N/A
x             2    1    1    0
y             1    0    0    1
z             2    1    1    0
total         5    2    2    1

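To make the proposal concrete, here is a minimal base R sketch of what miss_var_scan could look like. It is hypothetical: the function does not exist in naniar, the `add_total` argument is my own invention, and values are compared as character strings so that numeric and string search values can be mixed (which may behave unexpectedly for numbers that print in scientific notation).

```r
# Hypothetical sketch of the proposed miss_var_scan(); NOT part of naniar.
# `add_total` is an assumed argument name for the optional summary row.
miss_var_scan <- function(data, search, add_total = FALSE) {
  # Compare everything as character so -99 and "N/A" can share one search vector
  search_chr <- as.character(search)

  # One row per variable, one column per search value
  counts <- matrix(0L, nrow = ncol(data), ncol = length(search_chr),
                   dimnames = list(names(data), search_chr))
  for (i in seq_along(data)) {
    col_chr <- as.character(data[[i]])
    for (j in seq_along(search_chr)) {
      # na.rm = TRUE so that real NA values are not counted as matches
      counts[i, j] <- sum(col_chr == search_chr[j], na.rm = TRUE)
    }
  }

  out <- data.frame(variable = names(data),
                    total = as.integer(rowSums(counts)),
                    counts,
                    check.names = FALSE, row.names = NULL,
                    stringsAsFactors = FALSE)

  if (add_total) {
    # Summary row: column sums across all variables
    total_row <- data.frame(variable = "total",
                            total = sum(counts),
                            t(colSums(counts)),
                            check.names = FALSE, stringsAsFactors = FALSE)
    out <- rbind(out, total_row)
  }
  out
}

# Same data as the reprex above, built with base R to stay self-contained
dat_ms <- data.frame(x = c(1, 3, NA, -99, -98),
                     y = c("A", "N/A", NA, "E", "F"),
                     z = c(-100, -99, -98, -101, -1),
                     stringsAsFactors = FALSE)

miss_var_scan(dat_ms, c(-99, -98, "N/A"), add_total = TRUE)
```

On the example data this should give the same counts as the table with the summary row above. A real implementation would probably want type-aware matching rather than character comparison.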
njtierney commented 1 year ago

And implementing this for cases, rowwise, would resolve #248
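A rough sketch of what the rowwise version might look like, in base R. The name miss_case_scan comes from the issue title, but the signature and output columns (`case`, `total`, one column per search value) are assumptions, and values are compared as character strings for simplicity.

```r
# Hypothetical sketch of a rowwise miss_case_scan(); NOT part of naniar.
miss_case_scan <- function(data, search) {
  search_chr <- as.character(search)

  # One row per case (row of `data`), one column per search value
  counts <- matrix(0L, nrow = nrow(data), ncol = length(search_chr),
                   dimnames = list(NULL, search_chr))
  for (col in data) {
    col_chr <- as.character(col)
    for (j in seq_along(search_chr)) {
      # Real NA values never count as a match
      hit <- !is.na(col_chr) & col_chr == search_chr[j]
      counts[, j] <- counts[, j] + hit
    }
  }

  data.frame(case = seq_len(nrow(data)),
             total = as.integer(rowSums(counts)),
             counts,
             check.names = FALSE, stringsAsFactors = FALSE)
}

# Same example data as at the top of the issue, built with base R
dat_ms <- data.frame(x = c(1, 3, NA, -99, -98),
                     y = c("A", "N/A", NA, "E", "F"),
                     z = c(-100, -99, -98, -101, -1),
                     stringsAsFactors = FALSE)

miss_case_scan(dat_ms, c(-99, -98, "N/A"))
```

Here the second row of the example data (3, "N/A", -99) is the only case that matches two search values.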

runkelcorey commented 1 year ago

A big architectural question you may have already thought about: what if a utility function, as_naniar(df, as_missing = c(NA)), returned a new class that wraps the tibble and also records which values count as missing? Users then wouldn't need to specify missing values on every function call (as is currently required) and could still use different missing values for each dataset (which an options-based approach, set at the start of each session, would not allow). The object would still inherit the tbl class for interoperability with other tidy packages.
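As a rough illustration of the idea, here is a minimal base R sketch. Everything in it is hypothetical: as_naniar is only the name suggested above, while the class name `naniar_tbl`, the `as_missing` attribute, and the helper `n_miss_stored` are made up for this example. It uses a plain data.frame rather than a tibble to stay self-contained.

```r
# Hypothetical sketch only; none of these functions exist in naniar.
as_naniar <- function(df, as_missing = c(NA)) {
  stopifnot(is.data.frame(df))
  # Store the per-dataset missing-value definition on the object itself
  attr(df, "as_missing") <- as_missing
  # Prepend a new class but keep the existing ones, so data-frame methods still apply
  class(df) <- unique(c("naniar_tbl", class(df)))
  df
}

# Count, per variable, the values that the object itself declares as missing.
# Values are compared as character; %in% also matches NA against NA.
n_miss_stored <- function(x) {
  vals <- as.character(attr(x, "as_missing"))
  vapply(x, function(col) sum(as.character(col) %in% vals), integer(1))
}

dat_ms <- data.frame(x = c(1, 3, NA, -99, -98),
                     y = c("A", "N/A", NA, "E", "F"),
                     z = c(-100, -99, -98, -101, -1),
                     stringsAsFactors = FALSE)

nd <- as_naniar(dat_ms, as_missing = c(NA, -99, "N/A"))
n_miss_stored(nd)  # no missing values passed at call time
```

One design caveat: an attribute attached this way may not survive every data-frame operation, so a real implementation would likely need methods that carry it through subsetting and similar verbs.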