Update miss_scan_count to miss_var_scan and miss_case_scan

njtierney commented 6 years ago

The current output of miss_scan_count is:

library(naniar)

dat_ms <- tibble::tribble(~x,  ~y,    ~z,
                          1,   "A",   -100,
                          3,   "N/A", -99,
                          NA,  NA,    -98,
                          -99, "E",   -101,
                          -98, "F",   -1)

miss_scan_count(dat_ms,-99)
#> # A tibble: 3 x 2
#>   Variable     n
#>   <chr>    <int>
#> 1 x            1
#> 2 y            0
#> 3 z            1

Created on 2018-06-28 by the reprex package (v0.2.0).

But I think that it should be different, something like this:

variable	total	-99
x	1	1
y	0	0
z	1	1

That would be more useful. This becomes more apparent if you had something like this:

miss_var_scan(dat_ms,c(-99, -98))

variable	total	-99	-98
x	2	1	1
y	0	0	0
z	2	1	1

And this:

miss_var_scan(dat_ms,c(-99, -98, "N/A"))

variable	total	-99	-98	N/A
x	2	1	1	0
y	1	0	0	1
z	2	1	1	0

And possibly a summary row at the bottom:

variable	total	-99	-98	N/A
x	2	1	1	0
y	1	0	0	1
z	2	1	1	0
total	5	2	2	1
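
For illustration, here is a rough base R sketch of how a function producing this layout might work. It is not the naniar implementation: the name miss_var_scan_sketch and the add_total argument are hypothetical, and it assumes that matching values by their character representation is acceptable, so numeric and character search values can be mixed in one list.

```r
# Hypothetical sketch, not naniar's implementation.
# Counts how often each search value occurs in each variable.
miss_var_scan_sketch <- function(data, search, add_total = FALSE) {
  labels <- as.character(search)

  # One row per variable, one column per search value.
  counts <- matrix(0L, nrow = ncol(data), ncol = length(labels))
  for (i in seq_along(data)) {
    # Assumption: compare as character; NA values never match.
    col_chr <- as.character(data[[i]])
    for (j in seq_along(labels)) {
      counts[i, j] <- sum(col_chr == labels[j], na.rm = TRUE)
    }
  }

  out <- data.frame(
    variable = names(data),
    total = as.integer(rowSums(counts)),
    counts,
    stringsAsFactors = FALSE
  )
  names(out) <- c("variable", "total", labels)

  if (add_total) {
    # Optional summary row with the column sums.
    total_row <- data.frame(
      "total",
      sum(counts),
      t(as.integer(colSums(counts))),
      stringsAsFactors = FALSE
    )
    names(total_row) <- names(out)
    out <- rbind(out, total_row)
  }
  out
}

# Same data as dat_ms above, built with base R to avoid dependencies.
dat_ms <- data.frame(
  x = c(1, 3, NA, -99, -98),
  y = c("A", "N/A", NA, "E", "F"),
  z = c(-100, -99, -98, -101, -1),
  stringsAsFactors = FALSE
)

var_scan <- miss_var_scan_sketch(dat_ms, list(-99, -98, "N/A"), add_total = TRUE)
print(var_scan)
```

Passing the search values as a list rather than c() keeps -99 numeric until the comparison step; with c(-99, -98, "N/A") R would coerce everything to character up front, which happens to give the same result in this sketch.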

njtierney commented 1 year ago

And implementing this for cases, rowwise, would resolve #248
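
A rowwise version could look roughly like the sketch below. Again this is only an illustration under assumptions: miss_case_scan_sketch is a hypothetical name, the output columns (case, total, one column per search value) mirror the variable-wise layout proposed above, and values are matched by their character representation.

```r
# Hypothetical sketch, not naniar's implementation.
# Counts how often each search value occurs in each row (case).
miss_case_scan_sketch <- function(data, search) {
  labels <- as.character(search)

  # One row per case, one column per search value.
  counts <- matrix(0L, nrow = nrow(data), ncol = length(labels))
  for (i in seq_along(data)) {
    col_chr <- as.character(data[[i]])
    for (j in seq_along(labels)) {
      # Add 1 to every row where this column holds the search value.
      hit <- !is.na(col_chr) & col_chr == labels[j]
      counts[, j] <- counts[, j] + hit
    }
  }

  out <- data.frame(
    case = seq_len(nrow(data)),
    total = as.integer(rowSums(counts)),
    counts
  )
  names(out) <- c("case", "total", labels)
  out
}

# Same data as dat_ms in the first comment, built with base R.
dat_ms <- data.frame(
  x = c(1, 3, NA, -99, -98),
  y = c("A", "N/A", NA, "E", "F"),
  z = c(-100, -99, -98, -101, -1),
  stringsAsFactors = FALSE
)

case_scan <- miss_case_scan_sketch(dat_ms, list(-99, -98, "N/A"))
print(case_scan)
```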

runkelcorey commented 1 year ago

A big architectural question you may have already thought about: what if you had a utility function as_naniar(df, as_missing = c(NA)) that returned a new class that wrapped the tibble but also recorded which values to count as missing? Then users wouldn't need to specify missing values each time they called a function (as is currently implemented), and could still specify different missing values for each dataset (which couldn't happen with an options approach set at the start of each session). The new class would still inherit from the tbl class for interoperability with other tidy packages.
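
To make the idea concrete, here is a minimal sketch of what such a wrapper might look like. Everything in it is an assumption for illustration: the names as_naniar_sketch and n_miss_sketch, the class name "naniar_tbl", and storing the missing-value definitions in an attribute called "as_missing". It uses a plain data.frame so it runs without extra packages; with a tibble the existing tbl classes would simply stay in the class vector.

```r
# Hypothetical sketch of the proposed wrapper; none of these names exist in naniar.
as_naniar_sketch <- function(df, as_missing = c(NA)) {
  stopifnot(is.data.frame(df))
  # Record which values this dataset treats as missing.
  attr(df, "as_missing") <- as_missing
  # Prepend the new class so the existing classes are still inherited.
  class(df) <- union("naniar_tbl", class(df))
  df
}

# A function that reads the stored definition instead of taking it as an argument.
n_miss_sketch <- function(x) {
  as_missing <- attr(x, "as_missing")
  is_na_entry <- is.na(as_missing)
  count_na <- any(is_na_entry)
  # Assumption: non-NA search values are matched as character strings.
  search_chr <- as.character(as_missing[!is_na_entry])

  per_col <- vapply(
    x,
    function(col) {
      sum((count_na & is.na(col)) | (as.character(col) %in% search_chr))
    },
    integer(1)
  )
  sum(per_col)
}

dat_ms <- data.frame(
  x = c(1, 3, NA, -99, -98),
  y = c("A", "N/A", NA, "E", "F"),
  z = c(-100, -99, -98, -101, -1),
  stringsAsFactors = FALSE
)

dat_default <- as_naniar_sketch(dat_ms)
dat_custom <- as_naniar_sketch(dat_ms, as_missing = list(NA, -99, "N/A"))

n_default <- n_miss_sketch(dat_default)
n_custom <- n_miss_sketch(dat_custom)
```

One caveat with the attribute approach: custom attributes may not survive every data-frame operation, so a real implementation would probably need methods that restore the class and attribute after subsetting or joining.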

njtierney / naniar

Update miss_scan_count to miss_var_scan and miss_case_scan #184