njtierney / naniar

Tidy data structures, summaries, and visualisations for missing data
http://naniar.njtierney.com/

recode_shadow() special missings are not accounted for by summary functions #340

Open mkcaulfield opened 5 months ago

mkcaulfield commented 5 months ago

Hi! Thanks so much for the package; it's such an important tool that the R ecosystem really needed!

I've been using recode_shadow() to handle some special missing values. That works for changing the shadow columns and updating their factor levels, but summary functions such as add_any_miss() and miss_var_table() don't recognize these recoded special missings AS missing. In the example below, add_any_miss() marks the first row of the data frame as complete even though its wind value of -99 is a special missing, and miss_var_table() reports no missing values for wind. It might be nice to have an option to choose whether NA aggregations distinguish between "true" (plain) NA and special NAs; failing that, I think this omission could easily mislead someone about the completeness of their data.

```r
library(naniar)
df <- tibble::tribble(
  ~wind, ~temp,
  -99,    45,
  68,    NA,
  72,    25
)

df
#> # A tibble: 3 × 2
#>    wind  temp
#>   <dbl> <dbl>
#> 1   -99    45
#> 2    68    NA
#> 3    72    25
df_recode <- df |> bind_shadow() |>
  recode_shadow(wind = .where(wind == -99 ~ "broken_machine"))

df_recode |> add_any_miss()
#> # A tibble: 3 × 5
#>    wind  temp wind_NA           temp_NA any_miss_all
#>   <dbl> <dbl> <fct>             <fct>   <chr>       
#> 1   -99    45 NA_broken_machine !NA     complete    
#> 2    68    NA !NA               NA      missing     
#> 3    72    25 !NA               !NA     complete
df_recode |> miss_var_table()
#> # A tibble: 2 × 3
#>   n_miss_in_var n_vars pct_vars
#>           <int>  <int>    <dbl>
#> 1             0      3       75
#> 2             1      1       25
```

Created on 2024-02-02 with reprex v2.1.0
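Until the summaries account for special missings, one possible workaround is to count the sentinel codes yourself. The sketch below is a minimal base-R illustration, assuming you know each column's sentinel values up front (here `-99` for `wind`). `count_miss_with_specials()` and `row_has_any_miss()` are hypothetical helpers, not naniar functions; the `include_specials` flag mimics the kind of "plain NA vs. special NA" option suggested above.

```r
# Hypothetical helpers (NOT part of naniar): treat user-supplied sentinel
# codes, e.g. -99, as missing alongside plain NA when summarising.

# Count missing values per column; `specials` is a named list mapping
# column names to their sentinel codes.
count_miss_with_specials <- function(data, specials = list(),
                                     include_specials = TRUE) {
  vapply(names(data), function(col) {
    x <- data[[col]]
    is_miss <- is.na(x)
    if (include_specials) {
      # specials[[col]] is NULL for columns without sentinels, and
      # x %in% NULL is all FALSE, so those columns count plain NAs only
      is_miss <- is_miss | x %in% specials[[col]]
    }
    sum(is_miss)
  }, integer(1))
}

# Flag rows containing any plain NA or any sentinel code (a rough analogue
# of add_any_miss() that also respects special missings).
row_has_any_miss <- function(data, specials = list()) {
  flags <- lapply(names(data), function(col) {
    x <- data[[col]]
    is.na(x) | x %in% specials[[col]]
  })
  Reduce(`|`, flags)
}

df <- data.frame(wind = c(-99, 68, 72), temp = c(45, NA, 25))

count_miss_with_specials(df, specials = list(wind = -99))
# wind: 1, temp: 1
count_miss_with_specials(df, specials = list(wind = -99), include_specials = FALSE)
# wind: 0, temp: 1
row_has_any_miss(df, specials = list(wind = -99))
# TRUE TRUE FALSE
```

With sentinels included, the first row is no longer reported as complete, which is the behaviour the issue asks the built-in summaries to support.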

njtierney commented 5 months ago

Hello!

Thank you very much for the kind words :)

I'm glad to hear that you are using the special missings feature, and this is a great point: there should be some way to account for them in the missingness summaries.

The next time I get to do a sprint on naniar and visdat, I will revisit this and touch base. Hopefully that will be sooner (0-3 months) rather than later!