laresbernardo / lares

Analytics & Machine Learning R Sidekick
https://laresbernardo.github.io/lares/

wrong categorization in missingness #21

Closed thfuchs closed 4 years ago

thfuchs commented 4 years ago

Hey laresbernardo, thanks for the very nice package! I'm currently investigating several packages for EDA, and yours covers a very broad range!

The missingness function has a slight bug: it does not display all variables containing NA in the "with" section, and it does not calculate the percentage missing for them.

https://github.com/laresbernardo/lares/blob/6b87a600ec4ed1074e0974b2e651879d2876b113/R/missings.R#L30-L61

I've found that this is due to tidyr::gather not working as expected here, and that it can easily be solved by using the newer function tidyr::pivot_longer.

Here is the revised plot section (I also used glue instead of paste for the note):
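A minimal sketch of the pivot_longer approach on toy data may help show the idea. The data frame and its column names are made up for illustration and are not taken from the package; the sketch only assumes that dplyr and tidyr are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: 'b' and 'c' contain NAs, 'a' does not
df <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(NA, 2, NA, 4),
  c = c("x", NA, "z", "w")
)

# Turn every cell into TRUE/FALSE (is it missing?), then reshape to long format.
# pivot_longer() names the new columns "name" and "value" by default.
long <- df %>%
  mutate(across(everything(), is.na)) %>%
  pivot_longer(cols = everything())

# Count and percentage of missing values per original column
miss_pct <- long %>%
  group_by(name) %>%
  summarise(missing = sum(value), perc = round(100 * mean(value), 2))

print(miss_pct)
```

Because every column is converted to logical before reshaping, columns of different types (numeric and character here) can be stacked into the single "value" column without type conflicts.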

if (plot) {
  obs <- nrow(df) * ncol(df)
  miss <- sum(m$missing)
  missp <- 100 * miss/obs
  note <- glue::glue(
    "Total values: {lares::formatNum(obs, 0)} | ",
    "Total missings: {lares::formatNum(miss, 0)} ",
    "({lares::formatNum(missp, 1)}%)"
  )
  p <- df %>% mutate_all(is.na) %>% 
    tidyr::pivot_longer(cols = tidyselect::everything()) %>%
    {if (!full) filter(., name %in% m$variable) else .} %>% 
    mutate(type = ifelse(name %in% m$variable, "with", "without")) %>% 
    group_by(name) %>% 
    mutate(row_num = row_number()) %>% 
    mutate(perc = round(100 * sum(value)/nrow(df), 2)) %>% 
    mutate(label = ifelse(type == "with", paste0(name, " | ", perc, "%"), name)) %>% 
    arrange(value) %>% 
    ggplot(aes(x = reorder(label, perc), y = row_num, fill = value)) + 
    geom_raster() + 
    coord_flip() + 
    {if (full) facet_grid(type ~ ., space = "free", scales = "free")} + 
    {if (summary) lares::scale_y_comma(note, expand = c(0, 0)) else lares::scale_y_comma(NULL, expand = c(0, 0))} + 
    scale_fill_grey(name = NULL, labels = c("Present", "Missing"), expand = c(0, 0)) + 
    labs(title = "Missing values", x = "", subtitle = if (!is.na(subtitle)) subtitle) + 
    lares::theme_lares2(legend = "top") + 
    theme(axis.text.y = element_text(size = 8))

  return(p)
}

laresbernardo commented 4 years ago

Hi @TFcfgo, great catch! Fixed! Thanks for your feedback and proposed code. I didn't use glue but sprintf, which is pretty similar and doesn't add a dependency. This was very old (almost forgotten) code, and changing it to pivot_longer is a great call. Please do share your insights, EDA libraries, and discoveries. Cheers ;)
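For readers comparing the two approaches, here is a rough sketch of how the note could be built with sprintf instead of glue. It is not the code that was committed to the package: the totals are made-up numbers, and the `fmt` helper is a hypothetical base-R stand-in for lares::formatNum.

```r
# Hypothetical totals standing in for nrow(df) * ncol(df) and sum(m$missing)
obs <- 12000
miss <- 345
missp <- 100 * miss / obs

# Hypothetical base-R stand-in for lares::formatNum():
# thousands separator and a fixed number of decimals
fmt <- function(x, decimals = 0) {
  formatC(x, format = "f", digits = decimals, big.mark = ",")
}

# "%%" prints a literal percent sign in sprintf
note <- sprintf(
  "Total values: %s | Total missings: %s (%s%%)",
  fmt(obs), fmt(miss), fmt(missp, 1)
)
print(note)
```

sprintf ships with base R, so this keeps the note formatting free of extra package dependencies.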