njtierney / naniar

Tidy data structures, summaries, and visualisations for missing data
http://naniar.njtierney.com/

`vis_miss()` yielding an empty plot. #296

Closed: benyamindsmith closed this issue 2 years ago

benyamindsmith commented 2 years ago

Hi there! Big fan of this package, and I use it regularly when doing analysis. Recently I have been working with a very large dataset, and I find that `vis_miss()` is not working as intended. I did set `warn_large_data = FALSE`, but for some reason I'm just getting a blank plot.

The data that I am working with is this, so making a minimal reproducible example is a little challenging. The code I have is:

dt <- readr::read_csv("./ookla-canada-speed-tiles.csv")

naniar::vis_miss(dt, warn_large_data = FALSE)

The output is: image

This is strange, as there is missing data, but none of it is visualized.

If you know of any reason why this is the case I would greatly appreciate it!

Thanks!

njtierney commented 2 years ago

Hi there!

Thanks for posting this, glad you are enjoying using naniar and visdat!

This is a problem that has plagued me for years - see https://github.com/ropensci/visdat/issues/32

As far as I can tell, this is due to the way the cells are drawn in ggplot: every cell of the data becomes its own tile, so for large data there are too many cells to draw. However, what counts as "large data" depends on your machine specs, such as how much RAM you have.
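A rough way to see how quickly the cell count grows is to count the cells before plotting. The sketch below uses base R only. `big` is a made-up stand-in for a large data frame, and the `9e5` threshold is an assumption based on the documented default of the `large_data_size` argument, so check it against your installed version.

```r
# Hypothetical stand-in for a large data set: 200,000 rows, 5 columns.
set.seed(1)
n <- 2e5
big <- data.frame(
  a = rnorm(n),
  b = rnorm(n),
  c = rnorm(n),
  d = rnorm(n),
  e = rnorm(n)
)

# Make 1% of column `a` missing so there is something to count.
big$a[sample(n, n / 100)] <- NA

# One tile is drawn per cell, so the cost scales with rows * columns.
n_cells <- nrow(big) * ncol(big)

# Assumed default threshold for the large-data warning (large_data_size).
assumed_threshold <- 9e5
too_large <- n_cells > assumed_threshold

# Missing values per column, computed without drawing anything.
na_per_col <- colSums(is.na(big))

n_cells
too_large
na_per_col
```

If `too_large` is `TRUE`, a numeric summary such as `na_per_col` still works where the plot may not render.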

My main recommendation here is to try downsampling your data:

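library(visdat)
library(dplyr)

# `data` stands for your own data frame.
# Take a random sample of 1000 rows:
data %>%
  slice_sample(n = 1000) %>%
  vis_dat()

# or take the first 1000 rows:
data %>%
  slice_head(n = 1000) %>%
  vis_dat()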
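The same downsampling idea applies to `vis_miss()`, the function from the original question. The following is a minimal sketch, not a tested fix for the data set in the issue: `dt` here is a simulated stand-in, and the sample size of 1000 is an arbitrary choice.

```r
library(dplyr)
library(naniar)

# Simulated stand-in for the large data set (hypothetical columns x and y).
set.seed(2022)
n <- 1e5
dt <- tibble(x = rnorm(n), y = rnorm(n))
dt$y[sample(n, n / 10)] <- NA  # make 10% of y missing

# Draw a random subset that is small enough to plot.
dt_small <- dt %>% slice_sample(n = 1000)

# Plot missingness for the subset only.
p <- vis_miss(dt_small)
p

# Check that the subset roughly preserves the share of missing values.
mean(is.na(dt$y))
mean(is.na(dt_small$y))
```

A random sample can miss rare missingness patterns, so it is worth comparing the missing proportions of the sample and the full data, as in the last two lines.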

Or use other functions to explore your missing data, like `gg_miss_upset(data)`, or try modelling missingness using decision trees, as shown in this vignette.
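As a sketch of what those other functions look like in practice, the example below uses the built-in `airquality` data, where `Ozone` and `Solar.R` contain missing values. These summaries aggregate per variable rather than drawing one tile per cell, so they should scale better to large data, though that is an expectation rather than something measured here. `gg_miss_upset()` may need the UpSetR package installed.

```r
library(naniar)

# Tabular summary: number and percentage of missing values per variable.
s <- miss_var_summary(airquality)
s

# One point per variable instead of one tile per cell.
p_var <- gg_miss_var(airquality)
p_var

# Combinations of variables that are missing together.
gg_miss_upset(airquality)
```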

Sorry I can't be more help!