Closed verajosemanuel closed 6 years ago
Hi @verajosemanuel !
Thanks for posting this - have you tried downsampling your data?
Perhaps some code such as

```r
library(dplyr)
data %>% sample_frac(0.1) %>% vis_dat()
```

could work?
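If a fixed number of rows is easier to reason about than a fraction, `dplyr::sample_n()` (or `slice_sample()` in dplyr >= 1.0.0) is an alternative. A minimal sketch, assuming your dataframe is called `data` as in the snippet above:

```r
library(dplyr)
library(visdat)

# Visualise a random sample of 10,000 rows rather than a fraction
data %>% sample_n(10000) %>% vis_dat()

# Equivalent with the newer dplyr verb
data %>% slice_sample(n = 10000) %>% vis_dat()
```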
I think I need to make it clearer in the error message that visualising data of this size depends largely on the computing environment. For example, my machine can run the code below and produce the graphics, but someone with a less powerful laptop or PC cannot.
```r
library(visdat)

# fake large data
fake_large <- tibble::as_tibble(matrix(1:1e6, nrow = 1e5))

vis_dat(fake_large)
#> Error in vis_dat(fake_large): Data exceeds recommended size for visualisation, please consider
#> downsampling your data, or set argument 'warn_large_data' to FALSE.

vis_dat(fake_large, warn_large_data = FALSE)

library(nycflights13)

vis_dat(flights)
#> Error in vis_dat(flights): Data exceeds recommended size for visualisation, please consider
#> downsampling your data, or set argument 'warn_large_data' to FALSE.

vis_dat(flights, warn_large_data = FALSE)
```
This is a difficult problem to debug, as it usually depends on the computing system; that is why we implemented this error message. But we could probably be clearer that setting `warn_large_data = FALSE` does not necessarily mean your visualisation will work.
My current understanding is that this limitation comes down to the capabilities of the underlying graphics libraries. Future implementations of visdat will incorporate plotly, which might be more capable of handling larger datasets.
Let me know if you have any questions! :)
Downsampling works. In my case computation capacity is not a problem: I have plenty of RAM (64 GB) and all the processors needed. To check whether there is a limitation, I tried the package "narnia" (yes, that's what it's called) to get a glimpse of NA values with:
```r
narnia::gg_miss_var(df)
```
It worked flawlessly. visdat is a great package and I use it often.
thanks a lot.
OK that is interesting to note - @karawoo, do you know if this sort of problem could be down to grid or ggplot? I feel like perhaps the most likely answer is the way that I have coded visdat.
@verajosemanuel I'm glad to know that `naniar::gg_miss_var(df)` has been useful for you. Note that the package is now called naniar (the name was changed a few times but is now settled).
Yeah, I know, but somehow the first time I tried to install it, it failed. I'll give it another try. Something came to mind: why two packages with similar features? Why not join visdat and naniar?
regards
Interesting!
If you have an installation problem on naniar please file an issue :)
Good question.
visdat is designed to solve a narrow problem: visualising whole dataframes as a preliminary exploration. Reducing the scope to this particular task makes the package simpler to maintain, as it only deals with these kinds of visualisations.
naniar is designed to deal with missing data in R and is much larger in scope than just exploratory visualisations; it provides a framework to explore and analyse missing data.
Let me know if you have any further questions @verajosemanuel :) Just tidying up issues now, but feel free to let me know if you want to reopen it.
This may be a possible route for an update. I had trouble with a large dataframe too (200,000 rows, 10 columns, and I didn't want to take a random sample) and was able to solve it by reusing the code from the vis_dat and vis_miss functions and replacing geom_raster with geom_tile. I'm not sure why, but in this case it did not fail to make the plot. Might be worth looking into.
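The workaround described above can be sketched roughly as follows. This is illustrative only, not visdat's actual internals; the helper name `vis_miss_tile` and the tidyr reshaping are my own assumptions:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Build a vis_miss-style plot by hand, with geom_tile() in place of geom_raster()
vis_miss_tile <- function(df) {
  df %>%
    mutate(.row = row_number()) %>%
    pivot_longer(cols = -".row", names_to = "variable", values_to = "value",
                 values_transform = list(value = as.character)) %>%
    mutate(missing = is.na(value)) %>%
    ggplot(aes(x = variable, y = .row, fill = missing)) +
    geom_tile() +        # geom_raster() was the layer that failed on large data
    scale_y_reverse() +  # row 1 at the top, as in vis_miss()
    labs(x = "Variable", y = "Observation", fill = "Missing") +
    theme_minimal()
}

# Usage: vis_miss_tile(df) for any dataframe df
```

geom_raster() assumes all tiles are the same size and draws them as a single raster image, while geom_tile() draws each rectangle separately, which may explain the different behaviour on some graphics devices.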
My df has 30,000 rows, so I set warn_large_data to FALSE:

```r
visdat::vis_miss(df, warn_large_data = FALSE)
```
And all I get is this empty grid: