ropensci / drake

An R-focused pipeline toolkit for reproducibility and high-performance computing
https://docs.ropensci.org/drake
GNU General Public License v3.0

Errors store large amounts of data #1276

Closed. billdenney closed this issue 4 years ago.

billdenney commented 4 years ago


Description

I have an issue that may be identical to #1253. Unfortunately, I can't share the example because it is proprietary (and, much like you, COVID-related work is taking the time I would usually spend creating a reprex).


It comes up when I'm building a relatively large plan (the .drake directory is ~2.6 GB). While using rmarkdown to build a report that loads much of the cache, a bug in the report caused an error.

After that error, I got the message "Repacking large object".

When I corrected the error, the report ran without issue, and there was no "Repacking large object" message.

My guess is that I'm having the same type of issue as in #1253. I'm running drake 7.12.2.

Reproducible example

Unfortunately, I can't readily make a reprex right now. My hope is that the description above and the link to #1253 will help with some troubleshooting.

Expected result

A quick error message.

From reading #1253, it seems that error stack traces, along with their environments, are being stored, and that could be my issue. For errors, could the default behavior be to not store the environments associated with the error?

Session info

> sessionInfo()
R version 4.0.1 (2020-06-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pander_0.6.3              knitr_1.28                TopicLongTable_0.0.0.9010 arsenal_3.4.0             gdtools_0.2.2             rmarkdown_2.2            
 [7] cowplot_1.0.0             forcats_0.5.0             stringr_1.4.0             dplyr_1.0.0               purrr_0.3.4               readr_1.3.1              
[13] tidyr_1.1.0               tibble_3.0.1              tidyverse_1.3.0           drake_7.12.2              xpose_0.4.10              mrgsolve_0.10.1          
[19] Hmisc_4.4-0               ggplot2_3.3.1             Formula_1.2-3             survival_3.1-12           lattice_0.20-41           assertr_2.7              
[25] rio_0.5.16                truncnorm_1.0-8           caTools_1.18.0            bsd.report_0.0.0.9067    

loaded via a namespace (and not attached):
 [1] colorspace_1.4-1          ellipsis_0.3.1            htmlTable_1.13.3          RcppArmadillo_0.9.900.1.0 base64enc_0.1-3           fs_1.4.1                 
 [7] rstudioapi_0.11           farver_2.0.3              fansi_0.4.1               lubridate_1.7.9           xml2_1.3.2                splines_4.0.1            
[13] polyclip_1.10-0           jsonlite_1.6.1            broom_0.5.6               cluster_2.1.0             dbplyr_1.4.4              png_0.1-7                
[19] ggforce_0.3.1             compiler_4.0.1            httr_1.4.1                backports_1.1.7           assertthat_0.2.1          Matrix_1.2-18            
[25] cli_2.0.2                 tweenr_1.0.1              acepack_1.4.1             htmltools_0.4.0           prettyunits_1.1.1         tools_4.0.1              
[31] igraph_1.2.5              gtable_0.3.0              glue_1.4.1                Rcpp_1.0.4.6              cellranger_1.1.0          vctrs_0.3.0              
[37] svglite_1.2.3             nlme_3.1-148              xfun_0.14                 openxlsx_4.1.5            rvest_0.3.5               PKNCA_0.9.4              
[43] lifecycle_0.2.0           MASS_7.3-51.6             zoo_1.8-8                 scales_1.1.1              hms_0.5.3                 parallel_4.0.1           
[49] RColorBrewer_1.1-2        yaml_2.2.1                curl_4.3                  gridExtra_2.3             rpart_4.1-15              latticeExtra_0.6-29      
[55] stringi_1.4.6             highr_0.8                 checkmate_2.0.0           filelock_1.0.2            zip_2.0.4                 storr_1.2.1              
[61] rlang_0.4.6               pkgconfig_2.0.3           systemfonts_0.2.3         bitops_1.0-6              evaluate_0.14             labeling_0.3             
[67] htmlwidgets_1.5.1         tidyselect_1.1.0          magrittr_1.5              R6_2.4.1                  generics_0.0.2            base64url_1.4            
[73] txtq_0.2.0                DBI_1.1.0                 mgcv_1.8-31               pillar_1.4.4              haven_2.3.1               foreign_0.8-80           
[79] withr_2.2.0               nnet_7.3-14               modelr_0.1.8              crayon_1.3.4              utf8_1.1.4                jpeg_0.1-8.1             
[85] progress_1.2.2            grid_4.0.1                readxl_1.3.1              data.table_1.12.8         qpdf_1.1                  blob_1.2.1               
[91] reprex_0.3.0              digest_0.6.25             munsell_0.5.0             askpass_1.1   
vkehayas commented 4 years ago

Yeah, I sometimes get something like this as well when a large ggplot fails and the whole data.frame/environment is saved in the error report. I have not yet tested the latest version, in which this is solved or mitigated; I'm still running 7.12.0.

billdenney commented 4 years ago

@vkehayas, tl;dr: updating may fix your issue.

In #1253, the discussion suggested that this may be fixed in 7.12.2, but my instance still occurs with that version. In my case it may not be a bug as such, but it is a bottleneck when retrying after an error in these scenarios.

wlandau commented 4 years ago

This definitely sounds like #1253, which I thought I fixed. Traceback objects should no longer contain strange tag-along environments. However, there could still be another stowaway in the metadata list. How big is the list you get from `diagnose(target_that_failed)`? If it's bigger than a couple of kilobytes, what is the innermost large object you can find with `pryr::object_size()`?
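To follow up on the `pryr::object_size()` suggestion, here is a minimal base-R sketch for hunting down that kind of stowaway. It assumes the bloat comes from environments captured by functions, formulas, or quosures (quosures inherit from `formula`), which `utils::object.size()` does not count. The helpers `is_system_env()`, `env_payload_size()`, `find_captured_envs()`, and `make_heavy_error()` are hypothetical, not part of drake or pryr, and the global environment and package namespaces are deliberately skipped, so a capture of the global environment would not be reported.

```r
# Hypothetical helpers (not part of drake or pryr): locate captured
# environments inside a nested list such as the one returned by diagnose().

# Environments shared by everything and not worth reporting.
is_system_env <- function(env) {
  identical(env, globalenv()) || identical(env, emptyenv()) ||
    identical(env, baseenv()) || isNamespace(env)
}

# utils::object.size() ignores environments, so estimate an environment's
# payload by summing the sizes of its bindings (one level deep only).
env_payload_size <- function(env) {
  objs <- mget(ls(env, all.names = TRUE), envir = env)
  sum(vapply(objs, function(o) as.numeric(utils::object.size(o)), numeric(1)))
}

# Recursively walk `x`, returning a named list: path -> estimated bytes held
# by the environment attached at that path.
find_captured_envs <- function(x, path = "x") {
  hits <- list()
  env <- if (is.function(x) || inherits(x, "formula")) environment(x) else NULL
  if (is.environment(env) && !is_system_env(env)) {
    hits[[path]] <- env_payload_size(env)
  }
  if (is.list(x)) {
    nms <- names(x)
    for (i in seq_along(x)) {
      child <- if (!is.null(nms) && nzchar(nms[i])) {
        paste0(path, "$", nms[i])
      } else {
        paste0(path, "[[", i, "]]")
      }
      hits <- c(hits, find_captured_envs(x[[i]], child))
    }
  }
  hits
}

# Example: an error-like object whose `dots` field holds a formula that
# captured a large local vector.
make_heavy_error <- function() {
  big <- numeric(1e6)
  structure(
    list(message = "boom", call = NULL, dots = list(figures = ~ plot(big))),
    class = c("simpleError", "error", "condition")
  )
}

sizes <- unlist(find_captured_envs(make_heavy_error(), "err"))
print(sort(sizes, decreasing = TRUE))
```

On a real failure one would call something like `find_captured_envs(diagnose(report), "diagnose(report)")`. Several paths can point at the same environment (as `$figures` and `$captions` may have done here), so the reported sizes are not additive.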

billdenney commented 4 years ago

I watched the memory use during the process, and here is what I noticed:

`x fail report`

When the words "x fail report" showed up on the screen, memory usage went from ~2GB to ~5GB over the course of a few seconds.

I then saw the following message:

Error: target report failed. `diagnose(report)$error$message`: Problem with `mutate()` input `figures`. x Input `figures` must be a vector, not a `gg_list` object. i Input `figures` is `as_gg_list(...)`.

(This was followed by the usual stack trace output.)

When I run `pryr::object_size(diagnose(report))`, the result is 1.28 GB. As that is much bigger than a couple of kB, I investigated further. Here is the nesting of the object that I found:

- `diagnose(report)`: 1.28 GB
  - `$error`: 1.28 GB
    - `$dots`: 1.28 GB
      - `$figures`: 1.28 GB
      - `$captions`: 1.28 GB (yes, both of these show up as 1.28 GB, the same size as their parent)


Oddly, the items within `diagnose(report)$error$dots$figures` (and `$captions`) are 56 B and 3.54 kB, not the 1.28 GB that the containing objects were.  Both of these items are quosures.  I can share what one looks like:

```r
> diagnose(report)$error$dots$figures
<quosure>
expr: ^as_gg_list(pmap(.l = list(data = data, parameter = Parameter, assay = assay, allo_cl = allo_CL, allo_vc = allo_VC), .f = plotter, hline_values = adult_lines))
env:  0000023B83FA2408
```

It looks like the problem is the environment attached to the quosure:

```r
> pryr::object_size(rlang::get_env(diagnose(report)$error$dots$figures))
1.28 GB
```
wlandau commented 4 years ago

Try 2295fc97fe35c0ceab8f56d1775c508aceb147d0. I gave up on language objects and decided to just have drake store tracebacks as character vectors. That ought to do it.
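The idea of storing tracebacks as character vectors can be sketched in plain R. This is a hypothetical illustration, not the code in that commit: `capture_text_traceback()` is an invented helper in which a calling handler records `sys.calls()` at the moment of the error and deparses each call to a single string, so the stored trace holds no language objects or environments.

```r
# Hypothetical sketch (not drake's actual code): evaluate `expr` and, on
# error, keep the message and the call stack as plain character vectors.
capture_text_traceback <- function(expr) {
  trace <- character(0)
  tryCatch(
    withCallingHandlers(
      # `expr` is a promise, so it is forced (and may fail) right here.
      list(value = expr, error = NULL, traceback = character(0)),
      error = function(e) {
        # Calling handlers run before unwinding, so the full stack is visible.
        trace <<- vapply(
          sys.calls(),
          function(cl) deparse(cl, nlines = 1L)[1L],
          character(1)
        )
      }
    ),
    error = function(e) {
      # Exiting handler: only strings leave this function.
      list(value = NULL, error = conditionMessage(e), traceback = trace)
    }
  )
}

f <- function() g()
g <- function() stop("boom")
res <- capture_text_traceback(f())
str(res)
```

Character vectors carry no enclosing environments, so whatever is cached alongside them stays small no matter how much data the failing code had in scope.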

wlandau commented 4 years ago

Wait, I just realized the traceback isn't the problem like it was for #1253. We'll have to cull the error object in other ways too.

wlandau commented 4 years ago

Try adcefba554cbde08fccf76ae63879bd753cdf74d. The `dots` object shouldn't show up in the error object as of https://github.com/ropensci/drake/commit/af2bbbf04003005d8e2da2908a6c0e4be7be34da.
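The same principle applies to the condition object itself. A minimal sketch, assuming only the message, a text version of the call, and the class are worth keeping: `slim_error_info()` and `make_heavy_error()` are hypothetical helpers, not drake's implementation, and the slimming step drops every other field (such as `dots`) before anything is cached. Comparing serialized sizes shows what a captured environment costs.

```r
# Hypothetical helper (not drake's implementation): reduce a condition to
# plain character fields so nothing holding an environment gets cached.
slim_error_info <- function(e) {
  call <- conditionCall(e)
  list(
    message = conditionMessage(e),
    call = if (is.null(call)) NA_character_ else paste(deparse(call), collapse = " "),
    class = class(e)
  )
}

# An error-like condition whose extra `dots` field holds a formula that
# captured a large local vector, similar in spirit to the quosures above.
make_heavy_error <- function() {
  big <- numeric(1e6)
  structure(
    list(
      message = "Problem with mutate() input figures.",
      call = quote(mutate(figures = as_gg_list(x))),
      dots = list(figures = ~ big)
    ),
    class = c("simpleError", "error", "condition")
  )
}

heavy <- make_heavy_error()
slim <- slim_error_info(heavy)
# serialize() follows the formula's environment; the slim version has none.
length(serialize(heavy, NULL))
length(serialize(slim, NULL))
```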

billdenney commented 4 years ago

adcefba fixes it for me!

After `x fail report`, there was no memory usage increase like there was last time. `pryr::object_size(diagnose(report))` is 25.8 kB.

wlandau commented 4 years ago

Great! Closing.