ropensci / drake

An R-focused pipeline toolkit for reproducibility and high-performance computing
https://docs.ropensci.org/drake
GNU General Public License v3.0

Error in if (is.na(x)) { : argument is of length zero #1101

Closed: adamkski closed this issue 4 years ago

adamkski commented 4 years ago

Prework

Description

When I run vis_drake_graph() or make(), I get the error: Error in if (is.na(x)) { : argument is of length zero. Downgrading to drake 7.8.0 resolves the issue; the error only occurs with the latest development version (7.8.0.9000).

Reproducible example

This is not straightforward because the code is confidential to my organization. I have tried to reproduce the issue without success, although I did accidentally reproduce another issue, #1086!

I will try again soon.

Desired result

Normal behaviour from vis_drake_graph() and make(), as with drake 7.8.0.

Session info

R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252    LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tidyselect_0.2.5    here_0.1            forcats_0.4.0       stringr_1.4.0       dplyr_0.8.3        
 [6] purrr_0.3.2         readr_1.3.1         tidyr_1.0.0         tibble_2.1.3        tidyverse_1.2.1    
[11] pROC_1.15.3         caret_6.0-84        ggplot2_3.2.1       lattice_0.20-38     broom_0.5.2        
[16] modelr_0.1.5        feather_0.3.5       digest_0.6.23       e1071_1.7-2         randomForest_4.6-14
[21] RWeka_0.4-41        rpart_4.1-13        gbm_2.1.5           fs_1.3.1            lubridate_1.7.4    
[26] janitor_1.2.0       rvest_0.3.4         xml2_1.2.2          httr_1.4.1          readxl_1.3.1       
[31] haven_2.1.1         fuzzyjoin_0.1.5     glue_1.3.1          ROracle_1.3-1       DBI_1.0.0          
[36] drake_7.8.0.9000   

loaded via a namespace (and not attached):
 [1] colorspace_1.4-1     ellipsis_0.3.0       class_7.3-14         rprojroot_1.3-2      rstudioapi_0.10     
 [6] remotes_2.1.0        prodlim_2018.04.18   codetools_0.2-15     splines_3.5.2        pkgload_1.0.2       
[11] zeallot_0.1.0        jsonlite_1.6         rJava_0.9-11         dbplyr_1.4.2         compiler_3.5.2      
[16] keyring_1.1.0        backports_1.1.4      assertthat_0.2.1     Matrix_1.2-15        lazyeval_0.2.2      
[21] cli_1.1.0            htmltools_0.4.0      visNetwork_2.0.8     prettyunits_1.0.2    tools_3.5.2         
[26] igraph_1.2.4.1       gtable_0.3.0         reshape2_1.4.3       Rcpp_1.0.2           cellranger_1.1.0    
[31] vctrs_0.2.0          nlme_3.1-137         iterators_1.0.12     timeDate_3043.102    dbmanager_0.0.0.9001
[36] gower_0.2.1          ps_1.3.0             testthat_2.2.1       lifecycle_0.1.0      RWekajars_3.9.3-2   
[41] devtools_2.2.1       MASS_7.3-51.1        scales_1.0.0         ipred_0.9-9          hms_0.5.1           
[46] yaml_2.2.0           memoise_1.1.0        gridExtra_2.3        stringi_1.4.3        desc_1.2.0          
[51] foreach_1.4.7        filelock_1.0.2       pkgbuild_1.0.6       lava_1.6.6           storr_1.2.1         
[56] rlang_0.4.0          pkgconfig_2.0.2      htmlwidgets_1.5      recipes_0.1.7        processx_3.4.1      
[61] plyr_1.8.4           magrittr_1.5         R6_2.4.0             generics_0.0.2       base64url_1.4       
[66] txtq_0.1.6           pillar_1.4.2         withr_2.1.2          survival_2.43-3      nnet_7.3-12         
[71] crayon_1.3.4         usethis_1.5.1        grid_3.5.2           data.table_1.12.2    callr_3.3.1         
[76] ModelMetrics_1.2.2   stats4_3.5.2         munsell_0.5.0        sessioninfo_1.1.1   

wlandau commented 4 years ago

At first glance, this looked like a cryptic bug that would need a reprex, but the only occurrence of "if (is.na(x)) {" in drake's source code is here:

https://github.com/ropensci/drake/blob/a12283917fdce1f706c88bff80e55d9bcf70eaf9/R/utils.R#L295

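Changing is.na() to anyNA() should be safer with little loss in speed. When x has length zero, is.na(x) returns logical(0), which if() rejects with exactly this error, whereas anyNA(x) always returns a single TRUE or FALSE.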

library(microbenchmark)
microbenchmark(
  a = anyNA("x"),
  b = is.na("x")
)
#> Unit: nanoseconds
#>  expr min lq   mean median uq  max neval
#>     a  52 58 105.37   65.0 68 4010   100
#>     b  66 71  99.09   78.5 83 1913   100
system.time(replicate(1e5, anyNA(sample.int(1e5, 1))))
#>    user  system elapsed 
#>   0.283   0.011   0.294
system.time(replicate(1e5, is.na(sample.int(1e5, 1))))
#>    user  system elapsed 
#>   0.286   0.000   0.286

Created on 2019-12-09 by the reprex package (v0.3.0)
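To see why the switch matters, here is a minimal sketch of the failure mode. The helpers check_is_na() and check_any_na() are hypothetical stand-ins, not drake's actual utility; the sketch only assumes that the x reaching the check can sometimes be a zero-length vector.

```r
# Hypothetical helpers contrasting the two checks (not drake's actual code).
check_is_na <- function(x) {
  if (is.na(x)) "missing" else "present"
}
check_any_na <- function(x) {
  if (anyNA(x)) "missing" else "present"
}

empty <- character(0)

# is.na() on a zero-length vector returns logical(0), and if() rejects it.
print(is.na(empty))
old_result <- tryCatch(check_is_na(empty), error = function(e) conditionMessage(e))
print(old_result)

# anyNA() always returns a single TRUE or FALSE, so if() gets a valid condition.
print(anyNA(empty))
new_result <- check_any_na(empty)
print(new_result)

# Both helpers agree on ordinary length-one inputs.
print(c(check_is_na(NA), check_any_na(NA)))
print(c(check_is_na("x"), check_any_na("x")))
```

anyNA() is not a drop-in replacement in every case: for input longer than one element it collapses the result to a single value, so it only fits places where "is anything missing?" is the intended question.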

I will make the change. Please let me know whether it solves the problem; if it does not, please post the output of traceback(). A reprex would be ideal, but it is probably difficult in your situation.
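traceback() prints the call stack of the last uncaught error when called right after it; errors caught by try() or tryCatch() do not leave a traceback. If the error is being caught somewhere, a minimal sketch like the one below records the stack with base R's withCallingHandlers() and sys.calls(). The functions run_step() and check_value() are hypothetical stand-ins for the failing code path, not drake functions.

```r
# Hypothetical functions standing in for the failing code path (not drake's).
check_value <- function(x) if (is.na(x)) "missing" else "present"
run_step <- function(x) check_value(x)

captured_calls <- list()

result <- tryCatch(
  withCallingHandlers(
    run_step(character(0)),
    error = function(e) {
      # Runs where the error is signalled, before the stack unwinds,
      # so sys.calls() still includes run_step() and check_value().
      captured_calls <<- as.list(sys.calls())
    }
  ),
  # tryCatch() then catches the same error so the script keeps going.
  error = function(e) conditionMessage(e)
)

print(result)

# Show the function name of each recorded call, outermost first.
call_names <- vapply(captured_calls, function(cl) deparse(cl[[1]])[1], character(1))
print(call_names)
```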