ropensci / drake

An R-focused pipeline toolkit for reproducibility and high-performance computing
https://docs.ropensci.org/drake
GNU General Public License v3.0
1.34k stars 129 forks source link

CodeDepends::getInputs() fails very strangely if filter() is part of the expression #268

Closed krlmlr closed 6 years ago

krlmlr commented 6 years ago

Only a drake problem because it's now using CodeDepends. @gmbecker: Any idea what's wrong here?

CodeDepends::getInputs(quote(filter(x)))
#> Error in collector$vars: object of type 'closure' is not subsettable

Other function names seem to work fine. The x is necessary to trigger the problem.

Created on 2018-02-20 by the reprex package (v0.2.0).

wlandau commented 6 years ago

A tiny project of mine had this problem too, but then I restarted the R session and it went away. I am struggling to reproduce it.

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.9 (Santiago)

Matrix products: default
BLAS: /[censored]/R-3.4.0/lib64/R/lib/libRblas.so
LAPACK: /[censored]/R-3.4.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US       LC_NUMERIC=C         LC_TIME=en_US
 [4] LC_COLLATE=en_US     LC_MONETARY=en_US    LC_MESSAGES=en_US
 [7] LC_PAPER=en_US       LC_NAME=C            LC_ADDRESS=C
[10] LC_TELEPHONE=C       LC_MEASUREMENT=en_US LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] usethis_1.2.0.9000 testthat_2.0.0     devtools_1.13.4

loaded via a namespace (and not attached):
 [1] CodeDepends_0.5-5.2 compiler_3.4.0      backports_1.1.2
 [4] magrittr_1.5        R6_2.2.2            rprojroot_1.3-2
 [7] withr_2.1.1.9000    memoise_1.1.0       codetools_0.2-15
[10] digest_0.6.15       rlang_0.2.0         XML_3.98-1.10
krlmlr commented 6 years ago

It fails reproducibly on my machine, Ubuntu 17.10, R 3.4.3.

gmbecker commented 6 years ago

@krlmlr what version of CodeDepends do you have? I'm unable to recreate that with the latest github version, in either the "dplyr was loaded" or "dplyr was not laoded" cases (which have radically different semantics, as you can see below.

> library(CodeDepends)
> getInputs(quote(filter(x)))
An object of class "ScriptNodeInfo"
Slot "files":
character(0)

Slot "strings":
character(0)

Slot "libraries":
character(0)

Slot "inputs":
[1] "x"

Slot "outputs":
character(0)

Slot "updates":
character(0)

Slot "functions":
filter 
    NA 

Slot "removes":
character(0)

Slot "nsevalVars":
character(0)

Slot "sideEffects":
character(0)

Slot "code":
filter(x)

> thing = readScript(txt = "library(dplyr); filter(df, x)")
> getInputs(thing)
An object of class "ScriptInfo"
[[1]]
An object of class "ScriptNodeInfo"
Slot "files":
character(0)

Slot "strings":
character(0)

Slot "libraries":
[1] "dplyr"

Slot "inputs":
character(0)

Slot "outputs":
character(0)

Slot "updates":
character(0)

Slot "functions":
named logical(0)

Slot "removes":
character(0)

Slot "nsevalVars":
character(0)

Slot "sideEffects":
character(0)

Slot "code":
library(dplyr)

[[2]]
An object of class "ScriptNodeInfo"
Slot "files":
character(0)

Slot "strings":
character(0)

Slot "libraries":
character(0)

Slot "inputs":
[1] "df"

Slot "outputs":
character(0)

Slot "updates":
character(0)

Slot "functions":
filter 
 FALSE 

Slot "removes":
character(0)

**Slot "nsevalVars":
[1] "x"**

Slot "sideEffects":
character(0)

Slot "code":
filter(df, x)

> getInputs(quote(filter(df, x)))
An object of class "ScriptNodeInfo"
Slot "files":
character(0)

Slot "strings":
character(0)

Slot "libraries":
character(0)

**Slot "inputs":
[1] "df" "x"** 

Slot "outputs":
character(0)

Slot "updates":
character(0)

Slot "functions":
filter 
    NA 

Slot "removes":
character(0)

Slot "nsevalVars":
character(0)

Slot "sideEffects":
character(0)

Slot "code":
filter(df, x)
tiernanmartin commented 6 years ago

Something is definitely amiss with filter():

library(drake)
library(tidyverse) 

my_filter <- function(x) {
  filter(x, cyl == 4)
}

failing_plan <- drake_plan(
  new_mtcars = my_filter(mtcars)
)

make(failing_plan) # fails
## cache C:\Users\UrbanDesigner\AppData\Local\Temp\Rtmp2XZhXv\.drake
## connect 2 imports: my_filter, failing_plan
## Error in collector$vars: object of type 'closure' is not subsettable

But dplyr::filter() works:

library(drake)
library(tidyverse) 

my_other_filter <- function(x) {
  dplyr::filter(x, cyl == 4)
}

working_plan <- drake_plan(
  new_mtcars = my_other_filter(mtcars)
)

make(working_plan) # works
## cache C:\Users\UrbanDesigner\AppData\Local\Temp\Rtmpq6W0mq\.drake
## connect 2 imports: my_other_filter, working_plan
## connect 1 target: new_mtcars
## check 2 items: dplyr::filter, mtcars
## check 1 item: my_other_filter
## check 1 item: new_mtcars
## target new_mtcars 
Session info ``` r devtools::session_info() ## Session info ------------------------------------------------------------- ## setting value ## version R version 3.4.0 (2017-04-21) ## system x86_64, mingw32 ## ui RTerm ## language (EN) ## collate English_United States.1252 ## tz America/Los_Angeles ## date 2018-02-20 ## Packages ----------------------------------------------------------------- ## package * version date source ## assertthat 0.2.0 2017-04-11 CRAN (R 3.4.2) ## backports 1.1.0 2017-05-22 CRAN (R 3.4.0) ## base * 3.4.0 2017-04-21 local ## bindr 0.1 2016-11-13 CRAN (R 3.4.2) ## bindrcpp 0.2 2017-06-17 CRAN (R 3.4.2) ## broom 0.4.3 2017-11-20 CRAN (R 3.4.3) ## cellranger 1.1.0 2016-07-27 CRAN (R 3.4.2) ## cli 1.0.0 2017-11-05 CRAN (R 3.4.2) ## CodeDepends 0.5-3 2017-05-29 CRAN (R 3.4.3) ## codetools 0.2-15 2016-10-05 CRAN (R 3.4.0) ## colorspace 1.3-2 2016-12-14 CRAN (R 3.4.2) ## compiler 3.4.0 2017-04-21 local ## crayon 1.3.4 2018-02-12 Github (r-lib/crayon@95b3eae) ## datasets * 3.4.0 2017-04-21 local ## devtools 1.13.2 2017-06-02 CRAN (R 3.4.0) ## digest 0.6.15 2018-01-28 CRAN (R 3.4.3) ## dplyr * 0.7.4 2017-09-28 CRAN (R 3.4.2) ## drake * 5.0.1.9002 2018-02-21 Github (ropensci/drake@cfac8fd) ## evaluate 0.10.1 2017-06-24 CRAN (R 3.4.3) ## forcats * 0.2.0 2017-01-23 CRAN (R 3.4.3) ## foreign 0.8-67 2016-09-13 CRAN (R 3.4.0) ## formatR 1.5 2017-04-25 CRAN (R 3.4.0) ## future 1.7.0 2018-02-11 CRAN (R 3.4.3) ## future.apply 0.1.0 2018-01-15 CRAN (R 3.4.3) ## ggplot2 * 2.2.1.9000 2017-12-02 Github (tidyverse/ggplot2@7b5c185) ## globals 0.11.0 2018-01-10 CRAN (R 3.4.3) ## glue 1.2.0.9000 2018-01-13 Github (tidyverse/glue@1592ee1) ## graphics * 3.4.0 2017-04-21 local ## grDevices * 3.4.0 2017-04-21 local ## grid 3.4.0 2017-04-21 local ## gtable 0.2.0 2016-02-26 CRAN (R 3.4.2) ## haven 1.1.0 2017-07-09 CRAN (R 3.4.2) ## hms 0.4.0 2017-11-23 CRAN (R 3.4.3) ## htmltools 0.3.6 2017-04-28 CRAN (R 3.4.0) ## htmlwidgets 0.9 2017-07-10 CRAN (R 3.4.2) ## httr 1.3.1 2017-08-20 CRAN (R 3.4.2) ## igraph 1.1.2 2017-07-21 CRAN (R 3.4.3) ## jsonlite 1.5 2017-06-01 CRAN (R 3.4.0) ## knitr 1.19 2018-01-29 CRAN (R 3.4.3) ## lattice 0.20-35 2017-03-25 CRAN (R 3.4.0) ## lazyeval 0.2.1 2017-10-29 CRAN (R 3.4.2) ## listenv 0.6.0 2015-12-28 CRAN (R 3.4.3) ## lubridate 1.7.2 2018-02-06 CRAN (R 3.4.3) ## magrittr 1.5 2014-11-22 CRAN (R 3.4.0) ## memoise 1.1.0 2017-04-21 CRAN (R 3.4.0) ## methods * 3.4.0 2017-04-21 local ## mnormt 1.5-5 2016-10-15 CRAN (R 3.4.1) ## modelr 0.1.1 2017-07-24 CRAN (R 3.4.2) ## munsell 0.4.3 2016-02-13 CRAN (R 3.4.2) ## nlme 3.1-131 2017-02-06 CRAN (R 3.4.0) ## parallel 3.4.0 2017-04-21 local ## pillar 1.1.0.9000 2018-02-12 Github (r-lib/pillar@595d1ac) ## pkgconfig 2.0.1 2017-03-21 CRAN (R 3.4.2) ## plyr 1.8.4 2016-06-08 CRAN (R 3.4.2) ## psych 1.7.8 2017-09-09 CRAN (R 3.4.2) ## purrr * 0.2.4.9000 2017-12-05 Github (tidyverse/purrr@62b135a) ## R.methodsS3 1.7.1 2016-02-16 CRAN (R 3.4.1) ## R.oo 1.21.0 2016-11-01 CRAN (R 3.4.1) ## R.utils 2.6.0 2017-11-05 CRAN (R 3.4.3) ## R6 2.2.2 2017-06-17 CRAN (R 3.4.0) ## Rcpp 0.12.15 2018-01-20 CRAN (R 3.4.3) ## readr * 1.1.1 2017-05-16 CRAN (R 3.4.2) ## readxl 1.0.0 2017-04-18 CRAN (R 3.4.2) ## reshape2 1.4.2 2016-10-22 CRAN (R 3.4.2) ## rlang 0.1.6 2017-12-21 CRAN (R 3.4.3) ## rmarkdown 1.8 2017-11-17 CRAN (R 3.4.2) ## rprojroot 1.3-2 2018-01-03 CRAN (R 3.4.3) ## rvest 0.3.2 2016-06-17 CRAN (R 3.4.2) ## scales 0.5.0.9000 2017-12-02 Github (hadley/scales@d767915) ## stats * 3.4.0 2017-04-21 local ## storr 1.1.3 2017-12-15 CRAN (R 3.4.3) ## stringi 1.1.6 2017-11-17 CRAN (R 3.4.2) ## stringr * 1.3.0 2018-02-19 CRAN (R 3.4.3) ## testthat 2.0.0 2017-12-13 CRAN (R 3.4.3) ## tibble * 1.4.2 2018-02-12 Github (tidyverse/tibble@80ac1a3) ## tidyr * 0.7.2.9000 2018-01-13 Github (tidyverse/tidyr@74bd48f) ## tidyverse * 1.2.1 2017-11-14 CRAN (R 3.4.3) ## tools 3.4.0 2017-04-21 local ## utils * 3.4.0 2017-04-21 local ## visNetwork 2.0.3 2018-01-09 CRAN (R 3.4.3) ## withr 2.1.1.9000 2018-02-21 Github (jimhester/withr@4dfd12a) ## XML 3.98-1.9 2017-06-19 CRAN (R 3.4.1) ## xml2 1.1.1 2017-01-24 CRAN (R 3.4.2) ## yaml 2.1.14 2016-11-12 CRAN (R 3.4.0) ```
wlandau commented 6 years ago

Could CodeDepends be failing on a filter() from another package?

gmbecker commented 6 years ago

CodeDepends does not evaluate any code, even to grab the formals currently, so I'm having trouble how it possibly could be. Worst I can easily imagine happending is that it gets the NSEness of some of the args wrong if it is wrong about which filter you're hitting.

On Feb 20, 2018 8:33 PM, "Will Landau" notifications@github.com wrote:

Could CodeDepends be failing on a filter() from another package?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/drake/issues/268#issuecomment-367210684, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3dsbGV09csweRIphJjmVLMzxwNymYCks5tW5yjgaJpZM4SMq4A .

gmbecker commented 6 years ago

Do note they are using an earlier version than the latest on github though. Can anyone reproduce thr problem with the current devel version?

On Feb 20, 2018 9:35 PM, "Gabe Becker" beckerg4@gene.com wrote:

CodeDepends does not evaluate any code, even to grab the formals currently, so I'm having trouble how it possibly could be. Worst I can easily imagine happending is that it gets the NSEness of some of the args wrong if it is wrong about which filter you're hitting.

On Feb 20, 2018 8:33 PM, "Will Landau" notifications@github.com wrote:

Could CodeDepends be failing on a filter() from another package?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/drake/issues/268#issuecomment-367210684, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3dsbGV09csweRIphJjmVLMzxwNymYCks5tW5yjgaJpZM4SMq4A .

krlmlr commented 6 years ago

Works for me with the GitHub version. I wasn't aware this package is on GitHub, I looked at its DESCRIPTION only.

gmbecker commented 6 years ago

For the record I'll work in getting a new version of CodeDepends onto CRAN this week which will have this fix in it.

lorenzwalthert commented 6 years ago

I just had the same problem. After updating CodeDepends, it worked. Maybe a minimal version dependency should be declared for CodeDepends?

wlandau commented 6 years ago

What about something a less strict, maybe suggesting a minimal version in this warning message? I hesitate to force a certain version of a package in order to patch an edge case.

lorenzwalthert commented 6 years ago

I wonder though whether it had to do with filter() in particular or whether it is a problem for all name space collisions. I think one also has to weight in the complexity that is added in the drake project when considering adding informative warning messages for these kinds of issues. But I mean it's up to you what to do. Just wanted to make sure this is somehow addressed.

krlmlr commented 6 years ago

@gmbecker: Do you still plan to update CodeDepends on CRAN?

@wlandau: Requiring a minimum package version seems like a viable solution to fix a problem. Packages tend to get better, not worse. Would you be open to a "diagnosis" facility that compares installed vs. suggested package versions?

wlandau commented 6 years ago

You know what, I think you guys are right. I am seeing minimal version specs all across the tidyverse, so this is not likely to surprise people. @lorenzwalthert, what version of CodeDepends worked for you?

lorenzwalthert commented 6 years ago

I am actually not sure if we are talking about the same thing. Unfortunately, I can't reproduce what I had before updating, but I think I get a different error now if I don't use the package mask.

I.e. this works:

library("drake")
library("magrittr")
drake_plan(
  x = dplyr::filter(mtcars, cyl == 4)
) %>%
  make()
#> All targets are already up to date.
table(readd(x)$cyl)
#> 
#>  4 
#> 11

And this gives an error

library("drake")
library("magrittr")
library("dplyr")
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:drake':
#> 
#>     contains, ends_with, everything, matches, num_range, one_of,
#>     starts_with
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
drake_plan(
  x = filter(mtcars, cyl == 4)
) %>%
  make()
#> Warning: could not resolve implicit dependencies of
#> code:structure(expression(filter(mtcars, cyl == 4)), srcfile =
#> <environment>, wholeSrcref = structure(c(1L, 0L, 2L, 0L, 0L, 0L, 1L, 2L),
#> srcfile = <environment>, class = "srcref"))
#>  ...

I also run the code from https://github.com/ropensci/drake/issues/268#issuecomment-367204566, and I get not the same error message as @tiernanmartin got, but the same as in my second example.

I think it's good practice anyways to use the mask for the name space collision so I don't object to the behaviour demonstrated, although the error message could still be more informative.

So maybe the minimal version dependency does not make sense since it is not really a huge benefit we'd get with it, just a different error message. I also tried the GitHub version, but installation fails due a problem in the NAMESPACE file so I can't make a comparison without some work. Sorry for the confusion -.-

I tried opening an issue but it's not possible with forks I just realized. @tiernanmartin maybe it is worth fixing the problem. I get:

Using github PAT from envvar GITHUB_PAT
Downloading GitHub repo gmbecker/CodeDepends@master
* installing *source* package ‘CodeDepends’ ...
** R
Error in parse(outFile) : 
  /private/var/folders/8l/fhmv3yj12kncddcjqwc19tkr0000gr/T/RtmpDi5lyP/remotes6c9666522534/gmbecker-CodeDepends-d085a4f/R/codeDe:114:18: unexpected '}'
113:                    inputs <<- unique(c(inputs, 
114:                  }
                      ^
ERROR: unable to collate and parse R files for package ‘CodeDepends’
* removing ‘/Library/Frameworks/R.framework/Versions/3.4/Resources/library/CodeDepends’
* restoring previous ‘/Library/Frameworks/R.framework/Versions/3.4/Resources/library/CodeDepends’
Warning messages:
1: In utils::install.packages(pkgs = pkgs, lib = lib, repos = myrepos,  :
  installation of package ‘/var/folders/8l/fhmv3yj12kncddcjqwc19tkr0000gr/T//RtmpDi5lyP/remotes6c9666522534/gmbecker-CodeDepends-d085a4f’ had non-zero exit status
2: In utils::install.packages(pkgs = pkgs, lib = lib, repos = myrepos,  :
  installation of package ‘/var/folders/8l/fhmv3yj12kncddcjqwc19tkr0000gr/T//RtmpDi5lyP/remotes6c9666522534/gmbecker-CodeDepends-d085a4f’ had non-zero exit status

Probably another devtools::document() helps, or at least tells you where the problem is.

wlandau commented 6 years ago

Well shucks, I thought the filter() issue in CodeDepends was already taken care of.

I do agree that drake should output a more informative warning. The trouble is that the part of drake that does the code analysis is not aware of the function name or target that the code corresponds to.

gmbecker commented 6 years ago

@lorenzwalthert The canonical sources for CodeDepends are at duncantl/CodeDepends. I am the maintainer but Duncan is the original author. doing install_github from that repo works correctly without error.

@krlmlr I do plan to update CodeDepends on CRAN. I need to figure out the masking warnings by tightening up the imports, but once I do that the new version should be on CRAN shortly after that. Apologies for the delay on that.

My suspicion is that these folks are running into problems because they are using the CRAN version. I'll try to get an updated version up soon, though I'm a bit buried under the upcoming breaking changes for ggplot2 on another project right this second :-/

lorenzwalthert commented 6 years ago

Thanks @gmbecker I tried to install the fork, can't recall exactly why. But I also can't install the upstream.

> remotes::install_github("duncantl/CodeDepends")
Using github PAT from envvar GITHUB_PAT
Downloading GitHub repo duncantl/CodeDepends@master
Skipping 1 packages not available: graph
ERROR: dependency ‘graph’ is not available for package ‘CodeDepends’
* removing ‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library/CodeDepends’
Warning message:
In i.p(...) :
  installation of package ‘/var/folders/8l/fhmv3yj12kncddcjqwc19tkr0000gr/T//RtmpECpFzn/remotes18b01fdcfbad/duncantl-CodeDepends-7b398fe’ had non-zero exit status
gmbecker commented 6 years ago

You need the graph package, which is publicly available on bioconductor. (note this is true of the CRAN version, and will be true of at least one more CRAN version before it is possibly changed, it's not a function of installing from github).

In an R session do:

source("http://bioconductor.org/biocLite.R")

biocLite("graph")

and you should be all set on that front.

~G

On Thu, May 24, 2018 at 12:14 PM, Lorenz Walthert notifications@github.com wrote:

Thanks @gmbecker https://github.com/gmbecker I tried to install the fork, can't recall exactly why. But I also can't install the upstream.

remotes::install_github("duncantl/CodeDepends") Using github PAT from envvar GITHUB_PAT Downloading GitHub repo duncantl/CodeDepends@master Skipping 1 packages not available: graph ERROR: dependency ‘graph’ is not available for package ‘CodeDepends’

  • removing ‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library/CodeDepends’ Warning message: In i.p(...) : installation of package ‘/var/folders/8l/fhmv3yj12kncddcjqwc19tkr0000gr/T//RtmpECpFzn/remotes18b01fdcfbad/duncantl-CodeDepends-7b398fe’ had non-zero exit status

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/drake/issues/268#issuecomment-391827784, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3dsS3N475T6yFIYLrkWAuw1MKGAjMlks5t1waigaJpZM4SMq4A .

-- Gabriel Becker, Ph.D Scientist Bioinformatics and Computational Biology Genentech Research