tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/
Other
4.79k stars 2.12k forks source link

dplyr 0.5.0.9002 prevents CRAN version of sparklyr from loading (4/15/2017, 0.5.3 ) #2670

Closed JohnMount closed 7 years ago

JohnMount commented 7 years ago

With the dev version of dplyr installed (4/15/2017, 0.5.0.9002 commit d7d2f10167b4ac919e876eb9a891fd53345be985) the CRAN version of sparklyr (4/15/2017, 0.5.3 ) will not load:

library("sparklyr")
# Error : object 'sql_build' not found whilst loading namespace 'sparklyr'
# Error: package or namespace load failed for ‘sparklyr’

Problem goes away with dev-version of sparklyr (4/15/2017, 0.5.3-9005 commit 58fcd949d7709b4be44e2789a1c5355a6bd148f3).

hadley commented 7 years ago

@javierluraschi we should chat on Monday about how to resolve this?

hadley commented 7 years ago

I think instead of

#' @importFrom dbplyr sql_build

You should be able to write

#' @rawNamespace
#' if (utils::packageVersion("dplyr") > "0.5.0") {
#'   importFrom("dbplyr", "sql_build")
#' } else {
#'   importFrom("dplyr", "sql_build")
#' }
javierluraschi commented 7 years ago

@hadley when @kevinushey bumped the required versions https://github.com/rstudio/sparklyr/commit/58fcd949d7709b4be44e2789a1c5355a6bd148f3 it also fixed this issue.

We are not planning to support dplyr 0.5 in sparklyr 0.6; but we can definitely reconsider this if you feel strongly about supporting both versions.

hadley commented 7 years ago

I think you should at least consider it — it's a bit more work but is easier on CRAN and on your users. See http://dplyr.tidyverse.org/articles/compatibility.html for details.

javierluraschi commented 7 years ago

Progress: https://github.com/rstudio/sparklyr/commits/bugfix/hotfix-0.5.4

However, with the github version of dplyr and dbplyr I get this error:

* checking package dependencies ... ERROR
Namespace dependency not required: ‘dbplyr’

See section ‘The DESCRIPTION file’ in the ‘Writing R Extensions’ manual.

which I can fix by adding dbplyr as an import to the DESCRIPTION. However, that breaks when dbplyr is not installed against the dplyr CRAN version:

* installing to library ‘/Users/javierluraschi/Library/R/3.3/library’
ERROR: dependency ‘dbplyr’ is not available for package ‘sparklyr’

@hadley @lionel-: Is there a known workaround for this? Perhaps we can submit dbplyr to CRAN as an interface prior to submitting dplyr to CRAN?

hadley commented 7 years ago

Yeah, I realised I'd forgotten about that problem last night.

I've had a quick look to see if I can make dbplyr work with dplyr 0.5.0:

hadley commented 7 years ago

Hmmm, but what's causing checking package dependencies ... ERROR. I don't see any unqualified references to dbplyr in your NAMESPACE.

...

Oh the root cause is that we have no way to conditionally import dbplyr in the description 😞

hadley commented 7 years ago

Maybe I could make a version of dbplyr that passes R CMD check with dplyr 0.5.0, but only actually works with dplyr 0.6.0.

hadley commented 7 years ago

Another option would be for sparklyr to suggest dbplyr, instead of importing it.

To avoid R CMD check problems, you'd then not import the generics, and instead of registering S3 methods in the namespace, you'd do it manually in .onLoad().

If you're game, let's try that route first. I'm happy to help out with the implementation.

hadley commented 7 years ago

This function will register an S3 method - it needs to be called from .onLoad().

register_s3_method <- function(pkg, generic, class, fun = NULL) {
  stopifnot(is.character(pkg), length(pkg) == 1)
  envir <- asNamespace(pkg)

  stopifnot(is.character(generic), length(generic) == 1)
  stopifnot(is.character(class), length(class) == 1)
  if (is.null(fun)) {
    fun <- get(paste0(generic, class), envir = parent.frame())
  } else {
    stopifnot(is.function(fun))  
  }

  if (pkg %in% loadedNamespaces()) {
    registerS3method(generic, class, fun, envir = envir)
  }

  # Always register hook in case package is later unloaded & reloaded
  setHook(
    packageEvent(pkg, "onLoad"),
    function(...) {
      registerS3method(generic, class, fun, envir = envir)
    }
  )
}
javierluraschi commented 7 years ago

Sounds out to me, giving it a shot...

javierluraschi commented 7 years ago

@hadley Changes performed; however, I also had to remove dbplyr from suggests:. Otherwise, I would hit the following --as-cran error.

@hadley Any objections to submitting this patch to CRAN? My guess is that you are not planning any more signature changes at this point, correct?

Here are the last set of changes: https://github.com/rstudio/sparklyr/commits/bugfix/hotfix-0.5.4 and the CRAN patch release file: https://github.com/rstudio/sparklyr/releases/tag/v0.5.4

* checking CRAN incoming feasibility ... NOTE
Maintainer: ‘Javier Luraschi <javier@rstudio.com>’

Suggests or Enhances not in mainstream repositories:
  dbplyr
* checking package namespace information ... OK
* checking package dependencies ... ERROR
Package suggested but not available: ‘dbplyr’

The suggested packages are required for a complete check.
Checking can be attempted without them by setting the environment
variable _R_CHECK_FORCE_SUGGESTS_ to a false value.

See section ‘The DESCRIPTION file’ in the ‘Writing R Extensions’
manual.
* DONE

Status: 1 ERROR, 1 NOTE
hadley commented 7 years ago

Just set _R_CHECK_FORCE_SUGGESTS_ to false ;) (That's ok with CRAN if you explain why)

javierluraschi commented 7 years ago

@hadley I'm hitting this issue, any thoughts? Here are the repro steps:

  1. Uninstall dplyr, dbplyr, sparklyr, etc.
  2. Install dplyr from CRAN
  3. Install sparklyr patch from https://github.com/rstudio/sparklyr/releases/tag/v0.5.4
  4. Notice that library(sparklyr) works fine
  5. Install dplyr, dbplyr and rlang from Github
  6. library(sparklyr) fails with the following error:
> library(sparklyr)
Error : object ‘src_desc’ is not exported by 'namespace:dplyr'
Error: package or namespace load failed for ‘sparklyr’

I thought this might be caused by setHook from register_s3_method but that's not the case. The only way I've found to fix this is to uninstall sparklyr and install back from https://github.com/rstudio/sparklyr/releases/tag/v0.5.4 (or rebuild which probably has the same side-effect).

Any thoughts on this?

kevinushey commented 7 years ago

My best guess is that the NAMESPACE is parsed and used at package build time, and so some metadata related to its use is populated there. This implies that if sparklyr is built against the CRAN version of dplyr, the NAMESPACE bits re: using src_desc will kick in and that will fail when running against the development version of dplyr.

I'm not sure if there's a clean way around this, unless the namespace import stuff can be done at package load time rather than build time.

kevinushey commented 7 years ago

@hadley: are you aware of any tricks for sidestepping the NAMESPACE registration system altogether, and just automagically handling import + export at load time?

javierluraschi commented 7 years ago

@hadley @kevinushey The bug I mentioned is fixed by removing all importFrom in the NAMESPACE file (and loading this instead dynamically as Hadley pointed out in https://github.com/rstudio/sparklyr/commit/1f9f1e9b003a0c02355dba02e330bc59a7a446fb); must mention I don't fully understand how the NAMESPACE file work but happy to see this problem resolved.

Looks to me like this patch is good for CRAN, will test a bit more today just in case.

hadley commented 7 years ago

I think the importFrom are only necessary so that R CMD check finds the generics associated with the S3methods so it can check that the arguments match. Since we're now exporting the methods at load time, there's no need to import the generics at build time.

kevinushey commented 7 years ago

That sounds right to me as well. I'm surprised that R CMD check lets us get away with this I'll take what we can get :)

javierluraschi commented 7 years ago

sparklyr 0.5.4 which supports dplyr ~0.6 is on CRAN now: https://cran.r-project.org/web/packages/sparklyr/index.html, retrying with 0.5.4 will fix the sql_build issue.

@hadley sparklyr should be now all set for the dplyr 0.6 release.