tidyverse / multidplyr

A dplyr backend that partitions a data frame over multiple processes
https://multidplyr.tidyverse.org

Problem with copying custom function to cluster #93

Closed willtudorevans closed 4 years ago

willtudorevans commented 4 years ago

I originally posted this problem on Stack Overflow, but I now realise that this would probably have been the more appropriate place.

I'm struggling to get version 0.0.0.9000 of multidplyr to work with a custom function.

By way of reproducible example, here is what I've tried:

library(multidplyr)
library(dplyr)
cl <- new_cluster(3)
df <- data.frame(Grp = rep(LETTERS[1:3], each = 4), Val = rep(3:1, 4))

cust_func <- function (x) {
  x + 1
}

cluster_copy(cl, "cust_func")

df_clust <- df %>%
  group_by(Grp) %>%
  partition(cl)

df_clust %>%
  mutate(Add1 = cust_func(Val)) %>%
  collect()

I end up getting a "Computation failed" error. I have tried reordering the steps and a few other minor variations, but with no luck.

I feel I must be doing something wrong as it seems it was possible to export custom functions in previous versions of multidplyr. What am I doing wrong?

xiaokunx commented 4 years ago

The example works fine on Mac, with the following setup info:

R version 3.6.0 (2019-04-26) -- "Planting of a Tree"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.7.0 (64-bit)

> devtools::install_github("tidyverse/multidplyr")
Downloading GitHub repo tidyverse/multidplyr@master
✔  checking for file ‘/private/var/folders/k_/z2145fbs0s32tt9llhmw1m0c9bw84h/T/Rtmp4ch5CM/remotes111841388a015/tidyverse-multidplyr-03bf5c4/DESCRIPTION’ ...
─  preparing ‘multidplyr’:
✔  checking DESCRIPTION meta-information ...
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building ‘multidplyr_0.0.0.9000.tar.gz’
   Warning: invalid uid value replaced by that for user 'nobody'
   Warning: invalid gid value replaced by that for user 'nobody'

Installing package into ‘/usr/local/lib/R/3.6/site-library’
(as ‘lib’ is unspecified)
* installing *source* package ‘multidplyr’ ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (multidplyr)

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin17.7.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS/LAPACK: /usr/local/Cellar/openblas/0.3.7/lib/libopenblasp-r0.3.7.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] nycflights13_1.0.0    dplyr_0.8.5           multidplyr_0.0.0.9000

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3          compiler_3.6.0      pillar_1.4.3
 [4] prettyunits_1.0.2   remotes_2.1.0       tools_3.6.0
 [7] qs_0.21.1           testthat_2.1.1      digest_0.6.25
[10] pkgbuild_1.0.3      pkgload_1.0.2       memoise_1.1.0
[13] tibble_2.1.3        pkgconfig_2.0.3     rlang_0.4.5
[16] cli_2.0.2           curl_4.0            withr_2.1.2
[19] desc_1.2.0          fs_1.3.1            vctrs_0.2.4
[22] devtools_2.2.1      rprojroot_1.3-2     tidyselect_1.0.0
[25] glue_1.3.1          RApiSerialize_0.1.0 R6_2.4.1
[28] processx_3.4.2      fansi_0.4.1         sessioninfo_1.1.1
[31] callr_3.4.2         purrr_0.3.3         magrittr_1.5
[34] backports_1.1.5     ps_1.3.2            ellipsis_0.3.0
[37] usethis_1.5.0       assertthat_0.2.1    utf8_1.1.4
[40] crayon_1.3.4

willtudorevans commented 4 years ago

Thank you @xiaokunx for looking into this.

I've retried this now without any issues. I can't understand it!
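
For anyone who runs into the same "Computation failed" error, one way to narrow it down is to check whether the function actually reached the workers before using it inside mutate(). The following is only a minimal sketch, assuming the installed version of multidplyr provides cluster_call() and that it returns one result per worker; it does not explain why the original attempt failed.

```r
library(multidplyr)

cl <- new_cluster(3)

cust_func <- function(x) {
  x + 1
}

# Copy the function to every worker by name, as in the original example.
cluster_copy(cl, "cust_func")

# Ask each worker whether it can see an object called "cust_func".
# Assumption: cluster_call() evaluates the expression on each worker
# and returns a list with one element per worker.
found <- cluster_call(cl, exists("cust_func"))
print(found)

# Call the function on the workers directly, outside of any dplyr verb,
# to separate "function missing on worker" from "problem in the pipeline".
direct <- cluster_call(cl, cust_func(1))
print(direct)
```

If a worker reports FALSE for the first check, the copy step did not take effect on that worker, and recreating the cluster in a fresh R session before rerunning the example may be worth trying.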