hadley / plyr

An R package for splitting, applying and combining large problems into simpler problems
plyr.had.co.nz

Ddply parallel not working (related to issue #204) #271

Open buggythepirate opened 8 years ago

buggythepirate commented 8 years ago

Hi there,

I have a problem running ddply in parallel that is similar to issue #204. That issue was closed by Hadley, but the problem still seems to be present: using Hadley's reproducible example, ddply still results in an error.

Any help is much appreciated.

Example:

library(plyr)
dfx <- data.frame( group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54))

library(doSNOW)
cl<-makeCluster(3, type="SOCK")
registerDoSNOW(cl)
ddply(dfx, .(group, sex), .parallel=T, .fun=summarize, 
  mean = round(mean(age), 2), sd = round(sd(age), 2))

Error:

Error in do.ply(i) : task 1 failed - "'...' used in an incorrect context"
In addition: Warning messages:
1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’

2: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’

System:

HenrikBengtsson commented 8 years ago

In case it helps, I can reproduce this on R 3.3.0 on Windows (details below). The registered SNOW cluster does seem to work with foreach itself:

> foreach (i=1:3) %dopar% Sys.getpid()
[[1]]
[1] 6584

[[2]]
[1] 10964

[[3]]
[1] 6376
> sessionInfo()
R version 3.3.0 Patched (2016-05-03 r70575)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] doSNOW_1.0.14   snow_0.4-1      iterators_1.0.8 foreach_1.4.3
[5] plyr_1.8.3

loaded via a namespace (and not attached):
[1] compiler_3.3.0   Rcpp_0.12.4.5    codetools_0.2-14
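
For anyone who wants to rule out the backend registration itself before looking at plyr, the check above can be made a little more explicit. The following is only a minimal sketch: `backend_info` is a hypothetical helper name, and the sequential backend is registered here purely so the snippet runs without a cluster. With a real cluster you would call the helper after `registerDoSNOW(cl)` or `registerDoParallel(cl)` instead.

```r
library(foreach)

# Hypothetical helper (not part of foreach or plyr): collect what foreach
# reports about the currently registered backend.
backend_info <- function() {
  list(
    registered = getDoParRegistered(),  # TRUE once any backend is registered
    name       = getDoParName(),        # backend name as reported by foreach
    workers    = getDoParWorkers()      # number of workers foreach will use
  )
}

# Assumption for this sketch: use the sequential backend so no cluster is needed.
registerDoSEQ()
info <- backend_info()
str(info)

# A trivial %dopar% loop, as in the comment above, to confirm foreach itself runs.
pids <- foreach(i = 1:3) %dopar% Sys.getpid()
length(pids)
```

If the helper reports more than one worker and the trivial loop returns, the backend is registered correctly, and the failure is more likely in how the call is handed to the workers than in the cluster setup.
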

HenrikBengtsson commented 8 years ago

Same issue when using doParallel:

> library(plyr)
> dfx <- data.frame( group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
    sex = sample(c("M", "F"), size = 29, replace = TRUE),
    age = runif(n = 29, min = 18, max = 54))

> library(doParallel)
> cl <- makeCluster(4)
> registerDoParallel(cl)
> getDoParName()
[1] "doParallelSNOW"

> ddply(dfx, .(group, sex), .parallel=T, .fun=summarize, 
    mean = round(mean(age), 2), sd = round(sd(age), 2))
Error in do.ply(i) : task 1 failed - "'...' used in an incorrect context"
In addition: Warning messages:
1: <anonymous>: ... may be used in an incorrect context: '.fun(piece, ...)'

2: <anonymous>: ... may be used in an incorrect context: '.fun(piece, ...)'

> registerDoSEQ()
> ddply(dfx, .(group, sex), .parallel=T, .fun=summarize,
+     mean = round(mean(age), 2), sd = round(sd(age), 2))
  group sex  mean    sd
1     A   F 37.31  9.83
2     A   M 22.92  0.85
3     B   F 28.52  9.28
4     B   M 39.97 11.36
5     C   F 44.25  1.53
6     C   M 28.95  6.99
Warning message:
In setup_parallel() : No parallel backend registered

> sessionInfo()
R version 3.3.0 Patched (2016-05-03 r70575)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] doParallel_1.0.10 iterators_1.0.8   foreach_1.4.3     plyr_1.8.3

loaded via a namespace (and not attached):
[1] compiler_3.3.0   Rcpp_0.12.4.5    codetools_0.2-14
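
Since the sequential run succeeds, it can serve as a reference result to compare against once a parallel run works. The sketch below fixes the random seed (an assumption, not in the original example) and computes the same summary two ways: with `summarize` and extra arguments, as in the report, and with a self-contained function that receives nothing through `...`. The name `summarise_piece` is hypothetical. Whether the second form avoids the parallel error has not been established in this thread; the sketch only shows that the two forms agree sequentially.

```r
library(plyr)

# Assumption: fixed seed so the reference result is reproducible.
set.seed(1)
dfx <- data.frame(
  group = c(rep("A", 8), rep("B", 15), rep("C", 6)),
  sex   = sample(c("M", "F"), size = 29, replace = TRUE),
  age   = runif(n = 29, min = 18, max = 54)
)

# Sequential reference: the call from the report, without .parallel.
ref <- ddply(dfx, .(group, sex), summarize,
             mean = round(mean(age), 2), sd = round(sd(age), 2))

# Hypothetical helper: the same summary as a function of one piece only,
# so no extra arguments are forwarded through '...'.
summarise_piece <- function(piece) {
  data.frame(mean = round(mean(piece$age), 2),
             sd   = round(sd(piece$age), 2))
}
alt <- ddply(dfx, .(group, sex), summarise_piece)

# Compare the two sequential results column by column.
all.equal(ref$mean, alt$mean)
all.equal(ref$sd, alt$sd)
```
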