mlr-org / parallelMap

R package to interface some popular parallelization backends with a unified interface
https://parallelmap.mlr-org.com

could not find function "dir.exists" #37

Closed rubenohayon closed 8 years ago

rubenohayon commented 8 years ago

Hi,

I used parallelMap for Azure Machine Learning, but when I use it I get this error in the function parallelStartSocket.

Thanks

berndbischl commented 8 years ago

That's from base R.

Can you please post sessionInfo() ?

rubenohayon commented 8 years ago

[1] '2.7'

R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] splines grid parallel stats graphics grDevices utils
[8] datasets methods base

other attached packages:
[1] Hmisc_3.14-4 Formula_1.1-1 survival_2.37-7 lattice_0.20-29
[5] mlr_2.7 ggplot2_1.0.0 parallelMap_1.3 ParamHelpers_1.6
[9] BBmisc_1.9 checkmate_1.7.0 Metrics_0.1.1 xgboost_0.4-2
[13] doParallel_1.0.10 iterators_1.0.7 foreach_1.4.2 magrittr_1.5

loaded via a namespace (and not attached):
[1] chron_2.3-45 cluster_1.15.2 codetools_0.2-8
[4] colorspace_1.2-4 data.table_1.9.4 digest_0.6.4
[7] gtable_0.1.2 latticeExtra_0.6-26 MASS_7.3-33
[10] Matrix_1.1-4 munsell_0.4.2 plyr_1.8.1
[13] proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.11.2
[16] reshape2_1.4 scales_0.2.4 stringr_0.6.2
[19] tools_3.1.0

berndbischl commented 8 years ago

Please also post some reproducing code + traceback()

rubenohayon commented 8 years ago

yes no problem

I used it with a dataset, so I'm going to post a simpler example.

This is how I install packages in Azure ML:

install.packages("src/Metrics_0.1.1.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/checkmate_1.7.0.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/mlr_2.7.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/xgboost_0.4-2.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/BBmisc_1.9.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/ParamHelpers_1.6.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/parallelMap_1.3.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/magrittr_1.5.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/BatchJobs_1.6.zip", lib = ".", repos = NULL, verbose = TRUE)
install.packages("src/doParallel_1.0.10.zip", lib = ".", repos = NULL, verbose = TRUE)

library(magrittr, lib.loc=".", verbose=TRUE)
library(doParallel, lib.loc=".", verbose=TRUE)
library(xgboost, lib.loc=".", verbose=TRUE)
library(Metrics, lib.loc=".", verbose=TRUE)
library(checkmate, lib.loc=".", verbose=TRUE)
library(BBmisc, lib.loc=".", verbose=TRUE)
library(ParamHelpers, lib.loc=".", verbose=TRUE)
library(parallelMap, lib.loc=".", verbose=TRUE)
library(mlr, lib.loc=".", verbose=TRUE)

library(doParallel, lib.loc=".", verbose=TRUE)
library(Hmisc)

packageVersion("mlr")

sessionInfo()

Then I tried this simple code:

library(parallelMap)
parallelStartSocket(2) # start in socket mode and create 2 processes on localhost
f = function(i) i + 5 # define our job
y = parallelMap(f, 1:2) # like R's Map but in parallel
parallelStop() # turn parallelization off again

and I get the error in parallelStartSocket.

berndbischl commented 8 years ago

I really need the output of traceback()

dir.exists is not even used in parallelMap, I just scanned the code
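In a non-interactive environment such as an Azure ML module, calling traceback() after the fact may not be practical. One possible workaround, sketched below, is to record the call stack at the moment the error is raised. The helper name capture_call_stack is hypothetical and not part of parallelMap or base R.

```r
# Hypothetical helper: evaluates `expr` and, if it fails, returns the error
# message together with the call stack that was active when the error occurred.
capture_call_stack <- function(expr) {
  stack <- NULL
  message <- NULL
  tryCatch(
    withCallingHandlers(
      expr,
      error = function(e) {
        # sys.calls() is evaluated while the failing frames are still on the stack
        stack <<- vapply(
          sys.calls(),
          function(cl) paste(deparse(cl), collapse = " "),
          character(1)
        )
      }
    ),
    error = function(e) message <<- conditionMessage(e)
  )
  list(message = message, stack = stack)
}

# Small demonstration with two nested functions
inner <- function() stop("boom")
outer <- function() inner()
res <- capture_call_stack(outer())
print(res$message)
print(res$stack)

# For the case in this thread, the call would be:
# res <- capture_call_stack(parallelStartSocket(2))
```

The printed stack should show which package function ends up calling dir.exists.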

berndbischl commented 8 years ago

This issue here is basically the same thing https://github.com/hadley/staticdocs/issues/33

So, dir.exists was added in 3.2.0, but you have 3.1.0

But like I said: I don't use this in parallelMap, so I need to see from the traceback where it is called before I can help.
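A quick way to confirm the version mismatch on the machine in question is to check whether base R provides dir.exists at all. This is only a minimal sketch using base functions; the variable names are illustrative.

```r
# R version running this session (3.1.0 in the sessionInfo() posted above)
running <- getRversion()

# dir.exists lives in the base namespace from R 3.2.0 onwards
has_dir_exists <- exists("dir.exists", envir = baseenv(), inherits = FALSE)

cat("R", as.character(running),
    "- base::dir.exists available:", has_dir_exists, "\n")
```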

mllg commented 8 years ago

Could be linked to checkmate::assertDirectory which calls dir.exists. But there is a backport in checkmate for this (https://github.com/mllg/checkmate/blob/master/R/backports.r).

This typically happens if you have multiple R versions installed on the system. E.g., if you install checkmate with R-3.2.1 but then load the package with R-3.1.0, the backported function will not exist because it was not defined during compile time.
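To illustrate why the build-time R version matters, here is a rough sketch of how such a conditional backport can work. It is not the actual checkmate source, and dir_exists_fallback is a hypothetical name.

```r
# Fallback with the same contract as base::dir.exists():
# TRUE only for paths that exist and are directories.
dir_exists_fallback <- function(paths) {
  isdir <- file.info(paths)$isdir   # NA for paths that do not exist
  !is.na(isdir) & isdir
}

# Top-level code in a package's R files is evaluated when the package is
# built. If the build runs under R >= 3.2.0, this branch is skipped and the
# resulting binary contains no dir.exists definition of its own.
if (getRversion() < "3.2.0") {
  dir.exists <- dir_exists_fallback
}

print(dir_exists_fallback(tempdir()))
print(dir_exists_fallback(file.path(tempdir(), "no-such-directory")))
```

Loading a binary built this way under R 3.1.0 then fails, because neither base R nor the package defines the function.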

mllg commented 8 years ago

And after a closer look, I assume you are using Windows binaries. The issue with multiple R versions is exactly what is going wrong in your setup: you installed Windows binaries which were built with R-3.2.0 or higher and thus do not include the backport.

You should either upgrade R, install from CRAN or try a source installation.
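One way to check this diagnosis is to compare the R version recorded in an installed package's "Built" field with the running R version. The sketch below relies on utils::packageDescription; the helper names built_under and built_for_newer_r are hypothetical.

```r
# Hypothetical helper: R version an installed package was built under,
# parsed from a "Built" field such as "R 3.2.1; ; 2015-07-01 ...; windows".
built_under <- function(pkg) {
  built <- utils::packageDescription(pkg, fields = "Built")
  if (is.na(built)) return(NA)      # package not installed or field missing
  package_version(sub("^R ([0-9.]+);.*$", "\\1", built))
}

# TRUE if the package was built under a newer R than the one running now
built_for_newer_r <- function(pkg) {
  built <- built_under(pkg)
  if (identical(built, NA)) return(NA)
  built > getRversion()
}

print(built_under("stats"))
print(built_for_newer_r("stats"))

# For the setup in this thread one would check, for example:
# built_for_newer_r("checkmate")
# and, if TRUE, reinstall from source (needs a build toolchain on Windows):
# install.packages("checkmate", type = "source")
```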

berndbischl commented 8 years ago

@mllg Thx

To the point: I completely agree on what "goes wrong here", and that one should not and cannot do that. Well, on a cluster, your first and last suggestions are often not possible. (Just saying, as I often hate such answers from the R mailing lists :) )

But I don't get why a proper install from CRAN is not done.

mllg commented 8 years ago

Let's just hope that @rubenohayon is not running a cluster with a Windows OS. :pray:

berndbischl commented 8 years ago

Well, read the OP:

I used parallelMap for Azure Machine Learning, b

berndbischl commented 8 years ago

I guess he is at least somewhere in the cloud

rubenohayon commented 8 years ago

Hey,

Thanks for all your answers. Honestly, it's really fast in the cloud, so I don't really need to parallelize, but thanks for your help :+1: