suiji / Arborist

Scalable decision tree training and inference.

Rborist hard crash under caret #14

Closed tobigithub closed 8 years ago

tobigithub commented 8 years ago

"R for Windows has stopped working" issue 36: hard crash under win issue 31: hard crash with iris set

Tobias

suiji commented 8 years ago

Thank you for providing test code.

The caret team reported some problems with Rborist 0.1-1 when they provided initial support. These are repaired in the current GitHub version, which is not yet on CRAN, so that version may already fix your problem. In the meantime, I will try to reproduce it.

Regards, mls

tobigithub commented 8 years ago

Thank you. Tobias

suiji commented 8 years ago

So far, the version on GitHub does not reproduce the problem. This is in a Linux environment running R 3.3.1.

Do you know roughly where in the parameter space the problem occurs?

Thank you, mls

tobigithub commented 8 years ago

Hi, I just reproduced the crashes under Windows. All of the other 100 R packages that work under caret are fine. However, I have seen weird dependency problems, some of them related to DLL hell: the Windows R 3.3.1 installation I use has 821 packages and 46,049 files in 8,459 folders.

[Screenshot: rborist-hard-crash-win]

I also removed the parallel code, and it still crashes. It could also be related to the types of data, to missing values (NA), or to caret itself. The next step would be to run the example with just rf and Rborist installed on a clean R system in a virtual machine.
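Short of a virtual machine, a rough approximation of a clean system is to run each candidate call in a fresh `Rscript --vanilla` process. A hard crash then takes down only the child process and shows up as a nonzero exit status. The sketch below assumes `Rscript` is available under `R.home("bin")`; `run_isolated` is a hypothetical helper name, not part of Rborist or caret.

```r
# Hypothetical helper: evaluate R code in a separate, vanilla R process
# and return the child's exit status (0 means it finished normally).
run_isolated <- function(code) {
  rscript <- file.path(R.home("bin"), "Rscript")
  system2(rscript,
          args = c("--vanilla", "-e", shQuote(code)),
          stdout = FALSE, stderr = FALSE)
}

# A child that finishes normally, and one that reports a failure status.
ok_status  <- run_isolated("invisible(1 + 1)")
bad_status <- run_isolated("quit(status = 3)")
cat("normal child:", ok_status, " failing child:", bad_status, "\n")
```

The same helper could be given the full reproduction as a string (loading caret, building `x` and `y`, then calling `caret::train`), once per method, to see which call brings the child process down.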

I also ran some very simple examples like the one below; Rborist alone runs fine.

But it still crashes under Windows and R 3.3.1 once caret::train is invoked. It could be related to caret itself or to one of the many other dependencies.

library(Rborist)

# Works: Rborist called directly
nRow <- 500
x <- data.frame(replicate(6, rnorm(nRow)))
y <- with(x, X1^2 + sin(X2) + X3 * X4) # courtesy of S. Welling.
rb <- Rborist(x, y)

library(caret)

# Works: a different model trained through caret
caret::train(x, y, "knn")

# Crashes: Rborist trained through caret
caret::train(x, y, "Rborist")

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rborist_0.1-1   Rcpp_0.12.5     caret_6.0-70    ggplot2_2.1.0   lattice_0.20-33

loaded via a namespace (and not attached):
 [1] magrittr_1.5       splines_3.3.1      MASS_7.3-45        munsell_0.4.3      colorspace_1.2-6   foreach_1.4.3     
 [7] minqa_1.2.4        stringr_1.0.0      car_2.1-2          plyr_1.8.4         tools_3.3.1        nnet_7.3-12       
[13] pbkrtest_0.4-6     parallel_3.3.1     grid_3.3.1         gtable_0.2.0       nlme_3.1-128       mgcv_1.8-12       
[19] quantreg_5.26      MatrixModels_0.4-1 iterators_1.0.8    lme4_1.1-12        Matrix_1.2-6       nloptr_1.0.4      
[25] reshape2_1.4.1     codetools_0.2-14   stringi_1.1.1      scales_0.4.0       stats4_3.3.1       SparseM_1.7       
>

suiji commented 8 years ago

Have you tried the version on GitHub, or are you still testing with the CRAN version? The older version, currently on CRAN, would sometimes fail when validating small forests. I believe that forest size is one of the parameters caret varies by default, so running caret under default settings would guarantee failure with the older version.
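Which parameters caret actually varies for a method can be checked rather than assumed. A minimal sketch, assuming caret is installed and that its model registry includes an `Rborist` entry: `caret::modelLookup()` lists the tuning parameters registered for a method. `tuned_parameters` is a hypothetical wrapper name.

```r
# Hypothetical wrapper: list the tuning parameters caret registers for a
# method, or return NULL if caret is not installed.
tuned_parameters <- function(method) {
  if (!requireNamespace("caret", quietly = TRUE)) return(NULL)
  caret::modelLookup(method)
}

params <- tuned_parameters("Rborist")
if (is.null(params)) {
  message("caret is not installed; nothing to inspect")
} else {
  # One row per tuning parameter that caret can vary for this method.
  print(params[, c("parameter", "label")])
}
```

If forest size does not appear among the listed parameters, the default tuning run is not varying it, and the failure would have to come from somewhere else.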

Note: I am about to revise the version number of the GitHub version. It still bears the 0.1-1 designation, which should have been updated to 0.1-2. This should eliminate some confusion.

Regards, mls

tobigithub commented 8 years ago

Hi, I am still using the old version. The GitHub version does not contain DLLs (it is not precompiled), so I cannot use it. I will wait until the development version has been posted. Thanks. Tobias

suiji commented 8 years ago

The advice people have been giving lately has been to issue more frequent releases, with fewer enhancements per release. The next release is waiting on a rewrite of the lazy restaging pass, as well as an initial implementation of a sparse internal representation. In the interest of getting a working release out more quickly, it probably makes sense to finish the rewrite for 0.1-2, while delaying the sparse representation until 0.1-3. This should enable you to pick up something useful a bit sooner, hopefully in the next few weeks.

Regards, mls

suiji commented 8 years ago

Since the problem appears to be similar to an issue reported earlier and does not seem to reproduce, the issue will be closed for the time being. Please feel free to reopen it, though, if it reappears in 0.1-2.

suiji commented 8 years ago

Version 0.1-3 is available from CRAN now. It should address this issue.
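Before rerunning the caret example, it may help to confirm that the installed Rborist is at least 0.1-3. The sketch below compares version strings with base R's `package_version()`; `needs_update` is a hypothetical helper name, and the default threshold is simply the release named above.

```r
# Hypothetical helper: TRUE if `installed` is older than the release
# that this thread says addresses the crash (0.1-3 by default).
needs_update <- function(installed, fixed = "0.1-3") {
  package_version(as.character(installed)) < package_version(fixed)
}

# system.file() returns "" when the package is absent, and it does not
# load the package's compiled code.
if (nzchar(system.file(package = "Rborist"))) {
  if (needs_update(packageVersion("Rborist"))) {
    message("Installed Rborist predates 0.1-3; consider install.packages(\"Rborist\")")
  } else {
    message("Installed Rborist is 0.1-3 or later")
  }
} else {
  message("Rborist is not installed")
}
```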

tobigithub commented 8 years ago

Thank you. Tobias