imbs-hl / ranger

A Fast Implementation of Random Forests
http://imbs-hl.github.io/ranger/

Crash when using `extratrees` rule and large number of random splits #156

Closed hadjipantelis closed 7 years ago

hadjipantelis commented 7 years ago

Hello,

Thank you for adding the Extremely Randomised Trees routine to ranger. Unfortunately, it still has some issues to iron out. For instance, while the example:

ranger(Species ~ ., data = iris, splitrule = "extratrees", num.random.splits = 10)

runs fine, the example:

ranger(Species ~ ., data = iris, splitrule = "extratrees", num.random.splits = 101)

causes a hard crash:

*** Error in `/usr/lib/R/bin/exec/R': double free or corruption (!prev)*** Error in `: 0x0000000002da5940 ***
/usr/lib/R/bin/exec/R': free(): invalid next size (normal): 0x000000000265ff30 ***
Aborted

My sessionInfo is as follows:

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ranger_0.6.4

loaded via a namespace (and not attached):
[1] Rcpp_0.12.9

On my system the crash occurs from num.random.splits = 44 onwards. I tried a few different seed values to see if they have any effect, and they do not seem to: the crash starts at 44 in every case I tried, always with the same free(): invalid next size (normal): message. It seems to be somewhat dataset-specific. In the case of mtcars the crash consistently starts at 32, for example:

ranger(mpg ~ ., data = mtcars, splitrule = "extratrees", num.random.splits = 32, seed = 45)
*** Error in `/usr/lib/R/bin/exec/R': free(): invalid next size (normal): 0x0000000001c035f0 ***
Aborted
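
Because the failure aborts the whole R process, finding the threshold interactively means restarting R after every attempt. One way around that is to run each attempt in a separate Rscript process and look only at its exit status. The following is a minimal sketch under some assumptions: `exits_cleanly` is a hypothetical helper (not part of ranger), the quoting assumes a Unix-like shell as on the Debian system above, and ranger must be installed for the child process to get as far as the call being tested.

```r
# Hypothetical helper (not part of ranger): evaluate a string of R code in a
# fresh Rscript process and report whether that process exited with status 0.
# A hard crash (e.g. abort on heap corruption) kills only the child process.
exits_cleanly <- function(code) {
  rscript <- file.path(R.home("bin"), "Rscript")
  status <- system2(rscript, c("-e", shQuote(code)),
                    stdout = FALSE, stderr = FALSE)
  identical(as.integer(status), 0L)
}

# The call from the report, with the number of random splits left open.
template <- paste0(
  'library(ranger); ',
  'ranger(Species ~ ., data = iris, splitrule = "extratrees", ',
  'num.random.splits = %d)'
)

# Probe a few values around the reported threshold.
for (k in c(10L, 43L, 44L, 101L)) {
  ok <- exits_cleanly(sprintf(template, k))
  cat("num.random.splits =", k, if (ok) "ran" else "failed", "\n")
}
```

A non-zero exit status only says that the child process did not finish normally. It is also what you get when ranger is not installed, so a "failed" line is a hint to rerun that value by hand, not proof of the crash described here.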

Maybe there should be an upper limit on the number of random splits to use proportional to sample size? Thank you for your time looking into this.

All best, Pantelis

mnwright commented 7 years ago

Thanks for the detailed report. It was just a stupid little bug... (there is no upper limit)
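
The remark about the upper limit fits how extremely randomised trees are usually described: candidate cut-points are drawn at random from a feature's observed range instead of being taken from the observed values, so their number is not tied to the sample size. A minimal sketch in R, assuming uniform draws between the minimum and maximum of a numeric feature. `draw_random_splits` is a hypothetical name used for illustration, and this is not ranger's actual implementation.

```r
# Illustrative only: draw k candidate cut-points for one numeric feature,
# uniformly between its observed minimum and maximum.
draw_random_splits <- function(x, k) {
  stats::runif(k, min = min(x), max = max(x))
}

set.seed(45)
x <- iris$Sepal.Length              # 150 observations
cuts <- draw_random_splits(x, 500)  # more cut-points than observations

length(cuts)  # number of candidate cut-points drawn
range(cuts)   # all of them lie within the observed range of x
```

Under this assumption, asking for more random splits than there are observations is well defined. It just means more draws from the same interval.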

hadjipantelis commented 7 years ago

Cool, thank you for the quick fix.