suiji / Arborist

Scalable decision tree training and inference.
Other
82 stars 14 forks source link

R version dependency #15

Closed dashaub closed 8 years ago

dashaub commented 8 years ago

Is there a technical reason why the CRAN packages require very recent R >= 3.2? Lots of users have machines that aren't updated frequently. If this is for development ease or lack of testers on other releases, I have several machines on different releases and would be glad to do compatibility testing. Additionally, automatic builds with Travis on R:devel, R:release, and R:oldrelease can be setup.

suiji commented 8 years ago

It may not be necessary to restrict to versions as recent as 3.2. As you suggest, it was mainly a dearth of testing resources that led to the restriction.

If you have the time and energy to try some older versions, I would appreciate the help. The version currently hosted on Github is almost ready for release to CRAN. The only gating issue is the size of the shared object. If you are able to do some compatibility testing in the near term, I can definitely postpone the release for a time.

Thanks, mls

dashaub commented 8 years ago

Great, I'll do some tests on different versions and let you know the results.

suiji commented 8 years ago

Thank you.

Please pick up the latest changes, as a crucial header file had not been revised on the main branch.

dashaub commented 8 years ago

Main branch == CRAN release? I only see the master branch on Github so I assume that's what you mean. I was going to fork this repo, change the R version requirement in DESCRIPTION and try devtools::isntall_github(dashaub/Arborist) as well as running R CMD CHECK on the package.

suiji commented 8 years ago

Main branch == CRAN release? I only see the master branch on Github so I assume that's what you mean. I was going to fork this repo, change the R version requirement in DESCRIPTION and try devtools::isntall_github(dashaub/Arborist) as well as running R CMD CHECK on the package.

I should have said "master branch". It should be safe to fork now that the header has been uploaded.

Thanks, mls

dashaub commented 8 years ago

I succesfully installed on R 3.1.1 on Debian 8.

> library(Rborist)
Loading required package: Rcpp
Rborist 0.1-2
Type RboristNews() to see new features/changes/bug fixes.
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rborist_0.1-2 Rcpp_0.12.5  

loaded via a namespace (and not attached):
[1] tools_3.1.1

My process was to clone your repo, run dev.sh and then Rborist.CRAN.sh I then tried checking the project using devtools::check(document = FALSE) in RStudio. There were a couple of warnings and notes that don't seem specific to an older R version (LICENSE in the src/ directory, enhances package "forestFloor" not available, and large subdirectories).

I built the package and ran R CMD check from the terminal. Aside from the Latex errors at the end while building the PDF manuals--I don't have the necessary packages installed--there weren't other errors checking, and the package tests passed successfully.

R CMD check Rborist_0.1-2.tar.gz 
* using log directory ‘/home/david/Downloads/Arborist/ArboristBridgeR/Package/Rborist.Rcheck’
* using R version 3.1.1 (2014-07-10)
* using platform: x86_64-pc-linux-gnu (64-bit)
* using session charset: UTF-8
* checking for file ‘Rborist/DESCRIPTION’ ... OK
* this is package ‘Rborist’ version ‘0.1-2’
* checking package namespace information ... OK
* checking package dependencies ... NOTE
Package which this enhances but not available for checking: ‘forestFloor’
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘Rborist’ can be installed ... OK
* checking installed package size ... NOTE
  installed size is  5.0Mb
  sub-directories of 1Mb or more:
    libs   4.9Mb
* checking package directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking line endings in C/C++/Fortran sources/headers ... OK
* checking line endings in Makefiles ... OK
* checking compilation flags in Makevars ... OK
* checking for portable use of $(BLAS_LIBS) and $(LAPACK_LIBS) ... OK
* checking compiled code ... OK
* checking examples ... OK
* checking for unstated dependencies in tests ... OK
* checking tests ...
  Running ‘testthat.R’
 OK
* checking PDF version of manual ... WARNING
LaTeX errors when creating PDF version.
This typically indicates Rd problems.
LaTeX errors found:
! LaTeX Error: File `upquote.sty' not found.

Type X to quit or <RETURN> to proceed,
or enter new name. (Default extension: sty)

! Emergency stop.
<read *> 

l.8 \begin{document}
                    ^^M
!  ==> Fatal error occurred, no output PDF file produced!
* checking PDF version of manual without hyperrefs or index ... OK

WARNING: There was 1 warning.
NOTE: There were 2 notes.
See
  ‘/home/david/Downloads/Arborist/ArboristBridgeR/Package/Rborist.Rcheck/00check.log’
for details.

I built the package and it installed successfully from this tar.gz, and I ran a few models. The performance looks a little slower than I remember, granted this is on a slow machine using 4 cores. Does this speedup ratio look comparable to what you see on your machine?

> library(randomForest)
> library(Rborist)
> library(microbenchmark)
> library(kernlab)
> 
> data(spam)
> xData <- spam
> yData <- spam$type
> xData$type <- NULL
> microbenchmark(Rborist(xData, yData),
+                randomForest(xData,yData),
+                times = 9)
Unit: seconds
                       expr       min        lq      mean    median
      Rborist(xData, yData)  8.119039  8.256664  9.183884  9.467035
 randomForest(xData, yData) 22.873831 22.895516 23.492166 23.375539
        uq      max neval cld
  9.802275 10.52704     9  a 
 23.809439 25.16151     9   b

In any case, you might want to consider looking into using continuous integration for your builds. It can make preparing for CRAN releases a lot easier, especially for this case of supporting multiple versions. I can try to put together a start to this if you'd like. This setup with a couple of build scripts is not a standard template example, but it can certainly be achieved with a customized .travis.yml file.

suiji commented 8 years ago

The performance looks a little slower than I remember, granted this is on a slow machine using 4 cores. Does this speedup ratio look comparable to what you see on your machine?

I have not tried it yet, but that ratio does not sound bad: Spam is a classification set with only a few thousand rows.

In any case, you might want to consider looking into using continuous integration for your builds. It can make preparing for CRAN releases a lot easier, especially for this case of supporting multiple versions. I can try to put together a start to this if you'd like.

Yes, continuous integration would be a worthy goal. Certainly, if you have time to help improve automation of the project, then your participation is welcome.

This setup with a couple of build scripts is not a standard template example, but it can certainly be achieved with a customized .travis.yml file.

A double-script approach is admittedly awkward. So far, though, I have not come up with a better way to organize the code in a front- and back-end-neutral fashion. The goal is to host multiple front ends and support multiple hardware targets.

In the meantime, I will wind the version requirement back to 3.1. Thank you for your help.

dashaub commented 8 years ago

This .yml in PR #17 uses the Rborist.CRAN.sh script and then undoes the tar creation before testing/building/installing the package. We could get around this duplication by creating another script (e.g. prepareTravis.sh), but I'm guessing you don't want yet another build script there, so this seems like an acceptable solution to me.

suiji commented 8 years ago

Thanks very much. I appreciate the contribution and feedback.

dashaub commented 8 years ago

On a similar note, I'm guessing the gcc >= 4.5 truly is needed; the build failed on a machine without this.

suiji commented 8 years ago

Any idea what the failure mode is?

dashaub commented 8 years ago
gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

R CMD INSTALL Rborist_0.1-2.tar.gz
* installing to library ‘/users/mis/shaudav/R/x86_64-redhat-linux-gnu-library/3.1’
* installing *source* package ‘Rborist’ ...
** libs
g++ -m64 -std=c++0x -I/usr/include/R -DNDEBUG  -I/usr/local/include -I"/users/mis/shaudav/R/x86_64-redhat-linux-gnu-library/3.1/Rcpp/include" -I"/users/mis/shaudav/R/x86_64-redhat-linux-gnu-library/3.1/RcppArmadillo/include"  -fopenmp -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -c bottom.cc -o bottom.o
In file included from bottom.cc:16:
bottom.h:411: error: ISO C++ forbids declaration of ‘constexpr’ with no type
bottom.h:411: error: expected ‘;’ before ‘unsigned’
bottom.h:419: error: ISO C++ forbids declaration of ‘constexpr’ with no type
bottom.h:419: error: expected ‘;’ before ‘double’
In file included from bottom.cc:17:
bv.h:32: error: ISO C++ forbids declaration of ‘constexpr’ with no type
bv.h:32: error: expected ‘;’ before ‘unsigned’
bv.h: In static member function ‘static unsigned int BV::SlotElts()’:
bv.h:65: error: ‘slotElts’ was not declared in this scope
bv.h: In static member function ‘static unsigned int BV::SlotAlign(unsigned int)’:
bv.h:80: error: ‘slotElts’ was not declared in this scope
bv.h: In static member function ‘static unsigned int BV::Stride(unsigned int)’:
bv.h:84: error: ‘slotElts’ was not declared in this scope
bv.h: In static member function ‘static unsigned int BV::SlotMask(unsigned int, unsigned int&)’:
bv.h:91: error: ‘slotElts’ was not declared in this scope
bv.h: In member function ‘void BV::SetBit(unsigned int, bool)’:
bv.h:130: error: ‘slotElts’ was not declared in this scope
In file included from bottom.cc:17:
bv.h: In member function ‘BitRow* BitMatrix::Row(unsigned int)’:
bv.h:176: error: ‘slotElts’ was not declared in this scope
bv.h: At global scope:
bv.h:249: error: ISO C++ forbids declaration of ‘constexpr’ with no type
bv.h:249: error: expected ‘;’ before ‘unsigned’
bv.h: In static member function ‘static unsigned int CharV::SlotElts()’:
bv.h:269: error: ‘slotElts’ was not declared in this scope
bv.h: In static member function ‘static unsigned int CharV::SlotAlign(unsigned int)’:
bv.h:286: error: ‘slotElts’ was not declared in this scope
bv.h: In static member function ‘static unsigned int CharV::Stride(unsigned int)’:
bv.h:290: error: ‘slotElts’ was not declared in this scope
bv.h: In member function ‘unsigned char CharV::Get(unsigned int) const’:
bv.h:295: error: ‘slotElts’ was not declared in this scope
bv.h: In member function ‘void CharV::Set(unsigned int, unsigned char)’:
bv.h:311: error: ‘slotElts’ was not declared in this scope
bv.h: In member function ‘CharRow* CharMatrix::Row(unsigned int)’:
bv.h:349: error: ‘slotElts’ was not declared in this scope
In file included from bottom.cc:19:
splitpred.h: At global scope:
splitpred.h:115: error: ISO C++ forbids declaration of ‘constexpr’ with no type
splitpred.h:115: error: expected ‘;’ before ‘double’
splitpred.h:116: error: ISO C++ forbids declaration of ‘constexpr’ with no type
splitpred.h:116: error: expected ‘;’ before ‘double’
splitpred.h:117: error: ISO C++ forbids declaration of ‘constexpr’ with no type
splitpred.h:117: error: expected ‘;’ before ‘double’
bottom.cc: In member function ‘const std::vector<SSNode*, std::allocator<SSNode*> > Bottom::Split(Index*, IndexNode*)’:
bottom.cc:105: error: ‘pathMax’ was not declared in this scope
bottom.cc: In member function ‘void Bottom::FlushRear()’:
bottom.cc:141: error: ‘pathMax’ was not declared in this scope
bottom.cc: In destructor ‘Bottom::~Bottom()’:
bottom.cc:236: error: expected initializer before ‘:’ token
bottom.cc:240: error: could not convert ‘((Bottom*)this)->Bottom::level.std::deque<_Tp, _Alloc>::clear [with _Tp = Level*, _Alloc = std::allocator<Level*>]()’ to ‘bool’
bottom.cc:242: error: expected ‘)’ before ‘;’ token
bottom.cc: In member function ‘void Bottom::Restage(RestageCoord&)’:
bottom.cc:366: error: ‘pathMax’ was not declared in this scope
bottom.cc:370: error: ‘reachOffset’ was not declared in this scope
make: *** [bottom.o] Error 1
ERROR: compilation failed for package ‘Rborist’
* removing ‘/users/mis/shaudav/R/x86_64-redhat-linux-gnu-library/3.1/Rborist’
dashaub commented 8 years ago

Don't have access to it, now, but this also failed on Debian 8 with gcc 4.9

dashaub commented 8 years ago

Correction, it DOESN'T fail on Debian 8 with gcc 4.9. It does appear that the very old gcc 4.4.7 is problematic, however.

suiji commented 8 years ago

The 4.4.7 errors look like incompatibility with C++-11. It's reassuring that 4.9 does not complain.

Out of curiosity: Are you testing on sandboxed environments or are these results all from native installations?

dashaub commented 8 years ago

These are bare metal installations.

suiji commented 8 years ago

I've set the gcc requirement to ">= 4.9", based on these results.

The package looks ready for submission to CRAN, but we can wait a bit longer if you think it is worth testing the 4.5 - 4.8 range.

dashaub commented 8 years ago

Incidentally I believe I have a gcc 4.8 build that I can test out. I'll verify this.

dashaub commented 8 years ago

These last couple of posts were from memory, and I forgot to include the major release number 4 in those version numbers. Sorry about the confusion for the gcc versions.

For clarity, here is a summary of what works and what doesn't:

Works: Debian 8 with gcc 4.4.9 RHEL 6.7 with gcc 4.8.4

Doesn't work: RHEL 6.7 with gcc 4.4.7

It would be nice to test Ubuntu 14.04 since it uses gcc 4.4.8 and that would support a bunch of additional users. Unfortunately I don't have an install of this so I might have to look at using a VM. For now it seems 4.4.9 is a safe minimum gcc requirement.

I don't know enough about C++ and the features you're using in the package, but maybe you can get an idea here about which versions are likely the minimum required from here. https://gcc.gnu.org/projects/cxx-status.html

Finally, I believe the CRAN SystemRequirements field doesn't actually do anything to prevent the package installation but merely serves as useful information to the user, so setting this to the lowest known supported version would be fine, and if users have an older version and that version can compile correctly, the installation will succeed, but if it fails they can look at the SystemRequirements and know why.

suiji commented 8 years ago

I should be able to slip a revision into the CRAN release reflecting your findings. There was a build failure on Solaris x86, so at least one revision will be required to satisfy the various builds.

Thanks for your help.

dashaub commented 8 years ago

Ok, sorry for the supreme confusion on this. The gcc version on Debian 8 is 4.9 as originally stated. I really need to only post version numbers when I'm in front of that particular computer. So confirmed working is gcc >= 4.9.

I tried an install on Debian 7 with R 2.14 but unsurprisingly it failed ("Rcpp" won't install on that old of a version of R). I tried a couple of Ubuntu 14.04 images but couldn't get those working. The Travis CI build system uses Ubuntu 12.04 with gcc 4.6.3, so we'll find out soon if that version is good enough.

On a separate note, it looks like the package "roxygen2" needs to be added to the install procedure in the .travis.yml file or set devtools::check("Rborist", manual = FALSE) for the Travis CI to build to proceed.

suiji commented 8 years ago

O.k. 4.9 is in the current DESCRIPTION, and will make it onto CRAN. CRAN asks contributors to refrain from submitting corrections until the test cycle completes. Testing is not complete, so there is time to change the version bound.

In the meantime, I will revise the Travis script. Stay tuned.

dashaub commented 8 years ago

It's strange that devtools::check() is still complaining about "roxygen2" not being installed since all it should be used for is building the manual. Maybe try adding the install of roxygen2 to the previous step to appease it.

dashaub commented 8 years ago

Something else to try: changing the build system to use Ubuntu 14.04. Simply add dist: trusty to the .travis.yml file.

suiji commented 8 years ago

Explicitly install the 'roxygen2' package did the trick. More interesting failure modes now appear.

dashaub commented 8 years ago

Great, that is a (better) problem to have. Excellent package by the way.