Open thirdwing opened 8 years ago
Yes, I am looking at some examples and tests in MLPACK and it is a little sad that they don't work in RcppMLPACK -- ie I just hit upon StandardCoverTree
and that is apparently 2.0.* only.
This can be fixed by embedding the boost::serialization
source code.
I will find some time to upgrade this.
That's because boost_serialization
requires header + library rather than just templates?
I don't think it is header-only.
I think we are saying the same thing.
Thinking out loud here: could we maybe informally make this work by relying on BH and 'just' adding the Boost Serialization parts this needs? Burden would then be on us to keep this in sync with the (infrequent) BH updates.
Alternatively, one could depend on the system Boost installation but that may exclude Windows and other platforms. Do you get easily installable and reliable Boost libraries from (home)brew (or alternatives) on OS X?
On OSX it is quite easy to install boost, but it is quite annoying on Windows.
I prefer using the BH package and embedding the boost.serialization part.
I just thought about another way: use an external MLPACK. On Ubuntu 16.04, I got a version 2.0.1 preinstalled. I am now working in (local) branch where I
inst/include/mlpack/
src/mlpack/
src/Makevars
, added linking to -lmlpack
src/kmeans.cpp
(as it now wants a Row
rather than Col
and it all builds. You'd probably loose Windows, or possibly do one of those tricks on pulling in a pre-build static library. Not sure about OS X.
Also possible cans of worms with version drift between Armadillo and MLPACK.
FWIW I pushed this to a public branch: https://github.com/eddelbuettel/RcppMLPACK/tree/feature/external-mlpack
I may experiment with calling this RcppMLPACKlite to signal that it uses an external MLPACK. Not sure if I find time to work on it though ...
I am not sure if there is anything I can do to help here, and I am certain I don't 100% understand the environment in which you are working, but if you wanted to strip serialization support easily, you could replace CreateNVP
in serialization_shim.hpp
with a function that does nothing, and then remove the overloads of data::Load()
and data::Save()
in src/mlpack/core/data/load.hpp
and src/mlpack/core/data/save.hpp
that take a model (and therefore use boost::serialization), then remove everything related to serialization from CMakeLists.txt. I am not sure if that's a desirable solution---or even a solution that does what you need---but I thought I would toss it out there since I'm familiar with how that code is written. :)
Hi @rcurtin -- good to see you here.
This may have legs. Let's talk more. For starters:
#ifdef
-ing save and load makes a lot of sense in that case too as we have R as a 'REPL' around it to load and save data One thing which Conrad very kindly accomodated was the override of std::cout
. R does not like us to use native I/O (but rather use its, so that it is nicely synchronize). Similarly assert()
and exit()
are verboten (which makes complete sense when you consider that R is the shell and environment).
Does this make sense? I would love to work on this to get MLPACK more tightly into R.
I'm happy to help out how I can without compromising our own design. I think that we can work easily within those confines to make life easier for you.
I looked at Cereal, but I chose to go with Boost since we were already requiring other parts of boost anyway. I believe the serialization shim (made so that we can write Serialize()
functions instead of serialize()
and conform with our style guidelines) was easier with Boost too; it's been a while and I can't remember for sure.
I'm not so excited about #ifdef
'ing all of the Serialize()
functions; is there any other way? If you are basically just taking the mlpack headers and copying/modifying them to provide as RcppMLPACK, then it seems that the strategy I proposed above could be useful. Maybe not---you would know more than me.
As for the other issues, please go ahead and open some issues on the mlpack Github page; we can discuss them each separately there, and I think it will be a better way to fix the issues (since it'll possibly be more than just my opinion being posted).
To clarify:
Ok, let me know how your tinkering goes. I like the concept of being awesome, and I also like the concept of upstream being responsive to the needs of downstream (Conrad's help in that manner has been invaluable to us also). So when you are ready, let's start discussing how things in mlpack can change to make it easier for RcppMLPACK. :)
@rcurtin I have something up and running now over here. It's really just a fork as a distinct master to experiment. But it takes an external -lmlpack
. (And I currently use 2.0.2 as there was a Debian package for it which I rebuilt for Ubuntu use in Travis (14.04) and on my 16.04 machines. If you hint that I should really upgrade to 2.0.3 I can....).
Now, when we run R CMD check ...
over the tarball created by R CMD build ...
from a source repo -- and you can just browse this here thanks to Travis I get
File ‘RcppMLPACK/libs/RcppMLPACK.so’:
Found ‘_ZSt4cout’, possibly from ‘std::cout’ (C++)
Object: ‘kmeans.o’
Compiled code should not call entry points which might terminate R nor
write to stdout/stderr instead of to the console, nor the system RNG.
See ‘Writing portable packages’ in the ‘Writing R Extensions’ manual.
That seems to come from the sole std::cout
use in core/util/prefixedoutstream_impl.hpp
. Any chance we can that one use a fallback to something else?
For example, we can use the trick we settled on with Conrad and provide an overrideable fallback:
#if !defined(ARMA_DEFAULT_OSTREAM)
#define ARMA_DEFAULT_OSTREAM std::cout
#endif
ARMA_DEFAULT_OSTREAM << std::endl;
Now, when I include RcppArmadillo.h
as well (to get this from RcppArmadolloConfig.h, really) it all passes.
Is that something we could entertain? If I'd be happy to pivot over to an issue at your repo. Also, maybe follow the the RcppMLPACK for over here if you could.
(Edited to add missing !
in condition above)
Some more good new here.
In the "forked" repo I can now merrily build against an external MLPACK 2.1.0 (plus ideally another one one-line patch upstream to silence R CMD check
completely) which is way quicker to build and possibly more portable. (Possibly also toxic as we get external Armadillo and our Armadillo but let's cross that bridge when we get there.) I did all that on Linux (and have Ubuntu packages, see the .travis file. Could you and/or maybe @coatless try on OS X via brew (or however you install MLPACK) ?
If it all works we should maybe merge that back here?
@eddelbuettel via brew. I'll run it shortly.
Passes the initial checks, however, it bombs three compiles in on the Logistic Regression file. With:
clang++ -std=c++11 -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include -I"/Library/Frameworks/R.framework/Versions/3.3/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.3/Resources/library/RcppArmadillo/include" -I. -I../inst/include -fPIC -Wall -mtune=core2 -g -O2 -c logisticRegression.cpp -o logisticRegression.o logisticRegression.cpp:29:11: error: no member named 'Classify' in 'mlpack::regression::LogisticRegression<arma::Mat
>' lrc.Classify(test2, resultsur); 1 error generated. make: *** [logisticRegression.o] Error 1 ERROR: compilation failed for package ‘RcppMLPACK’ * removing ‘/Library/Frameworks/R.framework/Versions/3.3/Resources/library/RcppMLPACK’ Error: Command failed (1)
Check response:
- installing source package ‘RcppMLPACK’ ... checking for g++... g++ checking whether the C++ compiler works... yes checking for C++ compiler default output file name... a.out checking for suffix of executables... checking whether we are cross compiling... no checking for suffix of object files... o checking whether we are using the GNU C++ compiler... yes checking whether g++ accepts -g... yes checking how to run the C++ preprocessor... g++ -E checking for gcc... gcc checking whether we are using the GNU C compiler... yes checking whether gcc accepts -g... yes checking for gcc option to accept ISO C89... none needed checking whether we are using the GNU C++ compiler... (cached) yes checking whether g++ accepts -g... (cached) yes checking whether g++ supports C++11 features by default... no checking whether g++ supports C++11 features with -std=c++11... yes checking for pkg-config... no checking for grep that handles long lines and -e... /usr/bin/grep checking for egrep... /usr/bin/grep -E checking for ANSI C header files... yes checking for sys/types.h... yes checking for sys/stat.h... yes checking for stdlib.h... yes checking for string.h... yes checking for memory.h... yes checking for strings.h... yes checking for inttypes.h... yes checking for stdint.h... yes checking for unistd.h... yes checking mlpack/core.hpp usability... yes checking mlpack/core.hpp presence... yes checking for mlpack/core.hpp... yes checking for mlpack/core.hpp... (cached) yes checking for main in -lmlpack... yes configure: creating ./config.status config.status: creating src/Makevars
RcppMLPACK 2.1.0-1
pkgconfig:
compiler: g++ -std=c++11 cflags:
libs: -lmlpack
I'll look more at it tomorrow.
Ahh, MLPACK via brew is on 2.0.1...
Yes, you want MLPACK 2.1.0 (plus the patch) for the nifty pkg-config
integration.
Nice to see the brew PR merged!
Minor update from my end: managed to build mlpack in the PPA via a proper 'limit memory use' option for gcc. So back to using the PPA also in the .travis.yml.
@thirdwing Any interest in what we have done over here ?
This now builds reliably (and fast, using pkgconfig
) on Debian and Ubuntu (via the packages in my PPA), or reliably via configure
if pkgconfig
is not found (or MLPACK is older than 2.1.0).
@coatless got the homebrew package updated as well. So we would have the two major OSs covered. We could try to do what Jeroen has done and create a library for Windows; but this is of lesser importance to me.
What do you think?
Really sorry for this. I was occupied by some personal issues recently.
I can add you into this repo if necessary.
Absolutely no worry. I know you're busy with school and research and whatnot. And hope you're well.
Just have a look what I have in the tentatively-named eddelbuettel/rcppmlpack2. And then let's talk about what that may get right, or not, and what the best way forward. There are advantages to just using an external libmlpack ... And drawbacks, of course, for the poor souls who do not have that on their systems easily.
So ... what is our plan?
First, let me add you into this repo.
Second, give me one more day. I will try to embed the boost::serialization
. If it doesn't work, let's just use the external library.
No rush. We have the external (current) library as a backup.
Getting Boost Serialization would be nice; not sure how easy it is. There are other Boost libraries which require linking (eg Boost Regex).
It should be just like how we use boost in mzR
https://github.com/sneumann/mzR/tree/master/src/boost
Err, that looks like a complete copy of Boost, no?
Almost a complete copy...
But we don't find any better solution for Windows users.
That approach seems ... suboptimal to me. We have many options. Maybe one of them is two RcppMLPACK packages on CRAN: one against the external library, one with an embedded library.
Or we could waste hours doing a Jeroen-alike approach of downloading-when-on-Windows -- as users get binary packages they don't pay the cost.
For OS X and Linux, we have something much better now. Works for me.
@eddelbuettel @coatless Let's use the external lib and submit a new version to CRAN.
So now that RcppMLPACK1 is finalized and on CRAN, how do we proceed?
"Freeze" this branch, but then merge in from my eddelbuettel/rcpppmlpack2 ?
Something else?
How do we preserve the current CRAN binaries?
My suggestion is to hold the Windows and OSX binary in a drat repo, so people can find it if they need.
FYI: grabbed the binaries off of CRAN and also stuck them with the release tag
https://github.com/thirdwing/RcppMLPACK1/releases/tag/1.0.10-6
We could just create an RcppMLPACK org wit v1, v2 and the drat in it.
Few updates on my end:
mlpack
formula for brew.
quantlib
as one only needs to substitute homebrew/science
with homebrew/core
. pkg-config
support.
configure.ac
of RcppMLPACK2 script not respecting ~/.R/Makevars
choice of CXX1X
, which is defaulted to over CXX
due to the CXX11
flag, unlike the configure.ac
of RcppArmadillo that respects CXX
. pkg-config
and trying to build RcppMLPACK2 I'm now receiving:ld: library not found for -l/usr/local/opt/armadillo/lib/libarmadillo.dylib
collect2: error: ld returned 1 exit status
make: *** [RcppMLPACK.so] Error 1
ERROR: compilation failed for package ‘RcppMLPACK’
Though, what's a bit odd is the file exists in the given location:
wirelessprv-10-192-190-25:rcppmlpack2 agentxyz$ ls /usr/local/opt/armadillo/lib
libarmadillo.7.60.1.dylib libarmadillo.7.dylib libarmadillo.dylib pkgconfig
I've forced rebuilds across the board, but to no avail with pkg-config
included.
It seems if I revert the formula build to skip pkg-config
, everything builds fine and the package loads.
I'm not entirely sure why. One note that came up after bottling was:
Warning: homebrew/science/mlpack dependency gcc was built with a different C++ standard
library (libstdc++ from clang). This may cause problems at runtime.
This warning though is normally triggered after any code is compiled with fortran dependencies...
I'll think about it more later.
pkg-config
disabled:RcppMLPACK 2.1.0-1
==================
pkgconfig:
compiler: g++ -std=c++11
cflags:
libs: -lmlpack
pkg-config
enabled:RcppMLPACK 2.1.0-1
==================
pkgconfig: yes
compiler: g++ -std=c++11
cflags:
libs: -L/usr/local/lib -L/usr/local/opt/armadillo/lib/ -L/usr/local/lib/ -L/usr/local/Cellar/mlpack/2.1.1/lib/ -lm -l/usr/local/opt/armadillo/lib/libarmadillo.dylib -l/usr/local/lib/libboost_program_options-mt.dylib -l/usr/local/lib/libboost_unit_test_framework-mt.dylib -l/usr/local/lib/libboost_serialization-mt.dylib -lmlpack
When I was toying around with changing out compilers (e.g. CXX1X = g++-6
), I still couldn't make the setup link step go through with pkg-config
enabled. However, the package then proceeded to fail with the pkg-config
disabled since an unknown symbol randomly appears!
unable to load shared object '/Library/Frameworks/R.framework/Versions/3.3/Resources/library/RcppMLPACK/libs/RcppMLPACK.so':
dlopen(/Library/Frameworks/R.framework/Versions/3.3/Resources/library/RcppMLPACK/libs/RcppMLPACK.so, 6): Symbol not found: __ZN6mlpack3Log6AssertEbRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
It's been working fine on Linux "since whenever" or else I wouldn't have committed it.
edd@max:~$ cd git/rcppmlpack2/
edd@max:~/git/rcppmlpack2(master)$ ./cleanup
edd@max:~/git/rcppmlpack2(master)$ R CMD INSTALL .
* installing to library ‘/usr/local/lib/R/site-library’
* installing *source* package ‘RcppMLPACK’ ...
checking for g++... g++
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking how to run the C++ preprocessor... g++ -E
checking for gcc... gcc
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether we are using the GNU C++ compiler... (cached) yes
checking whether g++ accepts -g... (cached) yes
checking whether g++ supports C++11 features by default... no
checking whether g++ supports C++11 features with -std=c++11... yes
checking for pkg-config... yes
configure: creating ./config.status
config.status: creating src/Makevars
RcppMLPACK 2.1.0-1
==================
pkgconfig: yes
compiler: g++ -std=c++11
cflags: -I/usr/include/
libs: -lrt -larmadillo -lboost_program_options -lboost_unit_test_framework -lboost_serialization -lmlpack
** libs
ccache g++ -std=c++11 -I/usr/share/R/include -DNDEBUG -I"/usr/local/lib/R/site-library/Rcpp/include" -I"/usr/local/lib/R/site-library/RcppArmadillo/include" -I. -I../inst/include -I/usr/include/ -fpic -O3 -Wall -pipe -Wno-unused -pedantic -c RcppExports.cpp -o RcppExports.o
ccache g++ -std=c++11 -I/usr/share/R/include -DNDEBUG -I"/usr/local/lib/R/site-library/Rcpp/include" -I"/usr/local/lib/R/site-library/RcppArmadillo/include" -I. -I../inst/include -I/usr/include/ -fpic -O3 -Wall -pipe -Wno-unused -pedantic -c coverTreeNeighbor.cpp -o coverTreeNeighbor.o
coverTreeNeighbor.cpp: In function ‘Rcpp::List coverTreeNeighbor(const mat&, int)’:
coverTreeNeighbor.cpp:24:69: warning: the address of ‘referenceTree’ will always evaluate as ‘true’ [-Waddress]
StandardCoverTree> coverTreeSearch(&referenceTree);
^
coverTreeNeighbor.cpp:24:69: warning: ‘mlpack::neighbor::NeighborSearch<SortPolicy, MetricType, MatType, TreeType, DualTreeTraversalType, SingleTreeTraversalType>::NeighborSearch(bool, bool, double, MetricType) [with SortPolicy = mlpack::neighbor::FurthestNeighborSort; MetricType = mlpack::metric::LMetric<2, true>; MatType = arma::Mat<double>; TreeType = mlpack::tree::StandardCoverTree; DualTreeTraversalType = mlpack::tree::CoverTree<mlpack::metric::LMetric<2, true>, mlpack::neighbor::NeighborSearchStat<mlpack::neighbor::FurthestNeighborSort>, arma::Mat<double>, mlpack::tree::FirstPointIsRoot>::DualTreeTraverser; SingleTreeTraversalType = mlpack::tree::CoverTree<mlpack::metric::LMetric<2, true>, mlpack::neighbor::NeighborSearchStat<mlpack::neighbor::FurthestNeighborSort>, arma::Mat<double>, mlpack::tree::FirstPointIsRoot>::SingleTreeTraverser]’ is deprecated [-Wdeprecated-declarations]
In file included from /usr/include/mlpack/methods/neighbor_search/neighbor_search.hpp:602:0,
from coverTreeNeighbor.cpp:5:
/usr/include/mlpack/methods/neighbor_search/neighbor_search_impl.hpp:361:1: note: declared here
NeighborSearch<SortPolicy, MetricType, MatType, TreeType, DualTreeTraversalType,
^
ccache g++ -std=c++11 -I/usr/share/R/include -DNDEBUG -I"/usr/local/lib/R/site-library/Rcpp/include" -I"/usr/local/lib/R/site-library/RcppArmadillo/include" -I. -I../inst/include -I/usr/include/ -fpic -O3 -Wall -pipe -Wno-unused -pedantic -c kmeans.cpp -o kmeans.o
ccache g++ -std=c++11 -I/usr/share/R/include -DNDEBUG -I"/usr/local/lib/R/site-library/Rcpp/include" -I"/usr/local/lib/R/site-library/RcppArmadillo/include" -I. -I../inst/include -I/usr/include/ -fpic -O3 -Wall -pipe -Wno-unused -pedantic -c logisticRegression.cpp -o logisticRegression.o
ccache g++ -std=c++11 -I/usr/share/R/include -DNDEBUG -I"/usr/local/lib/R/site-library/Rcpp/include" -I"/usr/local/lib/R/site-library/RcppArmadillo/include" -I. -I../inst/include -I/usr/include/ -fpic -O3 -Wall -pipe -Wno-unused -pedantic -c naiveBayesClassifier.cpp -o naiveBayesClassifier.o
ccache g++ -std=c++11 -Wl,-S -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o RcppMLPACK.so RcppExports.o coverTreeNeighbor.o kmeans.o logisticRegression.o naiveBayesClassifier.o -lrt -larmadillo -lboost_program_options -lboost_unit_test_framework -lboost_serialization -lmlpack -llapack -lblas -lgfortran -lm -lquadmath -L/usr/lib/R/lib -lR
installing to /usr/local/lib/R/site-library/RcppMLPACK/libs
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (RcppMLPACK)
edd@max:~/git/rcppmlpack2(master)$
But we may need to clean up some OS X issues. Maybe brew and R disagree on compilers...
So what are our next steps? "Private" ie non-CRAN release of RcppMLPACK2 to iron out build issues?
Paging @coatless and @thirdwing : I would like to pick this thread up.
I think we are currently here:
For Windows, Jeroen just calls our from src/Makevars
to a helper script which brings in an external library. I do the same in some other packages (RblpApi, x13binary) via configure
.
It all amounts to the same trick: prescribe your own directly layout and library, then use it.
Could either one of you volunteer to build a Windows library? Or shall we go ahead and do a first release -- maybe "off CRAN" -- for OS X and Linux only?
@eddelbuettel:
I can look into the Windows bit. Should we request a repo within the rwinlib
to store the Windows binary?
macOS wise, I've managed to acquire a different box that I'm going to experiment with a bit more for the pkg-config
support. I think I know why it blew up. Worst case, that feature can be regressed (as it presently breaks the formula build for the linuxbrew port of homebrew).
Time wise, I'll be able to tackle this on the weekend as UIUC SP17 just started today 🎉 ...
In turn:
@coatless: Can you try to explain to me what the issue on OS X? I am a little confused.
I did make our use of pkg-config
better; this is now more robust and takes include directories. But I have a hard time discerning from the above what the actual problem is. Can we not have one consistent compiler choice per brew
for all pieces?
@eddelbuettel:
Turning on the pkg-config
support changes the link flags to include l/usr/local/opt/armadillo/lib/libarmadillo.dylib
, which crashes the build. The working theory is related to how the Fortran code is being compiled that appears to be clashing.
The cases for Fortran code being problematic is:
libc++
from clang
whereas brew is using libstdc++
with clang
dependency gcc was built with a different C++ standard library (libstdc++ from clang). This may cause problems at runtime.
However, the package built with pkg-config
off and was stable to interact with embedded functions.
Oh, I see now. Makes sense, and hard nut to crack.
Can we change the brew recipe to build differently?
Or can we supply our own MLPACK for OS, using our Fortran, not unlike what we do on Windows anyway?
I think I may have a solution to the brew
build issue.
Instead of allowing brew
to set the default to libstdc++
, I can set an environment flag on build to use libc++
.
e.g.
class mlpack < Formula
option "with-libcxx", "Compile using libc++ instead of libstdc++"
def install
ENV.libcxx if build.with? "libcxx"
end
This requires users to individual install armadillo
and mlpack
:
brew install armadillo --with-libcxx
brew install mlpack --with-libcxx
I like it -- feasible, even with workaround, beats infeasible!
Still can't get the linker to the right spot under that configuration after 3 attempts on a 3-hour bus ride back.
Something seems off with the environment presets as the cmake information is reporting --stdlib=libc++
being used. I'm going to toss up a ticket on homebrew
's org to hopefully get info on how to resolve.
Otherwise, I think just maintaining a separate binary for macOS akin to Windows will have to work...
All
serialization
part should be removed.https://github.com/mlpack/mlpack/blob/master/src/mlpack/core/data/serialization_shim.hpp#L122