stan-dev / cmdstan

CmdStan, the command line interface to Stan
https://mc-stan.org/users/interfaces/cmdstan
BSD 3-Clause "New" or "Revised" License
210 stars 93 forks source link

Adjust Makefiles to support cross-compilation #1004

Open ajbouh opened 3 years ago

ajbouh commented 3 years ago

Summary:

Allow setting a new variable, TARGET_OS, which is used instead of OS when it determines a property of the compiled files.

Description:

The existing codebase assumes that the current OS is the same as the OS that will be running the compiled program. This is not always the case. In my case I am running macOS but running my Stan programs on Linux.

I'm including the patch I've come up with here, in case it's helpful to others.

diff --git a/stan/lib/stan_math/make/compiler_flags b/stan/lib/stan_math/make/compiler_flags
index 8106f58b..4cf67576 100644
--- a/stan/lib/stan_math/make/compiler_flags
+++ b/stan/lib/stan_math/make/compiler_flags
@@ -15,16 +15,19 @@ ifneq ($(OS),Windows_NT)
   OS := $(shell uname -s)
 endif

+## Allow cross compiling to another target operating system
+TARGET_OS ?= $(OS)
+
 ## Set OS specific library filename extensions
-ifeq ($(OS),Windows_NT)
+ifeq ($(TARGET_OS),Windows_NT)
   LIBRARY_SUFFIX ?= .dll
 endif

-ifeq ($(OS),Darwin)
+ifeq ($(TARGET_OS),Darwin)
   LIBRARY_SUFFIX ?= .dylib
 endif

-ifeq ($(OS),Linux)
+ifeq ($(TARGET_OS),Linux)
   LIBRARY_SUFFIX ?= .so
 endif

@@ -145,7 +148,7 @@ CXXFLAGS_SUNDIALS ?= -pipe $(CXXFLAGS_OPTIM_SUNDIALS) $(CPPFLAGS_FLTO_SUNDIALS)
 ################################################################################
 # Update compiler flags with operating system specific modifications
 ##
-ifeq ($(OS),Windows_NT)
+ifeq ($(TARGET_OS),Windows_NT)
   CXXFLAGS_WARNINGS ?= -Wall -Wno-unused-function -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-variable -Wno-sign-compare -Wno-unused-local-typedefs -Wno-int-in-bool-context -Wno-attributes
   CPPFLAGS_GTEST ?= -DGTEST_HAS_PTHREAD=0
   CPPFLAGS_OS ?= -D_USE_MATH_DEFINES
@@ -166,13 +169,13 @@ ifeq ($(OS),Windows_NT)
   EXE := .exe
 endif

-ifeq ($(OS),Darwin)
+ifeq ($(TARGET_OS),Darwin)
   ifeq (clang,$(CXX_TYPE))
     CXXFLAGS_OS ?= -Wno-unknown-warning-option -Wno-tautological-compare -Wno-sign-compare
   endif
 endif

-ifeq ($(OS),Linux)
+ifeq ($(TARGET_OS),Linux)
   CPPFLAGS_GTEST ?= -DGTEST_HAS_PTHREAD=0
   CXXFLAGS_WARNINGS ?= -Wno-sign-compare
   ifeq (gcc,$(CXX_TYPE))
@@ -252,13 +255,13 @@ LDLIBS_TBB ?=

 else

-ifeq ($(OS),Windows_NT)
+ifeq ($(TARGET_OS),Windows_NT)
   TBB_TARGETS ?= $(addprefix $(TBB_BIN)/,$(addsuffix $(LIBRARY_SUFFIX),$(TBB_LIBRARIES)))
 endif
-ifeq ($(OS),Darwin)
+ifeq ($(TARGET_OS),Darwin)
   TBB_TARGETS ?= $(addprefix $(TBB_BIN)/lib,$(addsuffix $(LIBRARY_SUFFIX), $(TBB_LIBRARIES)))
 endif
-ifeq ($(OS),Linux)
+ifeq ($(TARGET_OS),Linux)
   # Update the suffix with the internal TBB source code!
   # The new version of TBB library is 12+ (e.g., libtbb.so.12.1 not libtbb.so.2)
   TBB_TARGETS ?= $(addprefix $(TBB_BIN)/lib,$(addsuffix $(LIBRARY_SUFFIX).2,$(TBB_LIBRARIES)))
diff --git a/stan/lib/stan_math/make/libraries b/stan/lib/stan_math/make/libraries
index fcababa0..325ab05c 100644
--- a/stan/lib/stan_math/make/libraries
+++ b/stan/lib/stan_math/make/libraries
@@ -119,6 +119,8 @@ ifeq ($(CXX_TYPE),other)
 endif
 TBB_CXX_TYPE ?= $(CXX_TYPE)

+TBB_OS ?= $(shell echo $(TARGET_OS) | tr '[:upper:]' '[:lower:]')
+
 # Set c compiler used for the TBB
 ifeq (clang,$(CXX_TYPE))
   TBB_CC ?= $(subst clang++,clang,$(CXX))
@@ -166,11 +168,11 @@ endif
 $(TBB_BIN)/tbb.def: $(TBB_BIN)/tbb-make-check $(TBB_BIN)/tbbmalloc.def
    @mkdir -p $(TBB_BIN)
    touch $(TBB_BIN)/version_$(notdir $(TBB))
-   tbb_root="$(TBB_RELATIVE_PATH)" CXX="$(CXX)" CC="$(TBB_CC)" LDFLAGS='$(LDFLAGS_TBB)' '$(MAKE)' -C "$(TBB_BIN)" -r -f "$(TBB_ABSOLUTE_PATH)/build/Makefile.tbb" compiler=$(TBB_CXX_TYPE) cfg=release stdver=c++1y  CXXFLAGS="$(TBB_CXXFLAGS)"
+   tbb_root="$(TBB_RELATIVE_PATH)" CXX="$(CXX)" CC="$(TBB_CC)" LDFLAGS='$(LDFLAGS_TBB)' '$(MAKE)' -C "$(TBB_BIN)" -r -f "$(TBB_ABSOLUTE_PATH)/build/Makefile.tbb" compiler=$(TBB_CXX_TYPE) tbb_os=$(TBB_OS) cfg=release stdver=c++1y  CXXFLAGS="$(TBB_CXXFLAGS)"

 $(TBB_BIN)/tbbmalloc.def: $(TBB_BIN)/tbb-make-check
    @mkdir -p $(TBB_BIN)
-   tbb_root="$(TBB_RELATIVE_PATH)" CXX="$(CXX)" CC="$(TBB_CC)" LDFLAGS='$(LDFLAGS_TBB)' '$(MAKE)' -C "$(TBB_BIN)" -r -f "$(TBB_ABSOLUTE_PATH)/build/Makefile.tbbmalloc" compiler=$(TBB_CXX_TYPE) cfg=release stdver=c++1y malloc CXXFLAGS="$(TBB_CXXFLAGS)"
+   tbb_root="$(TBB_RELATIVE_PATH)" CXX="$(CXX)" CC="$(TBB_CC)" LDFLAGS='$(LDFLAGS_TBB)' '$(MAKE)' -C "$(TBB_BIN)" -r -f "$(TBB_ABSOLUTE_PATH)/build/Makefile.tbbmalloc" compiler=$(TBB_CXX_TYPE) tbb_os=$(TBB_OS) cfg=release stdver=c++1y malloc CXXFLAGS="$(TBB_CXXFLAGS)"

 $(TBB_BIN)/libtbb.dylib: $(TBB_BIN)/tbb.def
 $(TBB_BIN)/libtbbmalloc.dylib: $(TBB_BIN)/tbbmalloc.def

Note that this patch assumes that you have also set a number of other environment variables to reasonable values. In my case that has meant setting:

Current Version:

v2.26.1

mitzimorris commented 3 years ago

I think this looks like a reasonable feature, but would like to know more.

could you lay out the use case and alternatives more clearly? when is this necessary? what other reasons for doing this?

ajbouh commented 3 years ago

Yes, this requires having a linux compiler available on your Mac and setting those environment variables for CXX_TYPE, etc. accordingly. I've had a reasonably good experience using zig cc for this.

Also yes, the expectation is that TARGET_OS would default to the same as OS.

The ideal scenario would be to statically link libtbb (supposedly possible with big_iron.inc) into the built program, so you don't have to move around more than one artifact and worry about their relative paths to each other, but that would be a larger change.

As an aside, before arriving at this patch I experimented with using Bazel to build cmdstan and all its dependencies (boost, stan_math, etc.), but I had a lot of difficulty navigating the impact of more granular compilation on c++ static initializer ordering. The compiled program would seem to run but then segfault because either tbb or its stan counterparts weren't initialized properly.

mitzimorris commented 3 years ago

@rok-cesnovar @wds15 and @syclik - thoughts here?

The ideal scenario would be to statically link libtbb (supposedly possible with big_iron.inc) into the built program

is this feasible, given the configurability we have w/r/t running a Stan executable? e.g., env vars STAN_MPI and STAN_OPENCL?

syclik commented 3 years ago

My initial reactions: cool and yikes.

Early on we had cross compiling available. It's really hard to maintain using make. It's not that it's not feasible to do... in the future, someone may forget to check to make sure the cross compiling ability is available.

@ajbouh: 2 questions come to mind

  1. Are those the only places where it needs to change?
  2. Would it be possible to expose more variables that would allow your use case to work without changing the makefile as much?

In principle, there's nothing wrong with what you're proposing, but given that this is the only ask for cross compiling (in my memory) ~7 years, I'd lean on not trying to add more indirection into the makefiles to make it complicated.

Based on the diff, it looks like if there was a way to set this properly, we'd be ok? tbb_os=$(TBB_OS) Is that right?

wds15 commented 3 years ago

Statically linking the TBB is strongly discouraged by the TBB folks. So I would not ever do that. Technically it is possible, but not recommended at all from Intel.

Wrt. To cross-compilation: Sounds cool, but I have no clue on it as I have never done that.

rayegun commented 3 years ago

This is a timely issue! I am currently attempting to build CmdStan as an artifact for use in the Julia ecosystem here. Stan has existing wrappers in Julia but providing CmdStan as an artifact in Julia's package manager is quite desirable (so users can just ]add CmdStan and things will work out of the box).

BinaryBuilder, which is used to build artifacts for Julia, is a cross compilation environment based on Alpine Linux. The goal is to use a single script to build artifacts for every supported Julia platform and provide those binaries/artifacts as painlessly as possible to Julia users.

I obviously have the option to use the prebuilts and provide them as artifacts. But that's less desirable than cross compilation support if possible! Just thought I'd add our use case as support for cross compilation.

mitzimorris commented 3 years ago

@Wimmerer - just to let you know that Stan.jl dev Brian Parbhu and I are working on the CmdStan install process, but focussing on the pre-builts. happy to discuss your use case.

ajbouh commented 3 years ago

Based on my tests these are the only places that need to change, though you do need to set a lot of environment variables.

The primary problem is that OS is used in two ways right now:

If you wanted, you could sort of invert the patch and let me specify my own path to stanc, and expose tbb_os. I don't remember if there would need to be more.

FWIW, I think zig cc makes cross compilation much more achievable than it has historically been. No need for a separate system root or anything like that.

rayegun commented 3 years ago

@mitzimorris the end goal is simply to support Julia's Stan wrappers without forcing users to install Stan separately. Rather than requiring users to specify a path to a CmdStan or stanc install users would simply ]add Stan, which would install Stan.jl. Stan.jl would depend on an artifact package called CmdStan_jll. This package essentially just unpacks platform specific tarballs built by our build scripts.

We generate that package using a build script in the JuliaPackaging/Yggdrasil repository. This functions much like a normal package manager, except there's a serious commitment to being cross platform. This means we try and have build scripts that support OS's Windows, Mac, Linux, FreeBSD; architectures i686, x86_64, aarch__, and a few others like PowerPC; as well as musl libc. The idea being, if you want to use a Julia package which depends on binaries (and additional source directories and executables) installation should just work.

In an ideal world I would have the following binary packages:

  1. OneTBB_jll
  2. Sundials_jll (we already have this)
  3. Stanmath_jll (which depends on OneTBB_jll and Sundials_jll)
  4. Stanc_jll (which depends on Stanmath_jll)
  5. CmdStan_jll (depends on Stanc_jll)

I'm missing dependencies, but not all of them have to be jll's, Stanmath_jll could use it's own internal versions of TBB and sundials for instance. By building these ourselves we gain compatibility with all (fingers crossed) platforms Julia supports, compatibility with weird string ABIs and architectures, and dead simple installation.

Stan.jl can also allow users to point to their own installation of course, if they have a customized version, but for most users automatically installing CmdStan_jll is good enough. I'm going to try and apply the patch above, to see if we can get most of the way there by just patching the makefile on our end. TBB seems to be the biggest headache when simply building CmdStan, so I might work towards the dependency graph above, where I first build TBB, then Stanmath, etc.

If we can get cross compilation working for Julia, prebuilt binaries for all the supported platforms should just fall right out, which could also help you all provide binaries for less common platforms.

mitzimorris commented 3 years ago

Rather than requiring users to specify a path to a CmdStan or stanc install

understood this is a total headache, but CmdStan shoulders the build burden so that the wrapper interfaces don't have to. for CmdStanR and CmdStanPy, we have tried very hard to co-ordinate the interfaces and the goal is to allow any interface to run from a single CmdStan installation by adopting the convention of a default CmdStan install location in the user's home directory, which can be overridden as needed. the default is ~/.cmdstan. ($HOME$/.cmdstan) - although the CmdStanR folks have yet to adopt this, but they have agreed in priniciple this is what it should be.

bparbhu commented 3 years ago

Hi @Wimmerer ,

So right now we're redoing the install process for all CmdStan interfaces not just the StanJulia interface. We're creating a process where the CmdStan Interfaces don't deal with installation. The main point of doing this is to make the installation process for any CmdStan interface much less painless. These new processes will also impact how we deliver other packages like CmdStanARM packages to the user. In such being able to provide added Stan experiences to other programming languages like Python and Julia. If you would want to be part of that let me know and we can collaborate further. Let me know what you think.

Thanks,

-Brian

mitzimorris commented 3 years ago

prebuilt binaries for all the supported platforms

I agree this is something that we need!!!! but this doesn't require cross-compilation - it's just that we only build on the platforms we support.

the Makefiles are very difficult to maintain - we've refactored them a bunch of times and there have been several attempts to use cmake. we've made a bunch of choices that have improved the compilation and runtime performance but have made the install/build process increasingly complex. therefore, my inclination is to push back against adding complexity to the makefiles so that they can do more when there are workable alternatives.

giordano commented 3 years ago

but this doesn't require cross-compilation

Sure, but with BinaryBuilder we have a single cross-compilation environment which can target 16 different platforms, including the recent aarch64-apple-darwin (aka Apple Silicon), instead of having 16 different build machines, some of them with non-common hardware (e.g. PowerPC)

bparbhu commented 3 years ago

I love the functionality that Binary Builder presents in terms of a single cross-compliation platform. In fact when talking to Stefan Karpinsky about it a few years he ago, he recommended it to me in regards to the StanJulia interface installation. Though, there are other things to think about in terms of what this would add to other CmdStan interfaces in terms of dependencies and other CmdStan adjacent packages. I don't know if they would feel comfortable having anything Julia related in their package ecosystem, not that it would be anything bad but I don't know how this play out with regards to environments people develop in and package management on their end. I realize that there are Julia packages that interface with other languages like RCall and PyCall but I'm not sure that I would want someone to deal with 3 languages in a single CmdStan package.

giordano commented 3 years ago

To be clear, I'm not going to advocate to get cross-compilation working: I just wanted to point out what are its advantages (single compilation environment for multiple platforms). It'd be great if it was possible to cross-compile cmdstan, but I understand your objectives.