Closed balos1 closed 5 years ago
After becoming more familiar with spack, I want to provide an update on this issue.
At the 8/23 XSDK meeting it was theorized that using CMake's find_package
could result in one XSDK package using a different installation of a TPL than another XSDK package might use since not all packages use the same CMake version or even CMake at all.
The thought was that there is the common build utility, spack, and then the secondary build systems (i.e. CMake or autoconf) that the packages themselves use. In order to make sure packages with shared dependencies use the same installation of a TPL, spack is in charge of locating TPLs and provides the include directories and libraries to a package via the TPL_<package>_LIBRARIES
and TPL_<package>_INCLUDE_DIRS
options.
It does make sense that spack is charge of finding the TPLs so that shared dependencies amongst packages use the same installation of TPL. However, what I am proposing in this issue still leaves spack in charge of finding the location of the TPLs.
In the case I am proposing, instead of spack having to provide the libraries and include directory, it could simply provide the path to a TPL's root installation directory, or the path to its *.cmake
or *.pc
(pkg-config). This decision is left to a package as this is setup in their spack package.py
.
For example, in SUNDIALS I would do something like this in our spack package.py
:
# Building with RAJA
if '+raja' in spec:
args.extend([
'-DTPL_RAJA_DIR=%s' % spec['raja'].prefix.share.raja.cmake
])
So allowing XSDK packages to use CMake's find_package
with an option TPL_<package>_DIR
should not negatively effect how XSDK is built with spack, but it does allow for packages that use CMake to follow best practices.
CC: @balay
it was theorized that using CMake's find_package could result in one XSDK package using a different installation of a TPL than another XSDK package might use since not all packages use the same CMake version or even CMake at all.
How does the version of CMake used affect this? CMake does offer different features by version, but AFAIK the find_package
logic hasn't changed much over time. Do you have examples?
It's true that Spack does not currently force build dependencies to be consistent across dependencies in the same DAG. So if A depends on B, and B was built first, A could be built with one version of CMake and B with another. If this is actually a compatibility issue for some projects, we could look into imposing tighter restrictions, but I think it would be said if you had A -> B -> C, where B and C are built with CMake 2.6, but A has to rebuild them and can't use the installed versions because it requires CMake 3.1.
In order to make sure packages with shared dependencies use the same installation of a TPL, spack is in charge of locating TPLs and provides the include directories and libraries to a package via the TPL_
LIBRARIES and TPL _INCLUDE_DIRS options.
FWIW, even this is probably overkill within Spack. The Spack build environment is intentionally set up so that packages will not find versions/configurations of dependencies that are outside the DAG. If you have a DAG like this:
A
| \
B C
| /
D
we guarantee that any DAG built by Spack will only have one configuration of a given package. So A, B, and C all have to see the same version (or rather configuration hash) of D. If you install two D's, we force A, B, and C to be rebuilt in spack install A
if B or C needs a newer version of D.
This is enforced a few different ways:
CMAKE_PREFIX_PATH
, LD_LIBRARY_PATH
, LIBRARY_PATH
, CPATH
, etc. that may contain things that would interfere with build systems.CMAKE_PREFIX_PATH
(in topologically sorted order so that each package is guaranteed to come before its children). This means find_package()
will look at the Spack dependencies first, before anything external to the DAG.-I
, -L
, and -Wl,-rpath
args to the compile line for all dependencies. The -I and -L guarantee that we'll see dependency headers first, before others, and the RPATHs guarantee that, at runtime, A will look for libraries in its own, then B, C, and D's directories first, before anything else.So, we try to make it pretty hard for Spack builds to find the wrong TPLs. If you have examples where they do, can you point me to that?
Anyway, even with all that said, I do think it's good to make it easy for users to build your tool, and I like the prefix options much better than having to specify include and lib directories, assuming the dependent package knows how to use the libraries it finds (i.e., what are their names?). This is almost always the case for the direct dependencies, but the place where it gets nasty is with static linking. I believe part of the original rationale in xSDK for the explicit library information was to enable static linking, where you need to be able to get the libnames for all of the transitive dependencies as well as the direct ones. Most packages don't know those, and may not be able to if the dependency they're linking to can be built in many different ways.
We have some rudimentary interfaces you can use for this in Spack, this but we don't yet handle all the cases as well as we could. See here. I do not believe CMake has a lot of information about transitive dependencies embedded in its targets, but maybe that has changed since I last looked.
How does the version of CMake used affect this? CMake does offer different features by version, but AFAIK the find_package logic hasn't changed much over time. Do you have examples?
I do not know of any examples of different versions of CMake's find_package
producing different results, and you are right it has not changed much over time, so it is unlikely this would be the case. Stating that the different CMake versions may result in different find_package
behavior was mostly a poor example on my part.
What I was trying to express was the raised concern that, on the surface, it seems that relying on an xSDK package's build system to find a library could result in dependent package A finding one installation of C, and dependent package B finding another installation of C. If all Spack did was provide a name of a dependency to a dependent, and the dependent determined where to search for the dependency, then it is not hard to imagine how the concern may become valid. However, due to the way Spack is implemented, as you have described, it becomes much more difficult to validate the concern.
So, we try to make it pretty hard for Spack builds to find the wrong TPLs. If you have examples where they do, can you point me to that?
I can think of one possibility (that I have yet to try). Lets say you have the DAG from your explanation:
A
| \
B C
| /
D
Lets also say there two Ds available: /usr/local/D
and /home/myuser/local/D
. Spack determines that /usr/local/D
is the version to use, so Spack adds /usr/local/D
to CMAKE_PREFIX_PATH
(topological sort order withstanding of course).
Then let us say B uses find_package
like this:
find_package(D)
C, who feels like being in total control, uses find_package
like this:
set(D_DIR /home/myuser/local/D)
find_package(D NO_DEFAULT_PATH)
B will find the expected D in /usr/local/D
, but C will ignore CMAKE_PREFIX_PATH
when looking for D, and will find /home/myuser/local/D
.
Again, I have not tried this example yet, but I don't see how any of Spack's enforcements would address this problem. Also, the example is a bit contrived, as C would seemingly have to be trying to break things... I cannot think of any good reason to do things like the example.
I believe part of the original rationale in xSDK for the explicit library information was to enable static linking, where you need to be able to get the libnames for all of the transitive dependencies as well as the direct ones. Most packages don't know those, and may not be able to if the dependency they're linking to can be built in many different ways.
Modern CMake best practices dictate that packages declare all of the dependencies of a target (and export them), even when static linking is used. In addition, the scope of the dependency should be declared (i.e. PUBLIC
, INTERFACE
, or PRIVATE
). As long as every package in the dependency chain does this properly, transitive dependencies will be known by the root dependent package.
Of course, in practice few packages follow the best practice (it is a newer concept), so transitive dependencies are a problem. I don't think that this is enough reason for the xSDK to disallow packages from using find_package
in this config mode. A dependent package just has to take care that, when using find_package
this way, the dependency is declaring and exporting all of their dependencies, and so on down the chain. Or, the dependent package has to be sure that they know all the transitive dependencies in some other way. It may seem like a lot of work and not practical for a package, but it is not hard to do... a package can create a target wrapper for a dependency that doesn't export targets (to be clear I am not suggesting xSDK requires packages to do things this way, just allow it).
Summary
Based on what you have said about how Spack deals with dependencies, and the unlikely example I provided, it would appear that xSDK allowing packages to have a TPL_<package>_DIR
option that can be used with find_package
in config mode would not have negative effects.
C, who feels like being in total control, uses
find_package
like this:set(D_DIR /home/myuser/local/D) find_package(D NO_DEFAULT_PATH)
Well, I hope a package that does this wouldn't pass xSDK review or PR review in the first place. But yes, this would (currently) cause bad things to happen. OTOH, C
likely wouldn't build for anyone but myuser
in this example, so I doubt it would be very popular 😄.
Again, I have not tried this example yet, but I don't see how any of Spack's enforcements would address this problem.
We would still add -I
, -L
, and RPATHs via the compiler wrapper, even in this example. But these arguments won't supersede any added by C
's build system, so likely what will happen is that you'll get -I
and -L
from your local D, but load Spack's D at runtime due to the RPATH. That's not a fun scenario.
We could get around this by sandboxing as described in #5764, though not all OS's support user namespaces well yet. When they do that will be a very good way to enforce more reproducible builds.
The other thing we could do would be to put some checks into Spack to look at the output of things like ldd
on Linux or otool
on OSX (to make sure system/non-spack dependencies haven't been pulled in unless you ask for them). And we could use RedHat's abidiff
tool to check for ABI issues (as you might see with C and D in this example). That stuff isn't done yet.
I don't think that this is enough reason for the xSDK to disallow packages from using
find_package
in this config mode.
I actually agree with this. I think some of the current xSDK rules are cumbersome and not particularly natural for modern build systems, so I'd like to see more updates to the policies like this. I worry that some of the XSDK requirements make builds support two sets of options -- one for xSDK and one that's "natural" for their build system. It seems to me like that leaves a lot of potential for bugs in the build, because I doubt anyone will rigorously test both sets of options. So any convergence between xSDK and widely accepted styles would be a good thing IMHO.
OTOH, C likely wouldn't build for anyone but myuser in this example, so I doubt it would be very popular 😄.
myuser
would have to be trying to be evil, so yeah I do not think this has to be a major concern of Spack or the xSDK :relieved:.
I worry that some of the XSDK requirements make builds support two sets of options -- one for xSDK and one that's "natural" for their build system.
This is exactly where I am coming from on the issue. We are trying to implement some modern CMake best practices, but have found that to remain completely in line with the xSDK installation policies we would have to maintain a pretty significant layer on top of our build system to convert between options. IMO this is counter to the xSDK goal of increasing sustainability.
@balos1, @tgamblin
There actually is no 100% standard for the variables that Find<PackageName>.cmake
responds to and exports (one of the motivations for the TPL system in TriBITS). And, as far as I know, there is no standard way in CMake to tell a CMake project to use an optional external package or not. That is the purpose of TPL_ENABLE_<PackageName>
(or whatever your CMake project wants to call it, it does not mater what it is called if you are using a system like SPACK). Once you pass in TPL_ENABLE_<PackageName>=ON
, then the details on how the package is found can left up to the CMake project (it can use find_packge(<PackageName>)
or whatever, as long as it is documented). All that is important is that the CMake project asserts that the package is found and allows the user to point to the exact libraries, include directories, etc. and does not allow for auto-finds when they are not desired (and auto-finds are never desired in a system like SPACK). In fact, TriBITS FindTPL<PackageName>.cmake
modules can use find_package()
as well (see here).
So, assuming there is no need for uniformity in terms of how a user specifies where to find an external package (for which no 100% standard in CMake exist for find_package()
) then what the xSDK standard must require is:
The CMake project provides a CMake variable to enable or disable the usage of an optional external packages (i.e. TPL). (The variable can be called TPL_ENABLE_<PackageName>
or whatever you want as long it it is documented.)
The CMake project provides a well-documented way to point to the exact pre-installed external packages (i.e. TPL) on the system and therefore eliminate the chance for any auto-find behavior.
Agreed?
Tasmanian is perhaps simpler than most packages, but within Tasmanian, there are two cases:
The dependency is something simple, e.g., BLAS, then I only need to know BLAS_LIBRARIES
provided either by the user or cmake find_package()
. Then the logic goes like this:
option(USE_XSDK_DEFAULTS "Enable xSDK compatibility" OFF)
option(Tasmanian_ENABLE_BLAS "Enable Blas support for Tasmanian" OFF)
....
if (USE_XSDK_DEFAULTS AND (DEFINED TPL_ENABLE_BLAS))
set(Tasmanian_ENABLE_BLAS ${TPL_ENABLE_BLAS}) # compatibility with xSDK nomenclature
endif()
.....
if (Tasmanian_ENABLE_BLAS)
if (NOT DEFINED BLAS_LIBRARIES) # user did not provide BLAS_LIBRARIES, use find_package()
find_package(BLAS)
if (NOT BLAS_FOUND)
message(FATAL_ERROR "could not find BLAS")
.....
target_link_libraries(Tasmanian_libsparsegrids ${BLAS_LIBRARIES})
The second case is more complex, e.g., CUDA, which comes with many-many components. In that case, individual components cannot be specified and the user can give only general guidelines, e.g., define CUDA_TOOLKIT_ROOT_DIR
, then the logic is a bit simpler:
if (USE_XSDK_DEFAULTS AND (DEFINED XSDK_ENABLE_CUDA))
set(Tasmanian_ENABLE_CUDA ${XSDK_ENABLE_CUDA}) # compatibility with xSDK nomenclature
endif()
.....
if (Tasmanian_ENABLE_CUDA)
find_package(CUDA)
if (NOT CUDA_FOUND)
message(FATAL_ERROR "<no CUDA>")
If a library provides a package-config, e.g., their own find_package()
, then that should be considered second case. For example, if someone wants to use Tasmanian through cmake, then I cannot support or debug someone else's complex script or hardcoded idea of all dependencies, I know all dependencies from Tasmanian compile time. The supported way to use Tasmanian is:
find_package(Tasmanian versionX.versionY PATH <path-to-tasmanian-install>)
target_link_libraries(Someone_else's_target Tasmanian_libsparsegrid)
If someone doesn't provide their own package-config, then there are generic cmake ways to make one and search for libraries, which is similar to case 1 above.
Note: all of the variables specified above are controlled by the user at build time:
cmake -D CUDA_TOOLKIT_ROOT_DIR=/path/to/cuda -D Tasmanian_ENABLE_CUDA=ON ....
Note: someone mentioned versions of cmake
, find_package(OpenMP)
works different in versions before and after cmake 3.9; OpenMP is strange as it is really a library and a language extension, cmake did not correctly identify the library which made it a nightmare to export your own dependencies to another project.
@mkstoyanov, your write-up of how Tasmanian implements this policy is consistent with how I believe it should be. You provided a documented cache var Tasmanian_ENABLE_BLAS
that can be set to ON
or OFF
(and your CMake project can decide which is the default).
But what I think @balos1 is arguing is that a package should not have to support <PackageName>_LIBRARIES
so you could just have:
if (Tasmanian_ENABLE_BLAS)
find_package(BLAS)
if (NOT BLAS_FOUND)
message(FATAL_ERROR "could not find BLAS")
I am okay with as long as the find_package(<PackageName>)
call allows the user to 100% point to the TPL that they want and that will turn off all auto-finds for that external package, period. (It is just not clear to me that every Find<PackagesName>.cmake
module does that correctly.) (For example, with an earlier version of "Package A" (not to be named) that use used in CASL , if you passed "PackageA" a BLAS lib that did not exist or it was not happy with, it would auto-find another BLAS. That is great behavior for a grad student who just wants to build your one package "PackageA" but is bad behavior for an ecosystem meta build tool like SPACK and results in very bad behavior at link-time or runtime in many cases. We hit this in our meta-build tool in the CASL project with "PackageA".)
And as you describe for OpenMP and CUDA, some external dependencies re not just libraries with and optional set of include directories so it can't just be a list of include directories and libraries. That is also okay. It just needs to be documented how to point to the exact thing you want and disable auto-finds.
Agreed?
@tgamblin said:
we guarantee that any DAG built by Spack will only have one configuration of a given package. So A, B, and C all have to see the same version (or rather configuration hash) of D. If you install two D's, we force A, B, and C to be rebuilt in spack install A if B or C needs a newer version of D.
SPACK (or any other ecosystem meta-build tool) can't guarantee that if one of the downstream packages B
or C
does not allow the user to pass in the exact installed version of A
and disable auto-finds of A
that can do crazy things on some systems. (I can dig up examples where "Package A" in my above comment auto-found BLAS under /usr/lib/ when I told it to only use BLAS at a different location.. For a reason that I don't remember, it looked at the BLAS I pointed it to and decided that it did not like that BLAS and then auto-found a BLAS under /usr/lib/. That was a hard defect to debug I will tell you. ) Packages built by SPACK or any other ecosystem meta-build tool must turn off auto-find for almost everything. Can we state that in the policy?
@bartlettroscoe
But what I think @balos1 is arguing is that a package should not have to support
<PackageName>_LIBRARIES
so you could just have:if (Tasmanian_ENABLE_BLAS) find_package(BLAS) if (NOT BLAS_FOUND) message(FATAL_ERROR "could not find BLAS")
I am okay with as long as the
find_package(<PackageName>)
call allows the user to 100% point to the TPL that they want and that will turn off all auto-finds for that external package, period. (It is just not clear to me that everyFind<PackagesName>.cmake
module does that correctly.)
That is what I am arguing.
So, assuming there is no need for uniformity in terms of how a user specifies where to find an external package (for which no 100% standard in CMake exist for
find_package()
) then what the xSDK standard must require is:The CMake project provides a CMake variable to enable or disable the usage of an optional external packages (i.e. TPL). (The variable can be called
TPL_ENABLE_<PackageName>
or whatever you want as long it it is documented.)The CMake project provides a well-documented way to point to the exact pre-installed external packages (i.e. TPL) on the system and therefore eliminate the chance for any auto-find behavior.
Agreed?
I agree with this. How an xSDK package eliminates any chance of auto-find behavior can be up to the package. I will throw some possibilities out there:
NO_DEFAULT_PATH
option and the variable <package>_DIR
.<package>_DIR
is the only place CMake will look with find_package
if NO_DEFAULT_PATH
is used. Here is a rough example of what an xSDK package's CMake code might look like:
option(TPL_ENABLE_XYZ "Enable XYZ" OFF)
option(XYZ_DIR "XYZ CMake config directory" "")
if(TPL_ENABLE_XYZ)
find_package(XYZ REQUIRED NO_DEFAULT_PATH)
endif()
CMAKE_PREFIX_PATH
and then sets the CMAKE_PREFIX_PATH
with a package's dependencies.This should result in the correct TPL being found without specifying NO_DEFAULT_PATH
in the find_package
call and without having the TPL_<package>_DIR
option, unless a TPL does something extreme in their CMake code, e.g. hardcode <package>_DIR
and specifying NO_DEFAULT_PATH
, but not much, if anything, can fix a library choosing to do that.
@balos1 said:
I agree with this. How an xSDK package eliminates any chance of auto-find behavior can be up to the package. I will throw some possibilities out there:
Sounds good to me. Can we create a PR that updates the policies for this?
@balos1 and @bartlettroscoe, I think we are in agreement. "A package should have a reliable way of specifying external dependencies and a package should either build exactly with the specified option or not at all. If no specific dependence is given, then and only then can the package use automated search." Or something along those lines.
In addition, I maintain that if the build system provides a native way of searching and specifying dependencies, e.g., cmake find_package()
, then the xSDK package should respect that and modify the default behavior only to correct a deficiency.
For example (in my experience), find_package(CUDA)
is very reliable. I regularly work on a system with 2 - 4 versions of CUDA (installed simultaneously) and cmake has done a great job of picking the one given by CUDA_TOOLKIT_ROOT_DIR
. On the other hand, BLAS, OpenMP and MPI have been riddled with problems for me (especially OpenMP). BLAS has a way to specify version by supported vendors, but once two of us spend considerable time once trying to point it to the correct ATLAS. Manual BLAS_LIBRARIES
is a much more reliable solution (which also does not prevent the use of a vendor).
On the other hand, we have spack that relies a lot on CMAKE_PREFIX_PATH
and automated searches. Overall spack works well, but I have had instances where system libraries would be picked over the spack provided ones. I am not sure what can be done in that case.
@balos1 and @bartlettroscoe, I think we are in agreement. "A package should have a reliable way of specifying external dependencies and a package should either build exactly with the specified option or not at all. If no specific dependence is given, then and only then can the package use automated search." Or something along those lines.
Excellent. Can we create a PR to update to that policy?
Manual
BLAS_LIBRARIES
is a much more reliable solution (which also does not prevent the use of a vendor).
Yes, we build with the Intel MLK and all kinds of libraries very robustly by specifying <Package>_INCLUDE_DIRS
and <Package>_LIBRARIES
. That leaves zero chance of find problems/surprises.
On the other hand, we have spack that relies a lot on
CMAKE_PREFIX_PATH
and automated searches. Overall spack works well, but I have had instances where system libraries would be picked over the spack provided ones. I am not sure what can be done in that case.
Okay, that is not good. I have had trouble with that as well on some systems.
@tgamblin, can SPACK be updated to pass in exact info to avoid relying on CMAKE_PREFIX_PATH
? Otherwise, we are asking for portability problems with software installed with SPACK.
@bartlettroscoe said:
can SPACK be updated to pass in exact info to avoid relying on CMAKE_PREFIX_PATH? Otherwise, we are asking for portability problems with software installed with SPACK.
I don't think there is any need to update Spack. If you don't want to rely on CMAKE_PREFIX_PATH in a spack package.py you can do something like this:
# Building with RAJA
if '+raja' in spec:
args.extend([
'-DRAJA_DIR=%s' % spec['raja'].prefix.share.raja.cmake
])
and then using the NO_DEFAULT_PATH
option in CMake:
if(TPL_ENABLE_RAJA)
find_package(RAJA REQUIRED NO_DEFAULT_PATH)
if((NOT RAJA_FOUND) AND (NOT USE_XSDK_DEFAULTS))
find_package(RAJA)
endif()
Excellent. Can we create a PR to update to that policy?
I can open up a new issue with the proposed rewording. As far as an actual PR, unless there is a repository I am unaware of that also contains the policies, there is not a place to create one. The community installation policies draft is in this google doc, and the xsdk-policy-compatibility template indirectly refers to the installation policies via M1 - so it does not need to be updated.
M1. Each xSDK-compatible package must support either the community GNU Autoconf or CMake options as defined in the following document:
xSDK Community Installation Policies: GNU Autoconf and CMake Options. The goal of this policy is to enable individual users, computing centers, and package managers (such as Spack) to install the package in a way that is compatible with other xSDK packages on the same system. It is understood that managing all the dependencies for a variety of packages is nontrivial and is not resolved by standard options.
Also, there is a separate policy for dealing with LAPACK and BLAS (number 9 in the community installation policies, number 10 deals with other TPLs). Do we want to leave the LAPACK/BLAS policy in tact? I am fine with doing so.
Packages built by SPACK or any other ecosystem meta-build tool must turn off auto-find for almost everything. Can we state that in the policy?
While I agree with this in principle, not every build system is going to be able to disable auto-find, and Spack has to deal with those systems too. So I wouldn't make it a requirement. For the major build systems, we do things to make it pretty hard to auto-find the wrong things -- e.g., we actually inject arguments on the compile line via wrappers, and we override things like CMAKE_PREFIX_PATH
.
What we could do is add some checks, e.g. with ldd
and otool
, to verify that the only external libraries spack-built libs and executables depend on are system ones like glibc and libstdc++. Getting that right is tricky, but it would allow us to verify that weird BLASes and such have not been found.
SPACK (or any other ecosystem meta-build tool) can't guarantee that if one of the downstream packages B or C does not allow the user to pass in the exact installed version of A and disable auto-finds of A that can do crazy things on some systems.
True, although one thing we've been considering is actually sandboxing builds. Gentoo does this using user namespaces, and in that universe you really can make it so that build will never find system libraries. I would like to transition to a cleaner build environment like that eventually, but user namespace support is not yet as broad in HPC as I'd like, so at the moment we can't rely on it.
A more modern way to include a TPL in a CMake build system is to use the
find_package
command in config mode so that it looks for the TPLs*.cmake
configuration files and then loads an exported target rather than variables like<package>_LIBRARIES
and<package>_INCLUDE_DIRS
. Thus, it becomes unnecessary to haveTPL_<package>_LIBRARIES
andTPL_<package>_INCLUDE_DIRS
.Additionally, when using
find_package
in this way, you can provide a path to the directory where the TPL's config files are located. The natural way to expose this as build option isTPL_<package>_DIR
.The XSDK community installation policy 10 does not seem to allow for this more modern CMake setup, and even appears to recommend against it.
My guess is that when the policies were written, this setup was not considered because it was uncommon.
With that said, would a XSDK package be considered non-compliant for doing things this way? Either way, I think we should consider updating the installation policies to clearly allow this as it is considered a CMake best practice.
CC @gardner48 @cswoodward