Open djcole56 opened 4 years ago
Hi there,
There have been some recent updates to Sire to enable builds on ppc64le architectures, see this pull request for details. I assume that this would work for building on ppc64 too. Specifically, there are updates to deal with getting CPU info where cpuid isn't supported:
corelib/src/libs/SireBase/cpuid.cpp: added support for getting the number of CPUs with native platform-specific methods in the absence of libcpuid
This was included in the recent 2020.1.0 release of Sire. Since it looks like you are using 2019.3.0, could you possibly try building using the development branch which will be up to date. (Remember to delete any existing ~/sire.app, build/corelib and build/wrapper directories and the build/miniconda.sh installer.) Also, are you building using the compile_sire.sh script? Above it looks like you are running the Makefile for corelib directly, but perhaps you are doing this to show the truncated error output.
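The clean-up step above could be sketched in Python as follows. This is only an illustration: the helper names and the dry_run flag are hypothetical and not part of Sire, and the paths are simply the ones listed in the comment, relative to an assumed home directory and Sire source checkout.

```python
import shutil
from pathlib import Path
from typing import List


def stale_build_paths(home: Path, source_dir: Path) -> List[Path]:
    """Paths left over from an earlier build, as listed in the comment above."""
    return [
        home / "sire.app",
        source_dir / "build" / "corelib",
        source_dir / "build" / "wrapper",
        source_dir / "build" / "miniconda.sh",
    ]


def clean_stale_build(home: Path, source_dir: Path, dry_run: bool = True) -> List[Path]:
    """Return the stale paths that exist; delete them unless dry_run is True."""
    found = []
    for path in stale_build_paths(home, source_dir):
        if not path.exists():
            continue
        found.append(path)
        if dry_run:
            continue
        if path.is_dir():
            shutil.rmtree(path)  # directories: sire.app, build/corelib, build/wrapper
        else:
            path.unlink()  # single file: the miniconda.sh installer
    return found
```

Calling clean_stale_build(Path.home(), Path("Sire")) with the default dry_run=True only reports what would be removed, which is a safer first step on a shared HPC account.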
Just to note that I haven't actually built Sire on ppc64le myself. The pull request was made by Cresset, so it would be interesting to know if it doesn't work on architectures other than those that they've tested it on. (I checked that it didn't break any of our existing builds for Linux and macOS.)
Cheers.
Hi,
Thanks, this sounds promising. I'm not building using compile_sire.sh. I was following the instructions in INSTALL_INTO_ANACONDA.rst, I think because I wanted to install into my own conda distribution where I have OpenMM installed, i.e.:
cmake -D ANACONDA_BUILD=on -D ANACONDA_BASE=$HOME/.conda/envs/openmm $HOME/openmm/Sire/corelib
nice make -j 4
I'll keep playing, but unfortunately the first attempt gives a similar error:
(openmm) [ndc104@pn001 corelib]$ nice make -j 4
Scanning dependencies of target test_qhash_lookup
Scanning dependencies of target test_openmp
Scanning dependencies of target SireError
Scanning dependencies of target get_uname
[ 0%] Building C object src/apps/test_system/CMakeFiles/get_uname.dir/get_uname.c.o
[ 1%] Building CXX object build/test_compiler/test_qhash_lookup/CMakeFiles/test_qhash_lookup.dir/main.cpp.o
cc1: warning: command line option '-fvisibility-inlines-hidden' is valid for C++/ObjC++ but not for C
[ 1%] Building CXX object build/test_compiler/test_openmp/CMakeFiles/test_openmp.dir/main.cpp.o
[ 1%] Linking C executable get_uname
[ 1%] Built target get_uname
Scanning dependencies of target get_glibc_version
[ 1%] Building C object src/apps/test_system/CMakeFiles/get_glibc_version.dir/get_glibc_version.c.o
cc1: warning: command line option '-fvisibility-inlines-hidden' is valid for C++/ObjC++ but not for C
[ 1%] Linking C executable get_glibc_version
[ 1%] Built target get_glibc_version
Scanning dependencies of target get_cpuid
[ 2%] Building C object src/apps/test_system/CMakeFiles/get_cpuid.dir/get_cpuid.c.o
cc1: warning: command line option '-fvisibility-inlines-hidden' is valid for C++/ObjC++ but not for C
[ 2%] Linking C executable get_cpuid
/mnt/nfs/home/ndc104/.conda/envs/openmm/bin/../lib/gcc/powerpc64le-conda_cos7-linux-gnu/8.2.0/../../../../powerpc64le-conda_cos7-linux-gnu/bin/ld: /mnt/nfs/home/ndc104/.conda/envs/openmm/pkgs/sire-2020.1.0/bundled/lib/libcpuid.so: undefined reference to `cpu_rdtsc'
/mnt/nfs/home/ndc104/.conda/envs/openmm/bin/../lib/gcc/powerpc64le-conda_cos7-linux-gnu/8.2.0/../../../../powerpc64le-conda_cos7-linux-gnu/bin/ld: /mnt/nfs/home/ndc104/.conda/envs/openmm/pkgs/sire-2020.1.0/bundled/lib/libcpuid.so: undefined reference to `busy_sse_loop'
/mnt/nfs/home/ndc104/.conda/envs/openmm/bin/../lib/gcc/powerpc64le-conda_cos7-linux-gnu/8.2.0/../../../../powerpc64le-conda_cos7-linux-gnu/bin/ld: /mnt/nfs/home/ndc104/.conda/envs/openmm/pkgs/sire-2020.1.0/bundled/lib/libcpuid.so: undefined reference to `exec_cpuid'
collect2: error: ld returned 1 exit status
make[2]: [src/apps/test_system/get_cpuid] Error 1
make[1]: [src/apps/test_system/CMakeFiles/get_cpuid.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
Hmmm, I've not used the INSTALL_INTO_ANACONDA approach, and I'm not sure it's valid given the changes to the way we build Sire. (It's now a self-contained conda app with no external dependencies.) @chryswoods would have a better idea if this is still possible.
Using the standard installation approach (./compile_sire.sh) it's trivial to change the installed version of OpenMM after Sire is built. (Just use ~/sire.app/bin/conda install -c omnia openmm=....) We also have a bundled script accessible at ~/sire.app/bin/optimise_openmm which will try to figure out the most recent version that is compatible with your system, then install that for you.
Could you try the regular installation and see if that works? If not, then I can dig into it further.
Oh I see, yep no problem. Just seems to be a handful of unavailable packages now. At first glance some of these seem to be hard to get hold of for ppc64le via conda:
(openmm) [ndc104@pn001 Sire]$ ./compile_sire.sh
Where would you like to install Sire? [/mnt/nfs/home/ndc104/sire.app]:
Installing into directory '/mnt/nfs/home/ndc104/sire.app'
Running the conda activate script...
. "/mnt/nfs/home/ndc104/sire.app/bin/activate"
Running the Python install script...
"/mnt/nfs/home/ndc104/sire.app/bin/python" build/build_sire.py
** Compiling on Linux
Number of cores used for compilation = 128
Continuing the Sire install using /mnt/nfs/home/ndc104/sire.app/bin/python build/build_sire.py
pip is already installed...
Activating conda-forge channel using: '/mnt/nfs/home/ndc104/sire.app/bin/conda config --prepend channels conda-forge'
Warning: 'conda-forge' already in 'channels' list, moving to the top
Installing packages using: '/mnt/nfs/home/ndc104/sire.app/bin/conda install --yes ipython pytest nose netcdf4=1.5.3 boost=1.72.0 gsl=2.6 tbb=2019.9 tbb-devel=2019.9 pyqt=5.12.3 gcc_linux-64 gxx_linux-64 make libtool autoconf automake cmake'
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
PackagesNotFoundError: The following packages are not available from current channels:
Interesting, thanks for the update. As I said, I've not installed on ppc64 myself. Perhaps @ptosco could comment, since he submitted the pull request for ppc64le support. It doesn't look like any conda dependencies were updated in the build script, so perhaps it's a case of manually installing the missing packages from source before building. It looks like netcdf4 is available for ppc64le if you use version 1.4.2 instead. (Versions of conda dependencies are pinned in the build/build_sire.py script.)
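As a rough illustration of what an architecture-specific pin could look like, here is a minimal Python sketch. It is not the actual contents of build/build_sire.py: the names DEFAULT_PINS, PPC64LE_OVERRIDES and conda_specs are hypothetical, and only the version numbers come from the install command and the comment above. It covers the netcdf4 pin only and does not address the other unavailable packages.

```python
# Pinned versions taken from the "conda install" command in the log above.
DEFAULT_PINS = {
    "netcdf4": "1.5.3",
    "boost": "1.72.0",
    "gsl": "2.6",
    "tbb": "2019.9",
    "tbb-devel": "2019.9",
    "pyqt": "5.12.3",
}

# Assumption: only netcdf4 is overridden, using the 1.4.2 version mentioned above.
PPC64LE_OVERRIDES = {"netcdf4": "1.4.2"}


def conda_specs(machine):
    """Build "name=version" specs, swapping in ppc64le overrides where needed."""
    pins = dict(DEFAULT_PINS)
    if machine == "ppc64le":
        pins.update(PPC64LE_OVERRIDES)
    return ["{}={}".format(name, version) for name, version in pins.items()]
```

The machine string could come from platform.machine(), which reports the same value as uname -m on Linux.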
Yes, agreed that it's probably a case of installing these manually. I'll see what I can do with system admin support, and let you know either way.
@djcole56 Hi Danny, correct, those packages are not available through conda.
I used the CentOS ppc64le version of Qt5, as my ppc64le HPC system was running on CentOS 7. I am sure you can find similar pre-built ppc64le packages for other Linux distributions. Please note that you don't need the Python wrappers - the C++ libraries will be sufficient as Sire does not use PyQt.
As I had no root privileges I simply downloaded the Qt5 RPMs from centos.org and then unpacked them with rpm2cpio <my.rpm> | cpio -idm, and set CMake paths accordingly to point at the include and lib64 dirs.
netcdf4: this was available as a pre-built Lmod module on my HPC system; if it is not available on yours you may easily build it from source, or use a pre-built package from your distro.
gcc and g++ were available as Lmod modules on my HPC system; otherwise you may get them from your Linux distro.
Feel free to get back to me if you have issues - I am confident that Sire 2020 will build also for you!
Hi @ptosco, thanks very much for your earlier work and new advice. We had actually already installed Qt5 on the HPC, so I was confused that PyQt was missing. But if not needed, then it looks like we can ignore it. And I've enquired about the availability of the remaining modules. I'm confident we're nearly there!
Hi @djcole56, I was just wondering if there was any update on this? Did you manage to build Sire in the end?
Hi @lohedges, still making progress thanks. We've managed to use gcc and g++ from existing modules on the HPC, and just trying to get netcdf4 built on the same system. I don't see any further hurdles from the Sire side, so feel free to close this issue if you like, and I'll open a new one if I get stuck again. Thanks!
Hi. We just installed it and it looks like there is still a small issue with the CPUID. It checks for Power9: https://github.com/michellab/Sire/blob/a9f32a6448aa0ccd34debc961a299b29697e67ae/corelib/CMakeLists.txt#L950-L961
but only if SIRE_FOUND_CPUID is False. However, at that point it is True because cpuid is being bundled:
https://github.com/michellab/Sire/blob/a9f32a6448aa0ccd34debc961a299b29697e67ae/corelib/src/bundled/install_cpuid.cmake#L149
Can the bundling be omitted completely on Power9? Thanks.
Yes, no problem. CPUID is an optional dependency so there's no issue with disabling it. I'll fix the CMake logic this afternoon.
Cheers.
I've just pushed a fix, which I've tested locally by checking if CMAKE_HOST_SYSTEM_PROCESSOR is equal to x86_64, rather than ppc64le. Note that you'll need to clear your CMake cache if you are pulling the update and rebuilding in the same directory. It's probably easiest to simply remove the build/corelib directory and re-run ./compile_sire.sh.
Let me know if you run into any other issues.
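For readers who want the decision spelled out, the check can be mirrored in Python as below. This is a sketch of the logic only, not the CMake code that was pushed: should_bundle_cpuid is a hypothetical name, and platform.machine() stands in for CMAKE_HOST_SYSTEM_PROCESSOR. The rationale assumed here is that the undefined symbols in the earlier linker errors (exec_cpuid, cpu_rdtsc, busy_sse_loop) are x86-specific, so there is nothing useful to bundle on Power9.

```python
import platform


def should_bundle_cpuid(processor):
    """Skip the bundled libcpuid on ppc64le, where the x86-only symbols cannot link."""
    return processor != "ppc64le"


if __name__ == "__main__":
    # platform.machine() reports the same string as "uname -m" on Linux.
    machine = platform.machine()
    print("bundle cpuid on {}: {}".format(machine, should_bundle_cpuid(machine)))
```

Since cpuid is an optional dependency, returning False here corresponds to building Sire without it rather than failing the build.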
Thanks, I confirm that the fix removed the problem with libcpuid on Power9.
We found the other issue we were struggling with. It's to do with ABI compatibility. Specifically, the OpenMM (7.4.2) that we have access to and that we compiled uses the CXX11 ABI.
We use conda install -c omnia-dev/label/cuda101 openmm, which was compiled with GCC 8.2 and I believe used the CXX11 ABI. The check I used for this is nm ./lib/libOpenMM.so | grep -i CXX11
In order to remove our linking issue I simply removed the compatibility ABI flag -D_GLIBCXX_USE_CXX11_ABI=0:
# Now gcc 5 specific options
if ( GCC_MAJOR_VERSION GREATER 4 )
  if (MSYS)
    message(STATUS "MSYS2 will use builtin OpenMM if available...")
  else()
    # OpenMM with conda uses the old C++ binary API!
    # Tell GCC 5 to respect the old API
    set( SIRE_PLATFORM_FLAGS "${SIRE_PLATFORM_FLAGS} -D_GLIBCXX_USE_CXX11_ABI=0" )
  endif()
endif()
The quick minimisation/tests with somd-freenrg appear to be running fine now.
I do not see CXX11 in the OpenMM installed on an x86_64 machine.
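The nm check described above can also be scripted, which is handy when comparing several OpenMM builds. The following is a minimal Python sketch, assuming nm is on the PATH; the function names are hypothetical. It does the same case-insensitive search for "cxx11" as the grep command, relying on the fact that libstdc++ symbols built with the new ABI live in the std::__cxx11 namespace.

```python
import subprocess


def uses_cxx11_abi(nm_output):
    """True if any symbol line mentions cxx11, like "grep -i CXX11" would match."""
    return any("cxx11" in line.lower() for line in nm_output.splitlines())


def library_uses_cxx11_abi(path):
    """Run nm on a library (for example ./lib/libOpenMM.so) and inspect its symbols."""
    result = subprocess.run(
        ["nm", path], stdout=subprocess.PIPE, universal_newlines=True, check=True
    )
    return uses_cxx11_abi(result.stdout)
```

If the library reports new-ABI symbols, Sire needs to be built without -D_GLIBCXX_USE_CXX11_ABI=0 to link against it, which matches what was observed on Power9.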
Hmmm, interesting. I didn't add that compiler flag, but was under the impression that the Omnia package used the old ABI, whereas the new conda-forge package uses the new ABI. As you say, there's no mention of CXX11 when running nm on the Linux .so, so perhaps this fix is now redundant for the Omnia build. I'll try removing it and rebuilding when I get a chance. (Perhaps older versions of OpenMM did require this fix.)
I've just downloaded the 7.4.2 Python 3.7 package from omnia, as build_sire.py does, and nm shows no cxx11 (https://anaconda.org/omnia/openmm/files). So it makes sense that you correct for it.
However, in the omnia-dev 7.4.0 version that I have, cxx11 is present. That is the openmm-7.4.0-py37_cuda101_1.tar (https://anaconda.org/omnia-dev/openmm/files?version=7.4.0).
That said, all of the conda-forge packages appear to have cxx11. The new release for ppc64le, py39 (https://twitter.com/openmm_toolkit/status/1400859263157874695) has a lot of cxx11. Similarly, for linux-64 I also find cxx11 in the binaries.
So it seems this is more about our binaries, as well as conda-forge.
Thanks, Mat
Yes, we patch for the conda-forge build, so could do the same for ppc64le if needed.
In that case I think it's best to ignore it then. Cheers
Hi,
The N8CIR will shortly be purchasing several Power9 GPU nodes: https://n8cir.org.uk/supporting-research/facilities/nice/
We have access to a node in Newcastle at the moment, and I've managed to install OpenMM following the instructions here: https://github.com/inspiremd/conda-recipes-summit#installing-on-summit
I've also started to have a look at building Sire, but have got stuck on compiling the corelib (errors below).
I can provide full build details, but just thought I'd check that what I'm trying is at all feasible?
Thanks, Danny
(openmm) [ndc104@pn001 corelib]$ nice make -j 4
[ 1%] Built target test_qhash_lookup
[ 1%] Built target get_uname
[ 1%] Built target test_openmp
[ 1%] Built target get_glibc_version
[ 1%] Linking C executable get_cpuid
/mnt/nfs/home/ndc104/.conda/envs/openmm/pkgs/sire-2019.3.0/bundled/lib/libcpuid.so: error: undefined reference to 'busy_sse_loop'
/mnt/nfs/home/ndc104/.conda/envs/openmm/pkgs/sire-2019.3.0/bundled/lib/libcpuid.so: error: undefined reference to 'exec_cpuid'
/mnt/nfs/home/ndc104/.conda/envs/openmm/pkgs/sire-2019.3.0/bundled/lib/libcpuid.so: error: undefined reference to 'cpu_rdtsc'
collect2: error: ld returned 1 exit status
make[2]: [src/apps/test_system/get_cpuid] Error 1
make[1]: [src/apps/test_system/CMakeFiles/get_cpuid.dir/all] Error 2
make[1]: Waiting for unfinished jobs....
[ 2%] Built target SireError
make: [all] Error 2