michellab / Sire

Sire Molecular Simulations Framework
http://siremol.org
GNU General Public License v3.0

Building Sire on Power9 architecture #320

Open djcole56 opened 4 years ago

djcole56 commented 4 years ago

Hi,

The N8CIR will shortly be purchasing several Power9 GPU nodes: https://n8cir.org.uk/supporting-research/facilities/nice/

We have access to a node in Newcastle at the moment, and I've managed to install OpenMM following the instructions here: https://github.com/inspiremd/conda-recipes-summit#installing-on-summit

I've also started to have a look at building Sire, but have got stuck on compiling the corelib (errors below).

I can provide full build details, but just thought I'd check that what I'm trying is at all feasible?

Thanks, Danny

    (openmm) [ndc104@pn001 corelib]$ nice make -j 4
    [ 1%] Built target test_qhash_lookup
    [ 1%] Built target get_uname
    [ 1%] Built target test_openmp
    [ 1%] Built target get_glibc_version
    [ 1%] Linking C executable get_cpuid
    /mnt/nfs/home/ndc104/.conda/envs/openmm/pkgs/sire-2019.3.0/bundled/lib/libcpuid.so: error: undefined reference to 'busy_sse_loop'
    /mnt/nfs/home/ndc104/.conda/envs/openmm/pkgs/sire-2019.3.0/bundled/lib/libcpuid.so: error: undefined reference to 'exec_cpuid'
    /mnt/nfs/home/ndc104/.conda/envs/openmm/pkgs/sire-2019.3.0/bundled/lib/libcpuid.so: error: undefined reference to 'cpu_rdtsc'
    collect2: error: ld returned 1 exit status
    make[2]: *** [src/apps/test_system/get_cpuid] Error 1
    make[1]: *** [src/apps/test_system/CMakeFiles/get_cpuid.dir/all] Error 2
    make[1]: *** Waiting for unfinished jobs....
    [ 2%] Built target SireError
    make: *** [all] Error 2

lohedges commented 4 years ago

Hi there,

There have been some recent updates to Sire to enable builds on ppc64le architectures; see this pull request for details. I assume that this would work for building on ppc64 too. Specifically, there are updates to deal with getting CPU information where cpuid isn't supported:

corelib/src/libs/SireBase/cpuid.cpp: added support for getting the number of CPUs with native platform-specific methods in the absence of libcpuid

This was included in the recent 2020.1.0 release of Sire. Since it looks like you are using 2019.3.0, could you possibly try building using the development branch, which will be up to date? (Remember to delete any existing ~/sire.app, build/corelib and build/wrapper directories and the build/miniconda.sh installer.) Also, are you building using the compile_sire.sh script? Above, it looks like you are running the Makefile for corelib directly, but perhaps you are doing this to show the truncated error output.
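The cleanup step above can be scripted so that nothing from an older build is left behind. This is a minimal sketch, assuming it is run from the top of the Sire checkout and that Sire was installed to the default ~/sire.app; clean_sire_build is a hypothetical helper name, not something shipped with Sire.

```shell
# Hypothetical helper (not part of Sire): remove the artifacts of a previous
# build so that ./compile_sire.sh starts from a clean state.
#   $1: top of the Sire checkout (default: current directory)
#   $2: Sire install directory   (default: ~/sire.app)
clean_sire_build() {
    local src_dir="${1:-.}"
    local sire_app="${2:-$HOME/sire.app}"
    # -f keeps rm quiet if some of these were never created.
    rm -rf -- "$sire_app" \
              "$src_dir/build/corelib" \
              "$src_dir/build/wrapper" \
              "$src_dir/build/miniconda.sh"
}
```

For example, from the Sire checkout: clean_sire_build . ~/sire.app && ./compile_sire.sh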

Just to note that I haven't actually built Sire on ppc64le myself. The pull request was made by Cresset, so it would be interesting to know if it doesn't work on architectures other than those that they've tested it on. (I checked that it didn't break any of our existing builds for Linux and macOS.)

Cheers.

djcole56 commented 4 years ago

Hi,

Thanks, this sounds promising. I'm not building using compile_sire.sh. I was following the instructions in INSTALL_INTO_ANACONDA.rst, mainly because I wanted to install Sire into my own conda environment, where I have OpenMM installed, i.e.:

    cmake -D ANACONDA_BUILD=on -D ANACONDA_BASE=$HOME/.conda/envs/openmm $HOME/openmm/Sire/corelib
    nice make -j 4

I'll keep playing, but unfortunately the first attempt gives a similar error:

    (openmm) [ndc104@pn001 corelib]$ nice make -j 4
    Scanning dependencies of target test_qhash_lookup
    Scanning dependencies of target test_openmp
    Scanning dependencies of target SireError
    Scanning dependencies of target get_uname
    [ 0%] Building C object src/apps/test_system/CMakeFiles/get_uname.dir/get_uname.c.o
    [ 1%] Building CXX object build/test_compiler/test_qhash_lookup/CMakeFiles/test_qhash_lookup.dir/main.cpp.o
    cc1: warning: command line option '-fvisibility-inlines-hidden' is valid for C++/ObjC++ but not for C
    [ 1%] Building CXX object build/test_compiler/test_openmp/CMakeFiles/test_openmp.dir/main.cpp.o
    [ 1%] Linking C executable get_uname
    [ 1%] Built target get_uname
    Scanning dependencies of target get_glibc_version
    [ 1%] Building C object src/apps/test_system/CMakeFiles/get_glibc_version.dir/get_glibc_version.c.o
    cc1: warning: command line option '-fvisibility-inlines-hidden' is valid for C++/ObjC++ but not for C
    [ 1%] Linking C executable get_glibc_version
    [ 1%] Built target get_glibc_version
    Scanning dependencies of target get_cpuid
    [ 2%] Building C object src/apps/test_system/CMakeFiles/get_cpuid.dir/get_cpuid.c.o
    cc1: warning: command line option '-fvisibility-inlines-hidden' is valid for C++/ObjC++ but not for C
    [ 2%] Linking C executable get_cpuid
    /mnt/nfs/home/ndc104/.conda/envs/openmm/bin/../lib/gcc/powerpc64le-conda_cos7-linux-gnu/8.2.0/../../../../powerpc64le-conda_cos7-linux-gnu/bin/ld: /mnt/nfs/home/ndc104/.conda/envs/openmm/pkgs/sire-2020.1.0/bundled/lib/libcpuid.so: undefined reference to `cpu_rdtsc'
    /mnt/nfs/home/ndc104/.conda/envs/openmm/bin/../lib/gcc/powerpc64le-conda_cos7-linux-gnu/8.2.0/../../../../powerpc64le-conda_cos7-linux-gnu/bin/ld: /mnt/nfs/home/ndc104/.conda/envs/openmm/pkgs/sire-2020.1.0/bundled/lib/libcpuid.so: undefined reference to `busy_sse_loop'
    /mnt/nfs/home/ndc104/.conda/envs/openmm/bin/../lib/gcc/powerpc64le-conda_cos7-linux-gnu/8.2.0/../../../../powerpc64le-conda_cos7-linux-gnu/bin/ld: /mnt/nfs/home/ndc104/.conda/envs/openmm/pkgs/sire-2020.1.0/bundled/lib/libcpuid.so: undefined reference to `exec_cpuid'
    collect2: error: ld returned 1 exit status
    make[2]: *** [src/apps/test_system/get_cpuid] Error 1
    make[1]: *** [src/apps/test_system/CMakeFiles/get_cpuid.dir/all] Error 2
    make[1]: *** Waiting for unfinished jobs....

lohedges commented 4 years ago

Hmmm, I've not used the INSTALL_INTO_ANACONDA approach, and I'm not sure it's valid given the changes to the way we build Sire. (It's now a self-contained conda app with no external dependencies.) @chryswoods would have a better idea if this is still possible.

Using the standard installation approach (./compile_sire.sh), it's trivial to change the installed version of OpenMM after Sire is built. (Just use ~/sire.app/bin/conda install -c omnia openmm=....) We also have a bundled script accessible at ~/sire.app/bin/optimise_openmm, which will try to figure out the most recent version that is compatible with your system, then install that for you.

Could you try the regular installation and see if that works? If not, then I can dig into it further.

djcole56 commented 4 years ago

Oh I see, yep no problem. Just seems to be a handful of unavailable packages now. At first glance some of these seem to be hard to get hold of for ppc64le via conda:

    (openmm) [ndc104@pn001 Sire]$ ./compile_sire.sh
    Where would you like to install Sire? [/mnt/nfs/home/ndc104/sire.app]:
    Installing into directory '/mnt/nfs/home/ndc104/sire.app'
    Running the conda activate script...
    . "/mnt/nfs/home/ndc104/sire.app/bin/activate"
    Running the Python install script...
    "/mnt/nfs/home/ndc104/sire.app/bin/python" build/build_sire.py
    ** Compiling on Linux
    Number of cores used for compilation = 128
    Continuing the Sire install using /mnt/nfs/home/ndc104/sire.app/bin/python build/build_sire.py
    pip is already installed...
    Activating conda-forge channel using: '/mnt/nfs/home/ndc104/sire.app/bin/conda config --prepend channels conda-forge'
    Warning: 'conda-forge' already in 'channels' list, moving to the top
    Installing packages using: '/mnt/nfs/home/ndc104/sire.app/bin/conda install --yes ipython pytest nose netcdf4=1.5.3 boost=1.72.0 gsl=2.6 tbb=2019.9 tbb-devel=2019.9 pyqt=5.12.3 gcc_linux-64 gxx_linux-64 make libtool autoconf automake cmake'
    Collecting package metadata (current_repodata.json): done
    Solving environment: failed with initial frozen solve. Retrying with flexible solve.
    Collecting package metadata (repodata.json): done
    Solving environment: failed with initial frozen solve. Retrying with flexible solve.

    PackagesNotFoundError: The following packages are not available from current channels:

lohedges commented 4 years ago

Interesting, thanks for the update. As I said, I've not installed on ppc64 myself. Perhaps @ptosco could comment, since he submitted the pull request for ppc64le support. It doesn't look like any conda dependencies were updated in the build script, so perhaps it's a case of manually installing the missing packages from source before building. It looks like netcdf4 is available for ppc64le if you use version 1.4.2 instead. (Versions of conda dependencies are pinned in the build/build_sire.py script.)
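If you want to try the netcdf4 1.4.2 route, the pin can be changed in build/build_sire.py before running ./compile_sire.sh. This is a minimal sketch, assuming the pin appears literally as netcdf4=1.5.3 in that script (as the printed install command suggests) and that GNU sed is available; repin_package is a hypothetical helper name.

```shell
# Hypothetical helper: rewrite a pinned "name=version" conda dependency in
# place (GNU sed -i). Assumes the pin appears literally in the file.
#   $1: file  $2: package name  $3: old version  $4: new version
repin_package() {
    local file="$1" pkg="$2" old="$3" new="$4"
    local old_re
    # Escape the dots so "1.5.3" is matched literally rather than as a regex.
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    sed -i "s/${pkg}=${old_re}/${pkg}=${new}/g" "$file"
}

# Usage (from the top of the Sire checkout):
#   repin_package build/build_sire.py netcdf4 1.5.3 1.4.2
```

Only the version string is touched, so the other pins on the same line (boost, gsl, tbb and so on) are left alone.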

djcole56 commented 4 years ago

Yes, agreed that it's probably a case of installing these manually. I'll see what I can do with system admin support, and let you know either way.

ptosco commented 4 years ago

@djcole56 Hi Danny, correct, those packages are not available through conda.

djcole56 commented 4 years ago

Hi @ptosco, thanks very much for your earlier work and new advice. We had actually already installed Qt5 on the HPC, so I was confused that PyQt was missing. But if not needed, then it looks like we can ignore it. And I've enquired about the availability of the remaining modules. I'm confident we're nearly there!

lohedges commented 4 years ago

Hi @djcole56, I was just wondering if there was any update on this? Did you manage to build Sire in the end?

djcole56 commented 4 years ago

Hi @lohedges, still making progress thanks. We've managed to use gcc and g++ from existing modules on the HPC, and just trying to get netcdf4 built on the same system. I don't see any further hurdles from the Sire side, so feel free to close this issue if you like, and I'll open a new one if I get stuck again. Thanks!

bieniekmateusz commented 3 years ago

Hi. We just installed it and it looks like there is still a small issue with the CPUID. It checks for Power9: https://github.com/michellab/Sire/blob/a9f32a6448aa0ccd34debc961a299b29697e67ae/corelib/CMakeLists.txt#L950-L961 but only if SIRE_FOUND_CPUID is False. However, at that point it is True because cpuid is being bundled: https://github.com/michellab/Sire/blob/a9f32a6448aa0ccd34debc961a299b29697e67ae/corelib/src/bundled/install_cpuid.cmake#L149

Can the bundling be omitted completely on Power9? Thanks.

lohedges commented 3 years ago

Yes, no problem. CPUID is an optional dependency so there's no issue with disabling it. I'll fix the CMake logic this afternoon.

Cheers.

(Sent by email in reply to Mateusz Bieniek's comment of Wed, 26 May 2021, 22:21.)

lohedges commented 3 years ago

I've just pushed a fix, which checks whether CMAKE_HOST_SYSTEM_PROCESSOR is equal to x86_64, rather than whether it is ppc64le. I've tested it locally. Note that you'll need to clear your CMake cache if you are pulling the update and rebuilding in the same directory. It's probably easiest to simply remove the build/corelib directory and re-run ./compile_sire.sh.
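Assuming the fix only bundles libcpuid when CMAKE_HOST_SYSTEM_PROCESSOR is x86_64, the sketch below is a shell approximation (not the actual CMake change) for checking which branch a given node will take before rebuilding. It assumes that on Linux CMake normally takes CMAKE_HOST_SYSTEM_PROCESSOR from uname -m, which reports ppc64le on a Power9 node; should_bundle_cpuid is a hypothetical helper name.

```shell
# Hypothetical helper mirroring the fixed CMake logic: succeed only when the
# host processor is x86_64. Defaults to `uname -m`, which is what CMake
# normally uses for CMAKE_HOST_SYSTEM_PROCESSOR on Linux.
should_bundle_cpuid() {
    local arch="${1:-$(uname -m)}"
    [ "$arch" = "x86_64" ]
}

if should_bundle_cpuid; then
    echo "x86_64 host: libcpuid would be bundled"
else
    echo "$(uname -m) host: libcpuid skipped, native CPU detection used"
fi
```

Checking for x86_64 rather than for ppc64le means any other architecture also skips libcpuid, which seems the safer default given that libcpuid targets the x86 CPUID instruction.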

Let me know if you run into any other issues.

bieniekmateusz commented 3 years ago

Thanks, I can confirm that the fix removed the problem with libcpuid on Power9.

We also found the other issue we were struggling with. It's to do with ABI compatibility: the OpenMM (7.4.2) that we have access to (and that we compiled against) uses the CXX11 ABI.

Specifically, we use conda install -c omnia-dev/label/cuda101 openmm, which was compiled with GCC 8.2 and, I believe, with the CXX11 ABI. The check I used for this is nm ./lib/libOpenMM.so | grep -i CXX11

To fix our linking issue, I simply removed the compatibility ABI flag -D_GLIBCXX_USE_CXX11_ABI=0:

    # Now gcc 5 specific options
    if ( GCC_MAJOR_VERSION GREATER 4 )
      if (MSYS)
        message(STATUS "MSYS2 will use builtin OpenMM if available...")
      else()
        # OpenMM with conda uses the old C++ binary API!
        # Tell GCC 5 to respect the old API
        set( SIRE_PLATFORM_FLAGS "${SIRE_PLATFORM_FLAGS} -D_GLIBCXX_USE_CXX11_ABI=0" )
      endif()
    endif()

The quick minimisation/tests with somd-freenrg appear to be running fine now.

I do not see CXX11 in the OpenMM installed on our x86_64 machine.
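The nm check described above can be wrapped in a small helper so that several OpenMM builds (omnia, omnia-dev, conda-forge) are compared in the same way. This is a sketch: it uses nm -D (the dynamic symbol table, which is kept even in stripped .so files) instead of plain nm, and the helper names uses_cxx11_abi and check_library_abi are hypothetical. Symbols built with GCC's new ABI carry __cxx11 in their mangled names.

```shell
# Hypothetical helper: read nm output on stdin and succeed if any symbol
# mentions the CXX11 ABI (e.g. _ZNSt7__cxx1112basic_string...).
uses_cxx11_abi() {
    grep -qi 'cxx11'
}

# Hypothetical helper: report which ABI a shared library appears to expose.
check_library_abi() {
    local lib="$1"
    if nm -D "$lib" 2>/dev/null | uses_cxx11_abi; then
        echo "$lib: CXX11 ABI symbols found"
    else
        echo "$lib: no CXX11 ABI symbols (old ABI, or no std::string in its interface)"
    fi
}
```

For example, inside an activated environment: check_library_abi "$CONDA_PREFIX/lib/libOpenMM.so". A library whose interface never uses std::string or std::list reports no CXX11 symbols under either ABI, so a negative result is only meaningful for libraries like OpenMM that do.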

lohedges commented 3 years ago

Hmmm, interesting. I didn't add that compiler flag, but I was under the impression that the Omnia package used the old ABI, whereas the new conda-forge package uses the new ABI. As you say, there's no mention of CXX11 when running nm on the Linux .so, so perhaps this fix is now redundant for the Omnia build. I'll try removing it and rebuilding when I get a chance. (Perhaps older versions of OpenMM did require this fix.)
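When experimenting with removing the flag, it can help to confirm which ABI the compiler actually selects. A minimal sketch, assuming a GCC-style compiler with libstdc++ (CXX defaults to g++; abi_macro_value is a hypothetical helper name): preprocessing a file that includes <string> with -dM lists the final value of _GLIBCXX_USE_CXX11_ABI, with and without the -D_GLIBCXX_USE_CXX11_ABI=0 flag from the CMake snippet above.

```shell
# Hypothetical helper: print the value of _GLIBCXX_USE_CXX11_ABI that a
# translation unit including <string> ends up with. Extra arguments are
# passed to the compiler, so flags such as -D_GLIBCXX_USE_CXX11_ABI=0 can be tried.
abi_macro_value() {
    echo '#include <string>' |
        "${CXX:-g++}" -x c++ -E -dM "$@" - |
        awk '$1 == "#define" && $2 == "_GLIBCXX_USE_CXX11_ABI" { print $3 }'
}

echo "compiler default:            $(abi_macro_value)"
echo "with _GLIBCXX_USE_CXX11_ABI=0: $(abi_macro_value -D_GLIBCXX_USE_CXX11_ABI=0)"
```

A value of 0 means Sire's objects use the old ABI, so they can fail to link against an OpenMM library whose symbols carry __cxx11, which matches the linking errors described above.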

bieniekmateusz commented 3 years ago

I've just downloaded the OpenMM 7.4.2 build for Python 3.7 from omnia, as build_sire.py does, and nm shows no CXX11 symbols (https://anaconda.org/omnia/openmm/files). So it makes sense that you correct for it.

However, the omnia-dev 7.4.0 build that I have does contain CXX11 symbols. That is openmm-7.4.0-py37_cuda101_1.tar (https://anaconda.org/omnia-dev/openmm/files?version=7.4.0).

That said, all the conda-forge builds appear to use CXX11. The new ppc64le release for py39 (https://twitter.com/openmm_toolkit/status/1400859263157874695) has a lot of CXX11 symbols, and I also find CXX11 in the linux-64 binaries.

So it seems the issue concerns our own binaries as well as conda-forge's.

Thanks, Mat

lohedges commented 3 years ago

Yes, we patch for the conda-forge build, so could do the same for ppc64le if needed.

bieniekmateusz commented 3 years ago

In that case, I think it's best to ignore it. Cheers