mir-group / phoebe

A high-performance framework for solving phonon and electron Boltzmann equations
https://mir-group.github.io/phoebe/
MIT License

building phoebe on Irene supercomputer #214

Closed asubedi closed 5 months ago

asubedi commented 5 months ago

I am trying to build phoebe on the Irene supercomputer, which has restricted internet access, so git clone does not work.

Is there a way to build phoebe without internet access?

I downloaded the zipped version of the repos, unzipped them to googletest, spglib_src, etc., and commented out the git commands in lib/CMakeLists.txt. But that did not work.

jcoulter12 commented 5 months ago

Hi @asubedi,

I'm thinking about the best way to do this and will get back to you shortly. Off the top of my head, the two options would be for us to either a) discuss changes to the cmake file based on what you've tried, or b) try to make a container.

Have you been able to use a container (like Docker) before on this cluster? Thanks, Jenny

asubedi commented 5 months ago

Hi Jenny,

It seems that unzipping the required packages into the appropriate directory structure is OK. The problem was that I was trying to compile the develop version, which does not seem to compile.

So I am now compiling the master version. The configuration process now finished smoothly. But I hit a snag with the following error:

CMake Error at cmake_install.cmake:74 (file):
  file INSTALL cannot copy file
  "/ccc/work/cont003/gen11099/subedial/softwares/phoebe-master-9jun2024/build/highfive_dep-prefix/src/highfive_dep-build/include/highfive/H5Version.hpp"
  to
  "/ccc/work/cont003/gen11099/subedial/softwares/phoebe-master-9jun2024/build/highfive_src/include/highfive/H5Version.hpp":
  Disk quota exceeded.

make[3]: *** [Makefile:71: install] Error 1
make[2]: *** [CMakeFiles/highfive_dep.dir/build.make:103: highfive_dep-prefix/src/highfive_dep-stamp/highfive_dep-install] Error 2
make[1]: *** [CMakeFiles/Makefile2:404: CMakeFiles/highfive_dep.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

The problem is that the Irene supercomputer requires the setgid bit on all directories. However:

$ ls -rlth /ccc/work/cont003/gen11099/subedial/softwares/phoebe-master-9jun2024/build/highfive_src/
total 60K
-rw-r----- 1 subedial gen11099    6 Jun  9 14:29 VERSION
drwxr-s--- 5 subedial gen11099 4.0K Jun  9 14:29 tests
drwxr-s--- 3 subedial gen11099 4.0K Jun  9 14:29 src
-rw-r----- 1 subedial gen11099 7.1K Jun  9 14:29 README.md
-rw-r----- 1 subedial gen11099 1.4K Jun  9 14:29 LICENSE
drwxr-xr-x 3 subedial gen11099 4.0K Jun  9 14:29 include
drwxr-s--- 2 subedial gen11099 4.0K Jun  9 14:29 doc
drwxr-s--- 3 subedial gen11099 4.0K Jun  9 14:29 deps
-rw-r----- 1 subedial gen11099  464 Jun  9 14:29 default.nix
-rw-r----- 1 subedial gen11099 3.9K Jun  9 14:29 CMakeLists.txt
drwxr-s--- 4 subedial gen11099 4.0K Jun  9 14:29 CMake
-rw-r----- 1 subedial gen11099 6.0K Jun  9 14:29 CHANGELOG.md
drwxr-s--- 3 subedial gen11099 4.0K Jun  9 14:29 share

For some reason include does not have the setgid bit. I am now asking ChatGPT how to fix this...
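A quick way to spot directories like include in a larger tree is to let find filter on the permission bits. This is a minimal sketch; the function name list_dirs_without_setgid is made up for illustration and is not part of phoebe or HighFive.

```shell
# List every directory under a tree that is missing the setgid bit.
# list_dirs_without_setgid is a hypothetical helper name.
list_dirs_without_setgid() {
    # -perm -2000 matches entries whose setgid bit is set; "!" negates the match,
    # so only directories WITHOUT the bit are printed.
    find "$1" -type d ! -perm -2000
}
```

Called as list_dirs_without_setgid build/highfive_src, it should print only the directories that would trip the group quota described in this thread.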

Thanks, Alaska

jcoulter12 commented 5 months ago

Hi Alaska,

I would absolutely not use master, it's 636 commits behind develop. It will be slower, have fewer features, and possibly bugs that have since been fixed -- hence why our installation suggests using the develop branch. :) In fact, this is a good reminder for me to remove that branch entirely.

Why is it that develop does not compile? Of course I use it across several machines presently, so I think it should be building -- but if there's actually a problem with it, then I want to know so I can fix that.

Best, Jenny

asubedi commented 5 months ago

Hi Jenny,

Below are the steps that I am using to compile the develop branch:

$ module load gnu/13.2.0 openblas/0.3.23 mpi/openmpi/4.1.5.3 flavor/hdf5/parallel hdf5/1.14.3 cmake/3.26.4
$ git clone --recurse-submodules https://github.com/mir-group/phoebe.git  phoebe-develop-9jun2024 
$ cd phoebe-develop-9jun2024
$ mkdir build
$ cd build
$ cmake .. 
...
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - not found
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Found BLAS: /ccc/products/openblas-0.3.23/gcc--11.1.0/default/lib/libopenblas.so  
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- Found LAPACK: /ccc/products/openblas-0.3.23/gcc--11.1.0/default/lib/libopenblas.so;-lpthread;-lm;-ldl  
-- Found HDF5: /ccc/products/hdf5-1.14.3/gcc--11.1.0__openmpi--4.0.1/parallel/lib/libhdf5.so;/ccc/products/hdf5-1.14.3/gcc--11.1.0__openmpi--4.0.1/parallel/lib/libhdf5.so;/usr/lib64/libz.so;/usr/lib64/libdl.so;/usr/lib64/libm.so;/usr/lib64/libm.so;/usr/lib64/libpthread.so (found version "1.14.3")  
-- Found Doxygen: /usr/bin/doxygen (found version "1.8.14") found components: doxygen dot 
-- Could NOT find Sphinx (missing: SPHINX_EXECUTABLE) 
Doxygen configured
Documentation needs also sphinx
-- Configuring incomplete, errors occurred!
See also "/ccc/work/cont003/gen11099/subedial/softwares/phoebe-develop-9jun2024/build/CMakeFiles/CMakeOutput.log".
See also "/ccc/work/cont003/gen11099/subedial/softwares/phoebe-develop-9jun2024/build/CMakeFiles/CMakeError.log".

I have attached the CMakeOutput.log, CMakeError.log, and CMakeCache.txt files.

CMakeCache.txt CMakeOutput.log CMakeError.log

It looks like it has a problem finding sgemm. But the configuration process says that it has found sgemm.

I've tried compiling with intel compilers and mpi. I get the same issue.

Thanks a lot for your help.

Best, Alaska

PS. I've managed to use git through ssh. So restricted internet access is no longer an issue.

jcoulter12 commented 5 months ago

Ah great to hear you resolved the internet issue!

Can you also send me whatever else was printed by CMake to the command line, the part above the section you have sent? I find often you can get an answer quickly from that block of print statements (especially in terms of which compilers it found).

Also probably you are already doing this, but can you confirm you are removing the CMakeCache files or even just the entire build directory after making changes to the environment/modules?

Thanks, Jenny

asubedi commented 5 months ago

I have attached the cmake_output.txt obtained using the following steps:

[subedial@irene195 build]$ rm -rf   *
[subedial@irene195 build]$ cmake .. 2>&1 | tee cmake_output.txt

cmake_output.txt

Best, Alaska

jcoulter12 commented 5 months ago

I suspect it's this message, but let's see:

CMake Error at CMakeLists.txt:11 (cmake_policy):
  Policy "CMP0144" is not known to this version of CMake.

Can you open phoebe/CMakeLists.txt and delete the line:

cmake_policy(SET CMP0144 NEW)

A few days ago I noticed this line (which earlier suppressed warnings) became an issue for new versions of CMake. I already have a PR staged to remove it anyway, so this might be the source of the issue.
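The deletion can also be scripted, which helps when the checkout is recreated often. The sketch below assumes the line appears exactly as quoted above; the function name remove_cmp0144_policy is made up. For context, CMP0144 appears to have been introduced with CMake 3.27, which would be consistent with the cmake/3.26.4 module loaded earlier in this thread not recognizing it.

```shell
# Delete the CMP0144 policy line from a CMakeLists.txt, keeping a backup copy.
# remove_cmp0144_policy is a hypothetical helper name.
remove_cmp0144_policy() {
    # -i.bak edits the file in place and saves the original as <file>.bak.
    # In a basic regular expression the parentheses match literally.
    sed -i.bak '/cmake_policy(SET CMP0144 NEW)/d' "$1"
}
```

Run it as remove_cmp0144_policy CMakeLists.txt from the top of the phoebe checkout, then remove the build directory before calling cmake again.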

Let me know what happens, Jenny

asubedi commented 5 months ago

That's it! It works.

But I still have the problem with the setgid bit mentioned above:

[ 21%] Performing build step for 'highfive_dep'
[ 21%] Performing install step for 'highfive_dep'
Install the project...
-- Install configuration: "Release"
-- Installing: /ccc/work/cont003/gen11099/subedial/softwares/phoebe-develop-9jun2024/build/highfive_src/share/HighFive/CMake/HighFiveTargetDeps.cmake
-- Installing: /ccc/work/cont003/gen11099/subedial/softwares/phoebe-develop-9jun2024/build/highfive_src/share/HighFive/CMake/HighFiveConfig.cmake
-- Installing: /ccc/work/cont003/gen11099/subedial/softwares/phoebe-develop-9jun2024/build/highfive_src/share/HighFive/CMake/HighFiveConfigVersion.cmake
-- Installing: /ccc/work/cont003/gen11099/subedial/softwares/phoebe-develop-9jun2024/build/highfive_src/share/HighFive/CMake/HighFiveTargets.cmake
-- Up-to-date: /ccc/work/cont003/gen11099/subedial/softwares/phoebe-develop-9jun2024/build/highfive_src/include
-- Up-to-date: /ccc/work/cont003/gen11099/subedial/softwares/phoebe-develop-9jun2024/build/highfive_src/include/highfive
-- Installing: /ccc/work/cont003/gen11099/subedial/softwares/phoebe-develop-9jun2024/build/highfive_src/include/highfive/H5Version.hpp
CMake Error at cmake_install.cmake:74 (file):
  file INSTALL cannot copy file
  "/ccc/work/cont003/gen11099/subedial/softwares/phoebe-develop-9jun2024/build/highfive_dep-prefix/src/highfive_dep-build/include/highfive/H5Version.hpp"
  to
  "/ccc/work/cont003/gen11099/subedial/softwares/phoebe-develop-9jun2024/build/highfive_src/include/highfive/H5Version.hpp":
  Disk quota exceeded.

make[3]: *** [Makefile:71: install] Error 1
make[2]: *** [CMakeFiles/highfive_dep.dir/build.make:103: highfive_dep-prefix/src/highfive_dep-stamp/highfive_dep-install] Error 2
make[1]: *** [CMakeFiles/Makefile2:459: CMakeFiles/highfive_dep.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

To recap:

$ ls -rlthd /ccc/work/cont003/gen11099/subedial/softwares/phoebe-develop-9jun2024/build/highfive_src/include/
drwxr-xr-x 3 subedial gen11099 4.0K Jun  9 17:37 /ccc/work/cont003/gen11099/subedial/softwares/phoebe-develop-9jun2024/build/highfive_src/include/

should have the permissions drwxr-sr-x.

Thanks, Alaska

jcoulter12 commented 5 months ago

What do you make of the message:

Disk quota exceeded.

In the output you just sent? Is it possible you're running out of space for the build?

asubedi commented 5 months ago

It's the policy of the Irene sysadmins to have the gidbit set on all directories. Otherwise, I get the error. I can copy the file to, for example, the tests subfolder without any problem.

Here's an old email I got from the sysadmins explaining the policy:

BEWARE when you copy data from your local network:
Directories sometimes lose the "g+s" right (real name "gidbit"), usually during data transfers into the project space by cp, scp or rsync.
Directories and files generated below them will then be associated with your connection group (organization group) 'cpht', and you will get the message "quota exceeded" because this group has quotas locked on Lustre.

So it's important to keep the "s" right on all directories in genXXXX/genXXXX and genXXXX/$login
The first solution is to use rsync (release v3.0.9 and after) with parameters --chmod=Dg+s and --chown=:genXXXX (in case of a transfer in datadir genxxxx).
A second solution is to apply g+s rights on the source tree before the transfer, using the command scp -r without option -p

Thank you for your understanding
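The rsync route from the email could look roughly like this. It is a sketch only: the helper name sync_with_setgid and its arguments are invented for illustration, and the group must be one the user belongs to. According to the rsync man page, --chown was added in rsync 3.1.0, so an older rsync may accept --chmod but reject --chown.

```shell
# Copy a tree while forcing the setgid bit on directories and a given group.
# sync_with_setgid <source_dir> <dest_dir> <group> is a hypothetical helper.
sync_with_setgid() {
    # -a            : archive mode (recursive, preserves times and permissions)
    # --chmod=Dg+s  : add the setgid bit to every transferred directory
    # --chown=:GROUP: change only the group, leaving the owner untouched
    rsync -a --chmod=Dg+s --chown=":$3" "$1/" "$2/"
}
```

For example, sync_with_setgid ./phoebe /path/in/project/space genXXXX, with the placeholders replaced by the real destination and project group.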

I don't know why only include does not have the gidbit set. I am trying to figure out how to add chmod g+s /ccc/work/cont003/gen11099/subedial/softwares/phoebe-develop-9jun2024/build/highfive_src/include to the cmake files.
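Outside of CMake, the same chmod can be applied to a whole tree by hand. This is a minimal sketch with a made-up function name, add_setgid_to_dirs. It is a manual workaround rather than the fix that was eventually used in this thread, and a later install step may reset the permissions again.

```shell
# Add the setgid bit to every directory under a tree, leaving files untouched.
# add_setgid_to_dirs is a hypothetical helper name.
add_setgid_to_dirs() {
    # -type d restricts the match to directories;
    # "-exec ... {} +" batches many paths into each chmod call.
    find "$1" -type d -exec chmod g+s {} +
}
```

For instance, add_setgid_to_dirs build/highfive_src would cover include and everything below it.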

Best, Alaska

asubedi commented 5 months ago

Finally fixed it with the help of Claude.

I had to create a patch for build/highfive_src/CMakeLists.txt:

--- CMakeLists.txt      2024-06-09 23:52:07.000000000 +0200
+++ CMakeLists.txt.new  2024-06-10 00:44:34.000000000 +0200
@@ -20,6 +20,30 @@
 include(CheckCXXStandardSupport)
 include(BlueGenePortability)

+# Custom function to set setgid on all directories
+function(install_with_setgid)
+  cmake_parse_arguments(ARGS "" "DESTINATION" "FILES" ${ARGN})
+  
+  file(INSTALL DESTINATION "${ARGS_DESTINATION}" TYPE DIRECTORY FILES ${ARGS_FILES}
+       DIRECTORY_PERMISSIONS OWNER_READ OWNER_WRITE OWNER_EXECUTE
+                            GROUP_READ GROUP_WRITE GROUP_EXECUTE SETGID
+                            WORLD_READ WORLD_EXECUTE
+       FILE_PERMISSIONS OWNER_READ OWNER_WRITE
+                       GROUP_READ GROUP_WRITE
+                       WORLD_READ
+       PATTERN "*.in" EXCLUDE)
+  
+  file(GLOB_RECURSE ALL_DIRS "${ARGS_DESTINATION}/*")
+  foreach(dir ${ALL_DIRS})
+    if(IS_DIRECTORY "${dir}")
+      file(CHMOD "${dir}" DIRECTORY_PERMISSIONS
+           OWNER_READ OWNER_WRITE OWNER_EXECUTE
+           GROUP_READ GROUP_WRITE GROUP_EXECUTE SETGID
+           WORLD_READ WORLD_EXECUTE)
+    endif()
+  endforeach()
+endfunction()
+
 # OPTIONS
 # Compat within Highfive 2.x series
 set(USE_BOOST ON CACHE BOOL "Enable Boost Support")
@@ -70,13 +94,12 @@
 include(${PROJECT_SOURCE_DIR}/CMake/HighFiveTargetExport.cmake)

 # Installation of headers (HighFive is only interface)
-install(DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/include/
-  DESTINATION "include"
-  PATTERN "*.in" EXCLUDE)
+install_with_setgid(DESTINATION "include"
+                    FILES ${CMAKE_CURRENT_SOURCE_DIR}/include/)

 # Installation of configured headers
-install(DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/include/
-  DESTINATION "include")
+install_with_setgid(DESTINATION "include"
+                    FILES ${CMAKE_CURRENT_BINARY_DIR}/include/)

 # Preparing local building (tests, examples)

Then modify lib/CMakeLists.txt:

...
  ExternalProject_Add(highfive_dep
      SOURCE_DIR ${CMAKE_CURRENT_BINARY_DIR}/highfive_src
      GIT_REPOSITORY https://github.com/anjohan/HighFive# https://github.com/BlueBrain/HighFive.git
      UPDATE_COMMAND ""
      PATCH_COMMAND patch -p0 < ${CMAKE_CURRENT_SOURCE_DIR}/highfive_setgid.patch
      CMAKE_ARGS
...
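The thread does not show how highfive_setgid.patch was produced. The header lines of the patch are consistent with a plain diff -u between the original CMakeLists.txt and an edited copy named CMakeLists.txt.new, so one plausible way to regenerate it is sketched below. The helper name make_patch is made up.

```shell
# Write a unified diff between an original file and an edited copy.
# make_patch <original> <modified> <patch_file> is a hypothetical helper.
make_patch() {
    # diff exits with status 1 when the files differ, which is the expected
    # case here; only a status above 1 signals a real error.
    diff -u "$1" "$2" > "$3" || [ $? -eq 1 ]
}
```

Running make_patch CMakeLists.txt CMakeLists.txt.new highfive_setgid.patch inside build/highfive_src yields a patch with relative file names, which is what the -p0 option in the PATCH_COMMAND above expects.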

This seems to be a peculiarity of Irene and/or HighFive. I don't think it's worth including the changes in your repo.

Thanks for your help! I'll be trying out the code in the coming days.

Best, Alaska

jcoulter12 commented 5 months ago

Hi Alaska,

Yay, glad you were able to get it working on this machine! And, thanks for reporting the fix, in case such a thing ever comes up again.

Indeed, if you have other questions, please feel free to ask. Best, Jenny