sourceryinstitute / OpenCoarrays

A parallel application binary interface for Fortran 2018 compilers.
http://www.opencoarrays.org
BSD 3-Clause "New" or "Revised" License

Defect: install.sh dies on HPCLinux #384

Closed rouson closed 7 years ago

rouson commented 7 years ago

Installation/build problem

./install.sh fails inside a VirtualBox virtual machine running the current version of HPCLinux. The attempt to build OpenCoarrays 1.8.11 (the current master branch) dies when install.sh invokes the stack functions in prerequisites/stack.sh. To investigate the underlying cause, I ran test-stack.sh and found that the missing_variable_name, duplicate_stack_creation, and verify_stack_size_changes tests all cause test-stack.sh to die.

$ uname -a
Linux localhost.localdomain 3.6.11-4.fc16.x86_64 #1 SMP Tue Jan 8 20:57:42 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
$ git describe
1.8.10-18-gc5586a1

HPCLinux is apparently too old to have linuxbrew: sudo yum install linuxbrew returns No package linuxbrew available.

Workaround

Because install.sh invokes build.sh and the latter doesn't use stack.sh, a workaround is to build the prerequisites using build.sh directly. HPCLinux has GCC 4.8 pre-installed, so the first step is to build GCC 6.3.0. I'll add the remaining steps as I progress.

cd prerequisites
./build.sh --package gcc --install-version 6.3.0


zbeekman commented 7 years ago

Is there any reason to believe that the "Lite" image would yield a different result? All images are huge, so I'm going to investigate with the "Lite" image, which should download/boot faster and hog less space.

rouson commented 7 years ago

@zbeekman Sameer directed me to the OVA file for the full version, but I don't know the reasons. I suspect that the problem I encountered might relate to HPCLinux being dated. Based on a couple of different pieces of circumstantial evidence, I'm guessing HPCLinux is based on a Fedora version from 2012 (Sameer can confirm). For that reason, I recommend not putting too much energy into this. The biggest help would be if you could figure out a way that install.sh can detect that it's running on HPCLinux. Then it can simply invoke the new hpclinux-install.sh I'm creating. Because HPCLinux is uber-stable, the stack-based system interrogation that install.sh uses is unnecessary so the new script will just invoke build.sh unconditionally and skip all the system interrogation.
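A minimal sketch of what such detection might look like, assuming HPCLinux identifies itself in a release file. Both the default path `/etc/system-release` and the `hpclinux` marker string are unverified assumptions, and `is_hpclinux` is a hypothetical helper name, so the check would need to be confirmed against a real image:

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the given release file mentions HPCLinux.
# ASSUMPTION: neither the default path nor the "hpclinux" marker string has
# been confirmed on an actual HPCLinux image.
is_hpclinux() {
  local release_file="${1:-/etc/system-release}"
  [[ -r "${release_file}" ]] && grep -qi 'hpclinux' "${release_file}"
}
```

Near the top of install.sh, this could be used along the lines of `if is_hpclinux; then exec ./hpclinux-install.sh "$@"; fi`, so the stack-based system interrogation is never reached on HPCLinux.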

The new script will capture the steps I'm going through interactively right now (I'm currently on the cmake build):

pushd prerequisites

  # Build CMake using the system GCC (4.8.1) and prepend its bin subdirectory to the PATH
  export cmake_install_path="${PWD}"/installations/cmake/cmake
  ./build.sh --package cmake --install-prefix "${cmake_install_path}"
  if [[ -z "${PATH}" ]]; then
    export PATH="${cmake_install_path}"/bin
  else
    export PATH="${cmake_install_path}"/bin:$PATH
  fi

  # Build GCC 6.3.0 and prepend its bin subdirectory to the PATH
  export gcc_version=6.3.0
  export gcc_install_path="${PWD}"/installations/gnu/$gcc_version
  ./build.sh --package gcc --install-version $gcc_version --install-prefix "${gcc_install_path}"
  export PATH="${gcc_install_path}"/bin:$PATH
  export LD_LIBRARY_PATH="${gcc_install_path}"/lib64:"${gcc_install_path}"/lib:$LD_LIBRARY_PATH

  # Build MPICH 3.2 and prepend its bin subdirectory to the PATH
  export mpich_install_path="${PWD}"/installations/mpich
  ./build.sh --package mpich --install-prefix "${mpich_install_path}" --num-threads 4
  export PATH="${mpich_install_path}"/bin:$PATH

popd # return to top level of OpenCoarrays source tree

# Build OpenCoarrays
if [[ -d build ]]; then
  rm -rf build
fi
mkdir build
pushd build

  export opencoarrays_install_path="${PWD}"/prerequisites/installations/opencoarrays
  FC=gfortran CC=gcc cmake .. -DCMAKE_INSTALL_PREFIX="${opencoarrays_install_path}"
  make
  make install
  export PATH="${opencoarrays_install_path}"/bin:$PATH

popd # return to top level of OpenCoarrays source tree

rouson commented 7 years ago

@zbeekman I made a few edits to the above draft script. Everything builds, but two tests fail: allocate_as_barrier_proc and coarray_burgers_PDE. This was my second successful build. After the first successful build, only the latter of those two tests failed. During the cmake step, the following line appears

-- Performing Test MPI_Fortran_MODULE_COMPILES - FAILED

along with the message

It appears that MPI was built with a different Fortran compiler.  It is 
possible that this may cause unpredictable behavior.  The build will
continue using mpif.h BUT please report any suspicious behavior to the
OpenCoarrays developers.

I'll try building a new MPI with GCC 6.3.0 instead of using the system MPI, which was built with GCC 4.8.

zbeekman commented 7 years ago

Yes, I'm not sure if that will really make a difference, but it could... The warning just indicates that a different Fortran compiler built the MPI module file, so the build falls back on including the mpif.h header. I'll be curious to see if a rebuilt MPI using a matching GCC/GFortran causes those errors to go away.

It's possible that there was a bug fix in GCC/GFortran 7 addressing allocate_as_barrier_proc, but I don't recall for certain. Strange that coarray_burgers_PDE is failing... Running ctest with --output-on-failure might yield some more insight.
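As a rough sketch of that suggestion, the two failing tests could be re-run on their own with their output shown. `--output-on-failure` and `-R` (select tests by regular expression) are standard ctest options; the `rerun_tests` helper and the `CTEST` override variable are hypothetical names introduced here only for illustration:

```shell
#!/usr/bin/env bash

# Re-run only the named tests, printing their output when they fail.
# Intended to be called from the CMake build directory.
# CTEST is a hypothetical override for the ctest executable.
rerun_tests() {
  local regex
  # Join the test names with "|" into one regular expression for ctest -R
  regex="$(IFS='|'; echo "$*")"
  "${CTEST:-ctest}" --output-on-failure -R "${regex}"
}
```

From the build directory this would be invoked as `rerun_tests allocate_as_barrier_proc coarray_burgers_PDE`.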

rouson commented 7 years ago

I'm using GCC 6.3.0.

rouson commented 7 years ago

Both tests still fail. My HPCLinux set-up now almost exactly matches what I have in the Sourcery Institute (SI) VM, except for the installation paths and the MPICH version: my HPCLinux build uses MPICH 3.2 and my SI VM build uses MPICH 3.1.4. (All tests pass in the SI VM.) I'll submit a PR with the new script, but again, we shouldn't put too much time into this. I'm fine with having completed the build modulo two test failures. I still suspect this issue is about the age of the upstream Fedora distribution.

rouson commented 7 years ago

OMG! After switching to MPICH 3.1.4, coarray_burgers_PDE fails consistently but allocate_as_barrier_proc alternates between success and failure. Presumably it was an intermittent failure with MPICH 3.2 as well, and I just happened to see only the failures. I'm done with this. I'll mark it as won't fix, but will still submit the pull request with the new script.
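One way to quantify an intermittent failure like this is to run the test repeatedly and count how often it fails. The following is only a sketch; `count_failures` is a hypothetical helper, not part of OpenCoarrays:

```shell
#!/usr/bin/env bash

# Run a command repeatedly and report how many of the runs failed.
# Usage: count_failures <runs> <command> [args...]
count_failures() {
  local runs="$1"; shift
  local failures=0 i
  for (( i = 1; i <= runs; i++ )); do
    # Discard the output; only the exit status matters here
    if ! "$@" > /dev/null 2>&1; then
      failures=$(( failures + 1 ))
    fi
  done
  echo "${failures}/${runs} runs failed"
}
```

For example, `count_failures 20 ctest -R allocate_as_barrier_proc` from the build directory would show roughly how often that test fails with a given MPICH version.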

zbeekman commented 7 years ago

@rouson I'm just going through and cleaning up some stale branches... Did you ever submit a PR with the new script?