xsdk-project / xsdk-issues

A repository under which GitHub issues not related to a specific xSDK repo can be filed.

dealii build error on knl/intel #121

Closed balay closed 2 years ago

balay commented 4 years ago

https://gitlab.com/xsdk-project/spack-xsdk/-/jobs/752293885

icpc: error #10105: ld.gold: core dumped
icpc: warning #10102: unknown signal(-999659024)
icpc: error #10106: Fatal error in ld.gold, terminated by unknown
make[2]: *** [lib/libdeal_II.g.so.9.2.0] Error 1

I guess I should reduce from '-j 48' to '-j 32' and retry. The build already takes 8h...

spack-build-out.txt

balay commented 4 years ago

-j 24 worked

https://gitlab.com/xsdk-project/spack-xsdk/-/jobs/754455137

balay commented 4 years ago

I have a failure again. I'm not sure what change is triggering this. Attaching logs for now.

spack-build-out.txt CMakeError.log CMakeOutput.log

balay commented 4 years ago

I tried reverting all the way back to the following commit, and the build appears to go through now:

https://github.com/spack/spack/commit/fe239c83fc9da1c93df265f50beef80d32c0e8ed deal.II: Further modernisation and improvements

bangerth commented 4 years ago

The error in question is this:

-- Include /home/xsdk/spack-ref/spack-stage/spack-stage-dealii-9.2.0-mwmq4jevo52bsho4bxvtq3btdzky5njb/spack-src/cmake/configure/configure_arpack.cmake
CMake Error at cmake/macros/macro_configure_feature.cmake:196 (MESSAGE):

  DEAL_II_WITH_ARPACK has unmet configuration requirements:
  DEAL_II_WITH_LAPACK has to be set to "ON".

This originates from this:

-- Include /home/xsdk/spack-ref/spack-stage/spack-stage-dealii-9.2.0-mwmq4jevo52bsho4bxvtq3btdzky5njb/spack-src/cmake/configure/configure_1_lapack.cmake
-- Performing Test DEAL_II_HAVE_FLAG_pthread
-- Performing Test DEAL_II_HAVE_FLAG_pthread - Failed
-- Performing Test LAPACK_SYMBOL_CHECK
-- Performing Test LAPACK_SYMBOL_CHECK - Failed
-- Could not find a sufficient BLAS/LAPACK installation: BLAS/LAPACK symbol check failed! Consult CMakeFiles/CMakeError.log for further information.
-- Performing Test MKL_SYMBOL_CHECK
-- Performing Test MKL_SYMBOL_CHECK - Failed
-- Use other than Intel MKL implementation of BLAS/LAPACK (consult CMakeFiles/CMakeError.log for further information).
-- DEAL_II_WITH_LAPACK has unmet external dependencies.

I can't figure out from the CMakeError.log file why this error happens.
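To trace a late configure error like the ARPACK one back to its first cause, it can help to list every check that CMake reported as failed. The following is a minimal sketch, not part of deal.II or Spack: the function name failed_checks and the SAMPLE_LOG constant are hypothetical, and the sample lines are copied from the output quoted above.

```python
import re

# Sample lines copied from the configure output quoted above.
SAMPLE_LOG = """\
-- Performing Test DEAL_II_HAVE_FLAG_pthread
-- Performing Test DEAL_II_HAVE_FLAG_pthread - Failed
-- Performing Test LAPACK_SYMBOL_CHECK
-- Performing Test LAPACK_SYMBOL_CHECK - Failed
-- Performing Test MKL_SYMBOL_CHECK
-- Performing Test MKL_SYMBOL_CHECK - Failed
"""

# Matches lines of the form "-- Performing Test <NAME> - Failed".
FAILED_TEST = re.compile(r"^-- Performing Test (\S+) - Failed\s*$")


def failed_checks(log_text):
    """Return the names of configure-time tests reported as failed, in order."""
    names = []
    for line in log_text.splitlines():
        match = FAILED_TEST.match(line)
        if match:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    for name in failed_checks(SAMPLE_LOG):
        print(name)
```

Not every failed probe is a problem (a failed compiler-flag test can be benign), so the list is only a starting point for reading CMakeError.log.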

@jppelteret: @balay traced this back to your patch. Do you see what the problem might be?

bangerth commented 4 years ago

Related to https://github.com/spack/spack/pull/19253

balay commented 4 years ago

Attaching the log from successful build spack-build-out.txt

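One difference I see: the successful build passes -DLAPACK_LIBRARIES=libx;liby, whereas the failing build passes -DLAPACK_LIBRARIES:STRING=libx liby (space-separated instead of semicolon-separated).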

The same issue occurs with scalapack. OK, trying the following change on top of the latest develop:

diff --git a/var/spack/repos/builtin/packages/dealii/package.py b/var/spack/repos/builtin/packages/dealii/package.py
index 4a841b5aa..d21b28551 100644
--- a/var/spack/repos/builtin/packages/dealii/package.py
+++ b/var/spack/repos/builtin/packages/dealii/package.py
@@ -305,7 +305,7 @@ def cmake_args(self):
             self.define(
                 'LAPACK_INCLUDE_DIRS', lapack_blas_headers.directories
             ),
-            self.define('LAPACK_LIBRARIES', lapack_blas_libs),
+            self.define('LAPACK_LIBRARIES', lapack_blas_libs.joined(';')),
             self.define('UMFPACK_DIR', spec['suite-sparse'].prefix),
             self.define('ZLIB_DIR', spec['zlib'].prefix),
             self.define('DEAL_II_ALLOW_BUNDLED', False)
@@ -499,7 +499,7 @@ def cmake_args(self):
                 self.define(
                     'SCALAPACK_INCLUDE_DIRS', spec['scalapack'].prefix.include
                 ),
-                self.define('SCALAPACK_LIBRARIES', scalapack_libs)
+                self.define('SCALAPACK_LIBRARIES', scalapack_libs.joined(';'))
             ])

         # Open Cascade

OK, now the build is progressing (beyond cmake):

==> dealii: Executing phase: 'cmake'
==> dealii: Executing phase: 'build'

And I see the same change, i.e. removal of .joined(';'), with other libraries as well.
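CMake treats a list as a single string whose items are separated by semicolons, so a space-separated value is seen as one item. The sketch below illustrates how the two forms can arise. It is an illustration under stated assumptions, not Spack's actual code: LibraryListSketch and define_sketch are hypothetical stand-ins, and the assumption is that the helper only joins real lists and tuples with ';' while any other object is converted with str().

```python
class LibraryListSketch:
    """Hypothetical stand-in for a library-list object (not Spack's real class)."""

    def __init__(self, paths):
        self.paths = list(paths)

    def joined(self, separator=" "):
        """Join the library paths with the given separator."""
        return separator.join(self.paths)

    def __str__(self):
        # Assumption: the default string form is space-separated.
        return self.joined()


def define_sketch(name, value):
    """Hypothetical helper: only real lists and tuples are joined with ';'."""
    if isinstance(value, (list, tuple)):
        value = ";".join(str(v) for v in value)
    return "-D{0}:STRING={1}".format(name, value)


libs = LibraryListSketch(["libx", "liby"])

# Passing the object directly falls back to str(): space-separated.
print(define_sketch("LAPACK_LIBRARIES", libs))
# Joining explicitly yields a proper CMake list.
print(define_sketch("LAPACK_LIBRARIES", libs.joined(";")))
```

Under these assumptions the first call produces the space-separated form and the second the semicolon-separated form, which is why the explicit .joined(';') matters in the patch.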

jppelteret commented 4 years ago

Apologies, so it looks like I might have made a mistake when removing these join operations. I thought it was a bit risky to do so, but it built fine for me when I tested it with most of the bells and whistles enabled. I had been looking here at how this new spack syntax works, and was under the assumption that it would concatenate whatever is passed to it (since it seemed reasonable that some sort of list is returned here). This seems to be incorrect, so I'll open a PR that adds these operations back. Thanks @balay for doing the investigative work!

bangerth commented 4 years ago

@balay -- thanks for doing the detective work. Do you have what you need to move forward with the patch you have?

balay commented 4 years ago

Yes - I'll use what I have until the MR with fix is merged.

balay commented 4 years ago

I'm getting a build error on cori@nersc. [Don't know if it's due to boost; this build worked on other test boxes.] Will try an older boost and see how that goes.

spack-build-out.txt

balay commented 4 years ago

Still fails [after switching from boost@1.74.4 to boost@1.74.0]. @bangerth might need to take a look at this

spack-build-out.txt

balay commented 4 years ago

include/deal.II/opencascade/utilities.h(34): catastrophic error: cannot open source file "IFSelect_ReturnStatus.hxx"

On a different box [without build failures]

balay@xsdk:/data/balay/spack-xsdk>!find
find -name IFSelect_ReturnStatus.hxx -print
./opt/spack/linux-centos7-skylake_avx512/gcc-7.4.0/oce-0.18.3-u2cxepysugof67chbtooosxkqjsa5hb4/include/oce/IFSelect_ReturnStatus.hxx
balay@xsdk:/data/balay/spack-xsdk>spack find -v oce
==> 1 installed package
-- linux-centos7-skylake_avx512 / gcc@7.4.0 ---------------------
oce@0.18.3~X11+tbb
balay@xsdk:/data/balay/spack-xsdk>

And on cori

balay@cori08:~/spack> find . -name IFSelect_ReturnStatus.hxx
balay@cori08:~/spack> ./bin/spack find -v oce
==> 1 installed package
-- cray-cnl7-haswell / intel@19.1.2.254 -------------------------
oce@0.18.3~X11+tbb
balay@cori08:~/spack> 

So oce is installed [with the same version/variants], but this include file is missing on cori. I don't know why.

Will try dealii~oce
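The same header check as the find commands above can be scripted so it is easy to repeat on several machines. This is a minimal sketch using only the Python standard library; the function name find_header is hypothetical, and the demonstration builds a throwaway directory tree rather than touching a real Spack installation.

```python
import tempfile
from pathlib import Path


def find_header(root, header_name):
    """Return all paths below root whose file name equals header_name."""
    return sorted(Path(root).rglob(header_name))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build a small fake install prefix that contains the header.
        include_dir = Path(tmp) / "oce-prefix" / "include" / "oce"
        include_dir.mkdir(parents=True)
        (include_dir / "IFSelect_ReturnStatus.hxx").touch()

        hits = find_header(tmp, "IFSelect_ReturnStatus.hxx")
        print("found" if hits else "missing", [str(p) for p in hits])
```

On a real system, root would be the Spack installation directory; an empty result corresponds to the situation seen on cori.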

balay commented 4 years ago

dealii~oce did not work.

  >> 2750    /usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: cannot find -lAtpSigHandler
  >> 2751    /usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: cannot find -lAtpSigHCommData

spack-build-out.txt

bangerth commented 4 years ago

@jppelteret @luca-heltai Any idea what might be going wrong? Is this an issue with the version number of OCE that spack installs?

@balay The AtpSigHandler is a library that the CRAY toolchain automatically adds to the linker line. We had reports of this kind before, and it usually implied that the system was not configured right -- for example, if $MPICXX refers to the CRAY compiler wrappers, but the CRAY libraries aren't loaded by a module. I have no idea what the issue is here, but am afraid that this will require sysadmin support. Other examples of this error are here: https://github.com/spack/spack/issues/7936 https://gitlab.kitware.com/cmake/cmake/-/issues/17413

jppelteret commented 4 years ago

> Any idea what might be going wrong? Is this an issue with the version number of OCE that spack installs?

Gosh, I'm not quite sure. I've managed to build oce@0.18.3%gcc@9.3.0~X11+tbb arch=linux-ubuntu20.04-ivybridge, so it seems to be fine in general?

balay commented 3 years ago

The following gets dealii built on cori

./bin/spack install xsdk ^petsc+batch ^boost@1.70.0 ^dealii~oce cflags=-L/opt/cray/pe/atp/2.1.3/libApp cxxflags=-L/opt/cray/pe/atp/2.1.3/libApp
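Before adding a -L flag like the one above, it can be worth confirming that the directory really contains the libraries the linker could not find. The sketch below is only an illustration: missing_libraries is a hypothetical helper, and it relies on the usual linker convention that -lNAME resolves to libNAME.so or libNAME.a in one of the search directories.

```python
from pathlib import Path


def missing_libraries(lib_dirs, lib_names):
    """Return the names for which no libNAME.so or libNAME.a exists in lib_dirs."""
    missing = []
    for name in lib_names:
        candidates = ("lib{0}.so".format(name), "lib{0}.a".format(name))
        found = any(
            (Path(directory) / candidate).exists()
            for directory in lib_dirs
            for candidate in candidates
        )
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Directory taken from the install command above; it exists only on cori.
    dirs = ["/opt/cray/pe/atp/2.1.3/libApp"]
    print(missing_libraries(dirs, ["AtpSigHandler", "AtpSigHCommData"]))
```

An empty list means both libraries were found in the given directories; any name that is printed is still unresolved.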