nwchemgit / nwchem-dockerfiles

files to create Docker containers

Program received signal SIGILL: Illegal instruction. #11

Status: Closed (miroi closed 3 years ago)

miroi commented 3 years ago

Hello, on one cluster, I got this error with nwchem-in-container:

Singularity> /usr/local/bin/mpirun -np 2 /opt/nwchem-7.0.2/bin/LINUX64/nwchem h2o_scf_6-31g.nw
 argument  1 = h2o_scf_6-31g.nw

             Northwest Computational Chemistry Package (NWChem) 7.0.2
             --------------------------------------------------------

                    Environmental Molecular Sciences Laboratory
                       Pacific Northwest National Laboratory
                                Richland, WA 99352

                              Copyright (c) 1994-2020
                       Pacific Northwest National Laboratory
                            Battelle Memorial Institute

             NWChem is an open-source computational chemistry package
                        distributed under the terms of the
                      Educational Community License (ECL) 2.0
             A copy of the license is included with this distribution
                              in the LICENSE.TXT file

                                  ACKNOWLEDGMENT
                                  --------------

            This software and its documentation were developed at the
            EMSL at Pacific Northwest National Laboratory, a multiprogram
            national laboratory, operated for the U.S. Department of Energy
            by Battelle under Contract Number DE-AC05-76RL01830. Support
            for this work was provided by the Department of Energy Office
            of Biological and Environmental Research, Office of Basic
            Energy Sciences, and the Office of Advanced Scientific Computing.

           Job information
           ---------------

    hostname        = login1
    program         = /opt/nwchem-7.0.2/bin/LINUX64/nwchem
    date            = Sat Aug 28 20:15:20 2021

    compiled        = Tue_Apr_06_20:54:26_2021
    source          = /opt/nwchem-7.0.2
    nwchem branch   = 7.0.2
    nwchem revision = b9985dfa
    ga revision     = 5.7.2
    use scalapack   = T
    input           = h2o_scf_6-31g.nw
    prefix          = h2o.
    data base       = ./h2o.db
    status          = startup
    nproc           =        1
    time left       =     -1s

           Memory information
           ------------------

    heap     =   13107200 doubles =    100.0 Mbytes
    stack    =   13107197 doubles =    100.0 Mbytes
    global   =   26214400 doubles =    200.0 Mbytes (distinct from heap & stack)
    total    =   52428797 doubles =    400.0 Mbytes
    verify   = yes
    hardfail = no 

           Directory information
           ---------------------

  0 permanent = .
  0 scratch   = .

                                NWChem Input Module
                                -------------------

                             Water in 6-31g basis set
                             ------------------------
 C2V symmetry detected

Program received signal SIGILL: Illegal instruction.

Backtrace for this error:
#0  0x7fcaaa3a9d01 in ???
#1  0x7fcaaa3a8ed5 in ???
#2  0x7fcaa9b0a20f in ???
#3  0x558be17d85ef in ???

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 23122 RUNNING AT login1
=   EXIT CODE: 4
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Illegal instruction (signal 4)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
Singularity> 
Singularity> mpirun --version
HYDRA build details:
    Version:                                 3.3
    Release Date:                            Wed Nov 21 11:32:40 CST 2018
    CC:                              gcc    
    CXX:                             g++    
    F77:                             gfortran   
    F90:                             gfortran   
    Configure options:                       '--disable-option-checking' '--prefix=NONE' '--cache-file=/dev/null' '--srcdir=.' 'CC=gcc' 'CFLAGS= -O2' 'LDFLAGS=' 'LIBS=' 'CPPFLAGS= -I/mpich-3.3/src/mpl/include -I/mpich-3.3/src/mpl/include -I/mpich-3.3/src/openpa/src -I/mpich-3.3/src/openpa/src -D_REENTRANT -I/mpich-3.3/src/mpi/romio/include' 'MPLLIBNAME=mpl'
    Process Manager:                         pmi
    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available:            hwloc
    Resource management kernels available:   user slurm ll lsf sge pbs cobalt
    Checkpointing libraries available:       
    Demux engines available:                 poll select
Singularity> 

edoapra commented 3 years ago

Is this with Docker? If so, which Docker image are you using? SIGILL means that the NWChem installation was optimized for CPU instructions that are not present on the host computer you are using.
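
A quick way to compare the host against such a build is to look at the CPU flags that Linux reports. The sketch below is illustrative only: the function name `check_cpu_flags` and the list of extensions it checks are assumptions, not something taken from the NWChem images.

```shell
# Report which SIMD extensions a CPU advertises.
# Argument: path to a cpuinfo-style file (defaults to /proc/cpuinfo on Linux).
# The function name and the extension list are hypothetical choices.
check_cpu_flags() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Take the first "flags" line; all cores normally report the same set.
    flags=$(grep -m1 '^flags' "$cpuinfo" | cut -d: -f2)
    for ext in sse4_2 avx avx2 fma avx512f; do
        # Pad with spaces so that "avx" does not match inside "avx2".
        case " $flags " in
            *" $ext "*) echo "$ext: yes" ;;
            *)          echo "$ext: no" ;;
        esac
    done
}

# Usage on the host: check_cpu_flags
```

A Sandy Bridge-era processor would be expected to report `avx` but neither `avx2` nor `fma`, so a binary built with either of those enabled could fail with SIGILL there.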

miroi commented 3 years ago

Hi, this is via Singularity, which can pull containers from Docker Hub. The processor is Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz.

Here is the list of libraries that nwchem-in-container is using. How can I find out which one is causing the problem?

ilias@login1.kelinux.saske.sk:~/work/qch/projects/open-collection/computer_science/containers/kelinux_saske_sk/singularity/nwchem/.singularity pull $SINGULARITY_CONTAINERS/nwchem.sif docker://nwchemorg/nwchem-702.mpipr.nersc
.
.
2021/08/28 18:07:53  info unpack layer: sha256:cadb02169db62b58185ae002dac457bd0e7ebdf79468ef6c5a13db393b5e87a8
INFO:    Creating SIF file...

ilias@login1.kelinux.saske.sk:~/work/qch/projects/open-collection/computer_science/containers/kelinux_saske_sk/singularity/nwchem/.singularity exec  $SINGULARITY_CONTAINERS/nwchem.sif /bin/sh
Singularity> ldd /opt/nwchem-7.0.2/bin/LINUX64/nwchem
        linux-vdso.so.1 (0x00007fff03375000)
        libmpifort.so.12 => /usr/local/lib/libmpifort.so.12 (0x00007f0a84eb5000)
        libmpi.so.12 => /usr/local/lib/libmpi.so.12 (0x00007f0a84b5e000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f0a84b53000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f0a84b30000)
        libgfortran.so.5 => /lib/x86_64-linux-gnu/libgfortran.so.5 (0x00007f0a84868000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0a84719000)
        libmvec.so.1 => /lib/x86_64-linux-gnu/libmvec.so.1 (0x00007f0a846eb000)
        libpython3.8.so.1.0 => /lib/x86_64-linux-gnu/libpython3.8.so.1.0 (0x00007f0a84195000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0a83fa3000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f0a83f88000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f0a941e2000)
        libquadmath.so.0 => /lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f0a83f3e000)
        libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1 (0x00007f0a83f0e000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f0a83ef2000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f0a83eec000)
        libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f0a83ee7000)
Singularity> 
Singularity> /usr/local/bin/mpirun -np 2 /opt/nwchem-7.0.2/bin/LINUX64/nwchem h2o_scf_6-31g.nw
 argument  1 = h2o_scf_6-31g.nw
.
.
 C2V symmetry detected

Program received signal SIGILL: Illegal instruction.

Backtrace for this error:
#0  0x7f3f30716d01 in ???
#1  0x7f3f30715ed5 in ???
#2  0x7f3f2fe7720f in ???
#3  0x55576496b5ef in ???

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 20524 RUNNING AT login1
=   EXIT CODE: 4
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Illegal instruction (signal 4)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
Singularity> 

miroi commented 3 years ago

Ah, @edoapra, the "BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES" message is connected with Intel compilers.

Looking into the container's /usr/local/bin folder, I see:

Singularity> mpifort --version
GNU Fortran (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.

Maybe some Intel libraries are missing from the container?

edoapra commented 3 years ago

~If you don't get me the details of the docker image you are using, there is nothing I can do about it and I will close this issue.~ ~To be more precise:~ ~ did you create the docker image yourself? What Dockerfile did you use?~ ~ did you pull a docker image from hub.docker.com? What image did you pull~

I managed to understand what you have been doing by scrolling the output.

miroi commented 3 years ago

Hi, yes, I did pull your docker image via "singularity pull $SINGULARITY_CONTAINERS/nwchem.sif docker://nwchemorg/nwchem-702.mpipr.nersc" and was using only this image, no other.

edoapra commented 3 years ago

As described in https://nwchemgit.github.io/Containers.html, Singularity images of NWChem are available at https://cloud.sylabs.io/library/edoapra. Please keep in mind that they have been optimized for certain CPU instructions.
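
To get a rough idea of whether a given binary was built with instructions the host lacks, one option is to scan its disassembly. This is only a sketch under stated assumptions: `count_fma_insns` is a hypothetical helper, it assumes `objdump` from binutils is available where the binary lives, and it looks only for FMA mnemonics as an example.

```shell
# Count lines of a disassembly (read from standard input) that contain
# a fused multiply-add instruction.
# Hypothetical helper; the pattern covers the vfmadd*, vfmsub*, vfnmadd*
# and vfnmsub* mnemonic families as printed by objdump.
count_fma_insns() {
    grep -c -E '[[:space:]]vfn?m(add|sub)'
}

# Usage inside the container (path taken from the log above):
#   objdump -d --no-show-raw-insn /opt/nwchem-7.0.2/bin/LINUX64/nwchem | count_fma_insns
```

A nonzero count is a hint rather than proof: some libraries ship several code paths and select one at run time, so the presence of such instructions does not by itself mean they are executed on a given host.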

miroi commented 3 years ago

Thanks, closing the ticket.

miroi commented 2 years ago

I just checked the NWChem containers at https://cloud.sylabs.io/library/edoapra, and some of them run on my machine(s). Thanks!

edoapra commented 2 years ago

Thanks for the feedback.