Closed btokay closed 5 years ago
If you logged out then you'll need the LD_PRELOAD env var set again. Have you done that?
Yes, I had set LD_PRELOAD previously, since I needed to check whether import underworld and import UWGeodynamics would raise any error in Python3. Unfortunately, they do not work unless LD_PRELOAD is set before running a job.
Yes that's correct. Set the LD_PRELOAD env. var in your pbs script or prepend it to your execution command.
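A small pre-flight check along these lines could go at the top of the job script to confirm that the variable actually reached the Python process. This is only a sketch: the helper name `ld_preload_is_set` is made up for illustration, and the value LD_PRELOAD should hold is site-specific and not given in this thread. Note that the dynamic loader reads LD_PRELOAD when the process starts, so the script can only report a missing value, not repair it.

```python
import os
import sys


def ld_preload_is_set(environ=os.environ):
    """Return True if LD_PRELOAD is present and non-empty in `environ`.

    `ld_preload_is_set` is a hypothetical helper, not part of Underworld.
    """
    return bool(environ.get("LD_PRELOAD", "").strip())


# Warn early instead of failing later inside `import underworld`.
# Setting os.environ["LD_PRELOAD"] here would NOT help: the loader has
# already run for this process, so the variable must be exported in the
# PBS script or prepended to the launch command.
if not ld_preload_is_set():
    print(
        "LD_PRELOAD is not set for this process; export it in the job "
        "script or prepend it to the execution command.",
        file=sys.stderr,
    )
```

Running this once in an interactive shell and once inside a submitted job shows whether the batch environment inherits the variable.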
Hi Julian, Despite setting the LD_PRELOAD env var, I frequently got the same errors. For that reason, I re-installed UWGeodynamics with newer versions of python3 (python3.6.7) and petsc (petsc 3.9.4), but now different errors appear, as follows:
[46]PETSC ERROR: ------------------------------------------------------------------------
[46]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
[46]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[46]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[46]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[46]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[46]PETSC ERROR: to get more information on the crash.
[46]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[46]PETSC ERROR: Signal received
[46]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[46]PETSC ERROR: Petsc Release Version 3.9.4, Sep, 11, 2018
[46]PETSC ERROR: ./3DExtension.py on a named r2358 by bt1752 Sat Sep 7 23:29:00 2019
[46]PETSC ERROR: Configure options --with-shared-libraries=1 --prefix=/apps/petsc/3.9.4 --with-blaslapack-lib="-L/apps/intel-ct/2018.3.222/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread" --with-hypre --download-hypre=yes --with-ptscotch=1 --download-ptscotch=yes --download-ml=yes --download-mumps=yes --with-mumps=1 --with-scalapack=1 --with-scalapack-lib="/apps/intel-ct/2018.3.222/mkl/lib/intel64/libmkl_scalapack_lp64.a /apps/intel-ct/2018.3.222/mkl/lib/intel64/libmkl_blacs_openmpi_lp64.a" --with-debugging=0 --with-mpi=1 --with-parmetis=1 --download-parmetis=yes --with-metis=1 --download-metis=yes --with-superlu=1 --download-superlu=yes --with-superlu_dist=1 --download-superlu_dist=yes --with-necdf=1 --with-hdf5=1 --with-fftw=1 --with-fftw-include=/apps/intel-ct/2018.3.222/mkl/include/fftw --with-fftw-lib="/apps/fftw3-mkl/2018.3.222/lib/libfftw3x_cdft_openmpi3_lp64.a /apps/fftw3-mkl/2018.3.222/lib/libfftw3xc_intel.a" CC=mpicc CXX=mpicxx FC=mpif90 F77=mpif77 F90=mpif90 --CFLAGS=-mkl --CXXFLAGS=-mkl --download-ctetgen=yes --download-triangle=yes
[46]PETSC ERROR: #1 User provided function() line 0 in unknown file
[r2358:12107] [0] func:/apps/openmpi/3.1.3/lib/libopen-pal.so.40(opal_backtrace_buffer+0x35) [0x2afded25aefd]
[r2358:12107] [1] func:/apps/openmpi/3.1.3/lib/libmpi.so(ompi_mpi_abort+0xa9) [0x2afdeb8dedf1]
[r2358:12107] [2] func:/apps/openmpi/3.1.3/lib/libmpi.so(ompi_errhandler_callback+0x5b) [0x2afdeb8cd2fe]
[r2358:12107] [3] func:/apps/openmpi/3.1.3/lib/openmpi/mca_pmix_ext2x.so(+0x503e) [0x2afdf0b0503e]
[r2358:12107] [4] func:/apps/libevent/2.1.8/lib/libevent-2.1.so.6(+0x2122b) [0x2afdee08022b]
[r2358:12107] [5] func:/apps/libevent/2.1.8/lib/libevent-2.1.so.6(event_base_loop+0x34f) [0x2afdee0809af]
[r2358:12107] [6] func:/apps/openmpi/3.1.3/lib/libopen-pal.so.40(+0x324c2) [0x2afded2104c2]
[r2358:12107] [7] func:/lib64/libpthread.so.0(+0x7aa1) [0x2afdec0d2aa1]
[r2358:12107] [8] func:/lib64/libc.so.6(clone+0x6d) [0x2afdecc63c4d]
[47]PETSC ERROR: ------------------------------------------------------------------------
[47]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
[47]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[47]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[47]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[47]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[47]PETSC ERROR: to get more information on the crash.
[47]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[47]PETSC ERROR: Signal received
[47]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[47]PETSC ERROR: Petsc Release Version 3.9.4, Sep, 11, 2018
[47]PETSC ERROR: ./3DExtension.py on a named r2358 by bt1752 Sat Sep 7 23:29:00 2019
[47]PETSC ERROR: Configure options --with-shared-libraries=1 --prefix=/apps/petsc/3.9.4 --with-blaslapack-lib="-L/apps/intel-ct/2018.3.222/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread" --with-hypre --download-hypre=yes --with-ptscotch=1 --download-ptscotch=yes --download-ml=yes --download-mumps=yes --with-mumps=1 --with-scalapack=1 --with-scalapack-lib="/apps/intel-ct/2018.3.222/mkl/lib/intel64/libmkl_scalapack_lp64.a /apps/intel-ct/2018.3.222/mkl/lib/intel64/libmkl_blacs_openmpi_lp64.a" --with-debugging=0 --with-mpi=1 --with-parmetis=1 --download-parmetis=yes --with-metis=1 --download-metis=yes --with-superlu=1 --download-superlu=yes --with-superlu_dist=1 --download-superlu_dist=yes --with-necdf=1 --with-hdf5=1 --with-fftw=1 --with-fftw-include=/apps/intel-ct/2018.3.222/mkl/include/fftw --with-fftw-lib="/apps/fftw3-mkl/2018.3.222/lib/libfftw3x_cdft_openmpi3_lp64.a /apps/fftw3-mkl/2018.3.222/lib/libfftw3xc_intel.a" CC=mpicc CXX=mpicxx FC=mpif90 F77=mpif77 F90=mpif90 --CFLAGS=-mkl --CXXFLAGS=-mkl --download-ctetgen=yes --download-triangle=yes
[47]PETSC ERROR: #1 User provided function() line 0 in unknown file
[r2358:12108] [0] func:/apps/openmpi/3.1.3/lib/libopen-pal.so.40(opal_backtrace_buffer+0x35) [0x2b814a2fcefd]
[r2358:12108] [1] func:/apps/openmpi/3.1.3/lib/libmpi.so(ompi_mpi_abort+0xa9) [0x2b8148980df1]
[r2358:12108] [2] func:/apps/openmpi/3.1.3/lib/libmpi.so(ompi_errhandler_callback+0x5b) [0x2b814896f2fe]
[r2358:12108] [3] func:/apps/openmpi/3.1.3/lib/openmpi/mca_pmix_ext2x.so(+0x503e) [0x2b814dba703e]
[r2358:12108] [4] func:/apps/libevent/2.1.8/lib/libevent-2.1.so.6(+0x2122b) [0x2b814b12222b]
[r2358:12108] [5] func:/apps/libevent/2.1.8/lib/libevent-2.1.so.6(event_base_loop+0x34f) [0x2b814b1229af]
[r2358:12108] [6] func:/apps/openmpi/3.1.3/lib/libopen-pal.so.40(+0x324c2) [0x2b814a2b24c2]
[r2358:12108] [7] func:/lib64/libpthread.so.0(+0x7aa1) [0x2b8149174aa1]
[r2358:12108] [8] func:/lib64/libc.so.6(clone+0x6d) [0x2b8149d05c4d]
mpiexec noticed that process rank 53 with PID 3542 on node r2393 exited on signal 9 (Killed).
[r1286:31581] 119 more processes have sent help message help-mpi-api.txt / mpi-abort
[r1286:31581] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[r1286:31581] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
I don't think the problem is related to the installation of UWGeodynamics. Am I right? If so, what should I do? ¯_(ツ)_/¯
Cheers,
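When reading a log like the one above, it can help to separate the rank that died first from the ranks that were merely torn down afterwards. In this output, ranks 46 and 47 report signal 15 (terminate), while mpiexec reports that rank 53 exited on signal 9 (killed); a rank killed outright is usually the better place to start looking. The sketch below groups signals by rank. The function name and regular expressions are illustrative and only match the two message shapes seen in this thread.

```python
import re
from collections import defaultdict

# Patterns for the two message shapes that appear in the log above.
PETSC_SIGNAL = re.compile(r"\[(\d+)\]PETSC ERROR: Caught signal number (\d+)")
MPIEXEC_EXIT = re.compile(
    r"process rank (\d+) with PID \d+ on node \S+ exited on signal (\d+)"
)


def signals_by_rank(log_text):
    """Map each MPI rank to the set of signal numbers reported for it."""
    found = defaultdict(set)
    for pattern in (PETSC_SIGNAL, MPIEXEC_EXIT):
        for rank, signal_number in pattern.findall(log_text):
            found[int(rank)].add(int(signal_number))
    return dict(found)


# Excerpt of the job output quoted in this thread.
sample = (
    "[46]PETSC ERROR: Caught signal number 15 Terminate: Some process "
    "(or the batch system) has told this process to end\n"
    "[47]PETSC ERROR: Caught signal number 15 Terminate: Some process "
    "(or the batch system) has told this process to end\n"
    "mpiexec noticed that process rank 53 with PID 3542 on node r2393 "
    "exited on signal 9 (Killed).\n"
)

print(signals_by_rank(sample))  # {46: {15}, 47: {15}, 53: {9}}
```

A signal 9 on a batch system is often, though not always, the scheduler or kernel enforcing a memory limit, so checking the job's memory usage report is a reasonable next step.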
OK. I will update the installation guide in the documentation. It currently says to use a virtual environment, but that causes problems with PETSc and the new UW 2.8 installation.
Hi @btokay @julesghub
Have you solved the problem?
Romain
Hi Romain, Apologies for the late reply! Yes, solved! That problem was due to the solver type "mumps". Other solver options should be used to run a UWG job on Raijin. Thanks @jmansour for your help! Bülent
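For readers landing here with the same symptom, switching away from MUMPS might look roughly like the sketch below. It assumes the model exposes Underworld's Stokes solver as `Model.solver` with a `set_inner_method` method, as I understand the Underworld 2 API; the thread does not say which method was used instead, so `"mg"` (multigrid) is an assumption. `RecordingSolver` and `avoid_mumps` are hypothetical names, included only so the example runs without Underworld installed.

```python
class RecordingSolver:
    """Hypothetical stand-in for `Model.solver`; it only records the choice."""

    def __init__(self):
        self.inner_method = "mumps"  # the setting that failed in this thread

    def set_inner_method(self, solve_type):
        self.inner_method = solve_type


def avoid_mumps(solver, fallback="mg"):
    """Select a non-MUMPS inner solve; `fallback` is an assumed choice."""
    if fallback == "mumps":
        raise ValueError("fallback must be a solver other than 'mumps'")
    solver.set_inner_method(fallback)
    return solver


# In a real script this would be `avoid_mumps(Model.solver)` before running.
solver = avoid_mumps(RecordingSolver())
print(solver.inner_method)  # mg
```

Which method suits a given model depends on its size and the memory available per node, so treat the default here as a starting point.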
Hi,
I completed the installation of underworld and uwgeodynamics on Raijin, and both
python3 -c "import underworld"
and
python3 -c "import UWGeodynamics"
gave no error. But after logging out and back in, running a UWGeodynamics job caused the following errors:
What am I missing? Do you have any suggestions?
Best regards, Bülent