schism-dev / schism

Semi-implicit Cross-scale Hydroscience Integrated System Model (SCHISM)
http://ccrm.vims.edu/schismweb/
Apache License 2.0

Compilation fails, runs segfault on DKRZ/levante #61

Closed platipodium closed 1 year ago

platipodium commented 2 years ago

Since February 2022, DKRZ has had a new HPC system, "Levante". This thread documents the effort to get SCHISM working there (compiling and completing the tests).

platipodium commented 2 years ago

Loading modules.

We can use either an OpenMPI or an Intel MPI toolchain; the modules for each are listed below.

Common modules

module load hdf5
module load netcdf-c
module load netcdf-fortran
module load git
module load intel-oneapi-compilers

Openmpi versus intelmpi

module load intel-oneapi-mpi
module swap intel-oneapi-mpi openmpi
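
The two alternatives above could be wrapped in a small helper so that the MPI flavour is chosen in one place. This is only a minimal sketch, assuming the unversioned module names listed above resolve on Levante; the function name `levante_modules` and the flavour labels `intelmpi` and `openmpi` are hypothetical and not part of SCHISM.

```shell
# Print the modules to load for one MPI flavour, one name per line.
# Module names are copied from the list above; the function name and the
# flavour labels are illustrative assumptions.
levante_modules() {
  case "${1:-intelmpi}" in
    intelmpi) mpi_module=intel-oneapi-mpi ;;
    openmpi)  mpi_module=openmpi ;;
    *) echo "unknown MPI flavour: $1" >&2; return 1 ;;
  esac
  printf '%s\n' hdf5 netcdf-c netcdf-fortran git intel-oneapi-compilers "$mpi_module"
}

# On a system that provides the `module` command:
#   for m in $(levante_modules openmpi); do module load "$m"; done
```

Printing the names instead of loading them directly keeps the helper usable on machines without the `module` command, for example to review the list before loading.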

platipodium commented 2 years ago

Here's a first cmake fragment for Levante

###DKRZ Levante

set (SCHISM_EXE_BASENAME pschism_LEVANTE CACHE STRING "Base name of the executable (modules and file extension are appended). If you want a machine name, add it here")

###Relative paths won't work
set(CMAKE_Fortran_COMPILER ifort CACHE PATH "Path to serial Fortran compiler")
set(CMAKE_C_COMPILER icc  CACHE PATH "Path to serial C compiler")
#set(NetCDF_FORTRAN_DIR "$ENV{NETCDF_FORTRAN}"  CACHE PATH "Path to NetCDF Fortran library")
#set(NetCDF_C_DIR "$ENV{NETCDF}" CACHE PATH "Path to NetCDF C library")
set(CMAKE_Fortran_FLAGS_RELEASE "-O2 -mcmodel=medium  -mtune=core-avx2" CACHE STRING "Fortran flags" FORCE)

#Compiler flags for openmpi
#set(CMAKE_Fortran_FLAGS_RELEASE "-O2 -mcmodel=medium -assume byterecl" CACHE STRING "Fortran flags" FORCE)

platipodium commented 2 years ago

I uploaded a makefile fragment Make.defs.levante.openmpi that circumvents the messed-up mamba paths introduced by the (needed) python3 module.

NFCONFIG=$(shell module unload python3; which nf-config; module load python3)
NCCONFIG=$(shell module unload python3; which nc-config; module load python3)
$(info $(ENV) uses as for netCDF C config $(NCCONFIG))
$(info $(ENV) uses as for netCDF Fortran config $(NFCONFIG))

CDFLIBS = $(shell $(NFCONFIG) --flibs) $(shell $(NCCONFIG) --libs)
$(info $(ENV) uses netCDF with CDFLIBS=$(CDFLIBS))
CDFMOD = -I$(shell $(NFCONFIG) --includedir)
$(info $(ENV) uses netCDF with CDFMOD=$(CDFMOD))
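
An alternative to unloading and reloading python3 might be to search PATH while skipping the offending directories. The following is a minimal sketch, assuming the unwanted entries can be recognised by a substring such as `mamba` (an assumption, not confirmed in this thread); `find_tool` is a hypothetical helper.

```shell
# find_tool NAME SKIP: print the first executable NAME found on PATH whose
# directory does not contain the substring SKIP. Returns 1 if none is found.
find_tool() {
  name="$1"
  skip="$2"
  old_ifs="$IFS"
  IFS=:
  for dir in $PATH; do
    IFS="$old_ifs"
    # Skip directories that match the unwanted substring.
    case "$dir" in *"$skip"*) continue ;; esac
    if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  IFS="$old_ifs"
  return 1
}

# Possible use in a build script:
#   NFCONFIG=$(find_tool nf-config mamba)
#   NCCONFIG=$(find_tool nc-config mamba)
```

Unlike the unload/reload approach, this leaves the loaded modules untouched, but it relies on the substring match being specific enough.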

josephzhang8 commented 2 years ago

Thx to Carsten's magic, we now have a working toolkit (based on gcc) on Levante! Use

(1) modules.levante in src/Utility/Cluster_files (you can source modules.levante)
(2) cmake file is SCHISM.local.levante.gcc
(3) batch script is run_levante_ompi in src/Utility/Cluster_files

That's it. I'm still looking at performance for large core counts.
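
The three pieces above might be combined roughly as follows. This is a sketch under several assumptions that this thread does not confirm: that SCHISM.local.levante.gcc lives in the repository's cmake/ directory, that a generic cmake/SCHISM.local.build is passed alongside it, that the build target is called pschism, and that an out-of-source build/ directory is used. The helpers `run` and `build_schism_levante` are hypothetical.

```shell
# Print commands when DRY_RUN is set, otherwise execute them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# build_schism_levante [REPO]: configure and build in REPO/build.
# File locations other than modules.levante are assumptions.
build_schism_levante() {
  repo="${1:-.}"
  run . "$repo/src/Utility/Cluster_files/modules.levante" &&
  run mkdir -p "$repo/build" &&
  run cd "$repo/build" &&
  run cmake -C ../cmake/SCHISM.local.build -C ../cmake/SCHISM.local.levante.gcc ../src &&
  run make -j 8 pschism
}

# Preview the commands without running them:
#   (DRY_RUN=1; build_schism_levante /path/to/schism)
```

Running with DRY_RUN set first makes it easy to compare the printed commands against the actual repository layout before building.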

platipodium commented 2 years ago

They installed a new toolchain with intel:

the system administrator has installed the missing libraries on levante. Please try to re-build your application using the following modules:

hdf5/1.12.1-intel-oneapi-mpi-2021.5.0-intel-2021.5.0
netcdf-c/4.8.1-intel-oneapi-mpi-2021.5.0-intel-2021.5.0
netcdf-fortran/4.5.3-intel-oneapi-mpi-2021.5.0-intel-2021.5.0
parallel-netcdf/1.12.2-intel-oneapi-mpi-2021.5.0-intel-2021.5.0

The path to the shared netCDF-Fortran library can be coded in the binary using the following flag:

-Wl,-rpath,/sw/spack-levante/netcdf-fortran-4.5.3-r5r3ev/lib/
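
If more than one library directory has to be recorded, the flag can be generated rather than typed by hand. A minimal sketch; `rpath_flags` is a hypothetical helper, and which directories to pass is left to the reader.

```shell
# Emit one -Wl,-rpath,DIR linker flag per directory argument,
# separated by spaces. A single trailing slash is stripped.
rpath_flags() {
  flags=""
  for dir in "$@"; do
    flags="${flags:+$flags }-Wl,-rpath,${dir%/}"
  done
  printf '%s\n' "$flags"
}

# Example with the directory quoted above:
#   rpath_flags /sw/spack-levante/netcdf-fortran-4.5.3-r5r3ev/lib/
```

The resulting string could be appended to the linker flags of the build; `readelf -d` on the finished executable should then list the directory in an RPATH or RUNPATH entry.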

josephzhang8 commented 2 years ago

Will do after Levante is back from maintenance. Thx Carsten.

-Joseph

josephzhang8 commented 2 years ago

I tried those libs but no success.

So far:

  1. GCC toolchain: works for small cases, but produces NaNs or hangs for large cases
  2. Intel: not working; the error suggests an MPI version conflict between OpenMPI and (parallel) netCDF

-Joseph

platipodium commented 2 years ago

I currently get (with gcc toolchain)

/sw/spack-levante/netcdf-fortran-4.5.3-jlxcfz/lib/libnetcdff.so: undefined reference to `_gfortran_os_error_at@GFORTRAN_10'
objdump -t /sw/spack-levante/netcdf-fortran-4.5.3-jlxcfz/lib/libnetcdff.so |grep gfort
0000000000000000       F *UND*  0000000000000000              _gfortran_os_error_at@@GFORTRAN_10
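
The `@@GFORTRAN_10` suffix is a symbol version: libnetcdff.so expects a libgfortran that exports that version, which suggests (though the thread does not confirm it) that the library was built with a newer gfortran than the one used for linking. A minimal sketch for comparing what a library requires with what a given libgfortran provides; `gfortran_versions` is a hypothetical helper.

```shell
# Read objdump output on stdin and print the distinct GFORTRAN_* symbol
# versions it mentions, one per line.
gfortran_versions() {
  grep -o 'GFORTRAN_[0-9][0-9.]*' | sort -u
}

# Versions required by the netCDF-Fortran library:
#   objdump -T /sw/spack-levante/netcdf-fortran-4.5.3-jlxcfz/lib/libnetcdff.so | gfortran_versions
# Versions offered by the libgfortran that the current compiler links against:
#   objdump -T "$(gfortran -print-file-name=libgfortran.so)" | gfortran_versions
```

If the first list contains a version that is missing from the second, the link step will fail with an undefined reference like the one above.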

platipodium commented 2 years ago

Solved this by adding the following lines to local CMake config:

# To avoid dynamic loading of wrong standard library, force these to static
set(CMAKE_EXE_LINKER_FLAGS "-static-libgcc -static-libstdc++")

from https://stackoverflow.com/questions/24648357/compiling-a-static-executable-with-cmake
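
To see whether the flags had the intended effect, one can list the shared libraries that the executable still requests at load time. A minimal sketch that filters `readelf -d` output; `needed_libs` is a hypothetical helper, and the executable name in the usage comment is an assumption.

```shell
# Read `readelf -d` output on stdin and print the names of all
# libraries marked (NEEDED), one per line.
needed_libs() {
  sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}

# Usage (executable name is an assumption):
#   readelf -d pschism_LEVANTE | needed_libs
```

If `libgcc_s.so.1` and `libstdc++.so.6` no longer appear in the list, those two runtime libraries were linked statically; any entries that remain are still resolved dynamically at run time.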

josephzhang8 commented 1 year ago

We have updated the module, cmake and batch script files for Levante. At the moment, both Intel and gcc work well for meshes of any size. Intel is faster.

josephzhang8 commented 1 year ago

Thx Carsten!

-Joseph

josephzhang8 commented 1 year ago

Issues resolved.