ufs-community / ufs-weather-model

UFS Weather Model

MacOS runtime error with ESMF Version v8.3.0b09 + openmpi #1698

Open MicroTed opened 1 year ago

MicroTed commented 1 year ago

Update: The error occurs with openmpi (4.1.2); ufs_model seems to work fine when hpc-stack is built with mpich (3.3.2) instead. I have had trouble with openmpi and ifort before, but openmpi had seemed OK with gfortran/gcc for other codes such as CM1. (I had previously used only mpich with UFS on my Intel Mac on Catalina.)


I seem to be running into an error similar to https://github.com/ufs-community/ufs-weather-model/issues/303, where ufs_model stops right off the bat with this error:

* . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . 
     PROGRAM ufs       HAS BEGUN. COMPILED       0.00     ORG: np23   
     STARTING DATE-TIME  APR 04,2023  21:48:21.084   94  TUE   2460039

terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at:  key not found

Specs: iMac Pro 2017/Ventura, gcc-11.3, hpc-stack develop branch (as of Mon Dec 26 15:24:31 2022 -0500, commit d2acee795). No problems with compiling, at least.

It is very possible that I have something set up incorrectly, so I'm only guessing that ESMF is responsible based on the previous issue (303) when compiled with GNU. I am trying to set up SRW at release/public-v2.1.0 to test some CCPP updates for NSSL-MP. The same version on jet (w/ ifort and same ESMF version) is working. (I am trying the SUBCONUS_Ind_3km with the FV3_WoFS_v0 suite, so it is set up with those defaults)

Any ideas out there? (I have an M1 mac, too, but haven't set up the test case yet.) -- Ted Mansell (NOAA/NSSL)

PET0 output:

Running with ESMF Version : v8.3.0b09
ESMF library build date/time: "Mar 29 2023" "12:50:56"
ESMF library build location : /Users/ted.mansell/src/hpc-stack/pkg/v8.3.0b09
ESMF_COMM : openmpi
ESMF_MOAB : enabled
ESMF_LAPACK : enabled
ESMF_NETCDF : enabled
ESMF_PNETCDF : disabled
ESMF_PIO : enabled
ESMF_YAMLCPP : enabled
(and that is the end of the file)
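
When comparing a failing build against a working one (for example the openmpi and mpich builds mentioned above), it can help to pull the header fields out of each PET log and diff them. The following is a minimal Python sketch, not part of ESMF or the UFS tooling: `parse_pet_header` and `diff_settings` are hypothetical helper names, the field list is taken only from the PET0 output quoted in this issue, and the code assumes each field sits on its own line in the actual log file (any timestamp or PET prefix in front of a field is ignored).

```python
import re

# Field names as they appear in the PET0 output quoted in this issue.
# Assumption: in the real log file each field is on its own line.
ENTRY = re.compile(
    r"(Running with ESMF Version"
    r"|ESMF library build date/time"
    r"|ESMF library build location"
    r"|ESMF_[A-Z0-9_]+)\s*:\s*(.*\S)\s*$"
)

# Header text copied from the issue, one field per line.
SAMPLE = """\
Running with ESMF Version   : v8.3.0b09
ESMF library build date/time: "Mar 29 2023" "12:50:56"
ESMF library build location : /Users/ted.mansell/src/hpc-stack/pkg/v8.3.0b09
ESMF_COMM                   : openmpi
ESMF_MOAB                   : enabled
ESMF_LAPACK                 : enabled
ESMF_NETCDF                 : enabled
ESMF_PNETCDF                : disabled
ESMF_PIO                    : enabled
ESMF_YAMLCPP                : enabled
"""


def parse_pet_header(text):
    """Return a dict of header field -> value for every recognized line."""
    settings = {}
    for line in text.splitlines():
        match = ENTRY.search(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def diff_settings(a, b):
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    failing = parse_pet_header(SAMPLE)
    # Hypothetical second header: the "mpich" value is a placeholder,
    # not copied from a real log.
    working = dict(failing, ESMF_COMM="mpich")
    for field, (left, right) in diff_settings(failing, working).items():
        print(f"{field}: {left} != {right}")
```

In practice the two inputs would be the contents of a PET log from each build; only the fields that differ are reported.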

climbfuji commented 7 months ago

Note that this is a duplicate of #1340. Since this issue is the newer one, I will post updates here. I just tried again, and it still fails. This time, however, I got more information using the spack-stack modules on my macOS system:

> mpirun -np 8 ./fv3.exe 2>&1 | tee fv3.log

* . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * .
     PROGRAM ufs       HAS BEGUN. COMPILED       0.00     ORG: np23
     STARTING DATE-TIME  DEC 27,2023  11:59:59.035  361  WEN   2460306

libc++abi: terminating with uncaught exception of type nlohmann::json_abi_v3_11_2::detail::out_of_range: [json.exception.out_of_range.403] key 'NUOPC' not found

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
libc++abi: terminating with uncaught exception of type nlohmann::json_abi_v3_11_2::detail::out_of_range: [json.exception.out_of_range.403] key 'NUOPC' not found

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
libc++abi: terminating with uncaught exception of type nlohmann::json_abi_v3_11_2::detail::out_of_range: [json.exception.out_of_range.403] key 'NUOPC' not found

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
libc++abi: terminating with uncaught exception of type nlohmann::json_abi_v3_11_2::detail::out_of_range: [json.exception.out_of_range.403] key 'NUOPC' not found

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
libc++abi: terminating with uncaught exception of type nlohmann::json_abi_v3_11_2::detail::out_of_range: [json.exception.out_of_range.403] key 'NUOPC' not found

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
libc++abi: terminating with uncaught exception of type nlohmann::json_abi_v3_11_2::detail::out_of_range: [json.exception.out_of_range.403] key 'NUOPC' not found

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
libc++abi: terminating with uncaught exception of type nlohmann::json_abi_v3_11_2::detail::out_of_range: [json.exception.out_of_range.403] key 'NUOPC' not found

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
libc++abi: terminating with uncaught exception of type nlohmann::json_abi_v3_11_2::detail::out_of_range: [json.exception.out_of_range.403] key 'NUOPC' not found

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:

@theurich Is this information helpful for tracking down the ESMF startup crash of the ufs-weather-model?
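
Because every MPI rank prints its own abort message, a log like the one above is easier to read once the repeats are collapsed. The following is a minimal Python sketch for doing that; `summarize_aborts` is a hypothetical helper name, and the two patterns it recognizes are only the message shapes that appear in this issue (the `libc++abi: terminating with uncaught exception ...` line and the `terminate called after throwing an instance of ...` line followed by `what():`).

```python
import re
from collections import Counter

# Single-line form seen in the macOS/spack-stack run in this issue.
LIBCXX = re.compile(
    r"libc\+\+abi: terminating with uncaught exception of type (\S+): (.*\S)"
)
# Two-line form seen in the original report (type line, then what() line).
LIBSTDCXX = re.compile(r"terminate called after throwing an instance of '([^']+)'")
WHAT = re.compile(r"^\s*what\(\):\s*(.*\S)")


def summarize_aborts(log_text):
    """Count (exception type, message) pairs found in the log text."""
    counts = Counter()
    pending_type = None  # type from a two-line message awaiting its what() line
    for line in log_text.splitlines():
        match = LIBCXX.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
            pending_type = None
            continue
        match = LIBSTDCXX.search(line)
        if match:
            pending_type = match.group(1)
            continue
        match = WHAT.match(line)
        if match and pending_type is not None:
            counts[(pending_type, match.group(1))] += 1
            pending_type = None
    return counts


if __name__ == "__main__":
    # fv3.log is the file written by the mpirun command quoted in this issue.
    with open("fv3.log", encoding="utf-8", errors="replace") as handle:
        for (exc_type, message), n in summarize_aborts(handle.read()).items():
            print(f"{n} x {exc_type}: {message}")
```

A type line that is never followed by a `what():` line is not counted in this sketch.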