ufs-community / ufs-weather-model

UFS Weather Model

Using ESMF managed threading in UFS #1111

Closed junwang-noaa closed 1 year ago

junwang-noaa commented 2 years ago

This EPIC is to track the development of using the ESMF managed threading in UFS.

1) Update to ESMF 8.2.0 release.
2) Adding the code updates to enable testing the ESMF managed threading in UFS.
3) Update the intel/impi to Intel 2021 to resolve the mpi_alltoall scaling issue.
4) Enable the ESMF managed threading in UFS, add regression tests to use the capability.

Documentation from Gerhard is available here.

DeniseWorthen commented 2 years ago

ESMF managed threading does not require an updated ESMF; however, the failing RTs will require the update (v8.4.0b08) to resolve the issue of failing to write the correct type kind of global attributes.

DeniseWorthen commented 2 years ago

HAFS plans to make use of the feature in the near future. Dusan's UPP update for moving nests will allow HAFS to run in-line post with ESMF threading.

Moorthi has also tested coupled C768 on WCOSS2 with ESMF threading. Found that running MOM6 with more than one thread does not reproduce. This needs to be tested with in-line post turned on.

Jiande reports that, according to GFDL, MOM reproduces standalone with threading, but threading doesn't help much with speed. This conflicts with Moorthi's result.

Next reasonable step would be to test threading for MOM6 with DATM/CICE6.

Coupled applications team should also be asked to test S2S with ESMF threading so that the non-esmf threading test can be dropped.

DeniseWorthen commented 2 years ago

Moorthi reports that ESMF threading on WCOSS2 with C3072 fails with a large number of nodes, while the same number of nodes requested with traditional threading works. The failing case used 605 nodes at 128 tasks/node.

DeniseWorthen commented 2 years ago

Fails at initiation. Need to bring this up w/ system folks on WCOSS2. The job launches MPI w/ a very large number of tasks. Moorthi can create an issue for GDIT to look into.
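To put a number on "very large number of tasks", here is a rough sketch of the task-count arithmetic for the 605-node case. It assumes that "128 tasks/node" means one MPI task is launched per core under ESMF managed threading, and that the traditional-threading run launches fewer MPI tasks per node, each with several OpenMP threads. The function name and the thread count of 4 are hypothetical, chosen only for illustration; the actual thread count is not given in this thread.

```python
def launched_mpi_tasks(nodes, cores_per_node, threads_per_task=1):
    """Number of MPI tasks the launcher starts when each task owns
    `threads_per_task` cores (hypothetical helper for illustration)."""
    if cores_per_node % threads_per_task != 0:
        raise ValueError("threads_per_task must divide cores_per_node evenly")
    return nodes * (cores_per_node // threads_per_task)


# Assumption: ESMF managed threading launches one MPI task per core.
esmf_managed = launched_mpi_tasks(nodes=605, cores_per_node=128)

# Assumption: traditional threading with 4 OpenMP threads per task
# (the real thread count is not stated in this issue).
traditional = launched_mpi_tasks(nodes=605, cores_per_node=128, threads_per_task=4)

print(esmf_managed)  # 77440
print(traditional)   # 19360
```

Under these assumptions the same node request starts several times more MPI tasks in the ESMF managed case, which would be consistent with a failure that only shows up when the job is launched.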

DeniseWorthen commented 1 year ago

WCOSS2 is unavailable to the ESMF team for testing. Possibly need to work w/ Cray to resolve. Need to open an issue with the hardware support team to start solving the problem. C5 is in acceptance testing, so it may be a possible test machine. George Vandenberge could perhaps look at it on WCOSS2.

DeniseWorthen commented 1 year ago

Need to check w/ coupled group on their testing of ESMF-managed threading in S2S (Neil has tested) and prepare to drop non-ESMF-managed threaded tests. Uses fewer tasks, which is preferable (relative to non-threaded). We can close this EPIC once we have confirmation that ESMF-managed threading is the default for coupled apps.

DeniseWorthen commented 1 year ago

Maintenance of the framework should be an EPIC task, but this issue is more urgent to have implemented, so EIB can contribute to this task.