ufs-community / ufs-weather-model

UFS Weather Model

dumpfields=true fails in module_fcst_grid_comp.F90 for coupled model #1765

Closed DeniseWorthen closed 2 months ago

DeniseWorthen commented 1 year ago

Description

The nems.configure variable dumpfields=true should allow the coupling fields to be written from the component itself. This feature was previously working in the fv3_cap, but now fails with the following error:

20230522 132033.780 ERROR            PET000 ESMCI_IO.C:542 ESMCI::IO::write() Operation not yet supported  - tile count of 6 != 1 - not supported yet
20230522 132033.791 ERROR            PET000 ESMCI_IO.C:942 ESMCI::IO::close() Unable to close file  - Internal subroutine call returned Error
20230522 132033.791 ERROR            PET000 ESMCI_IO.C:482 ESMCI::IO::write() Operation not yet supported  - Internal subroutine call returned Error
20230522 132033.791 ERROR            PET000 ESMCI_ArrayBundle.C:493 ESMCI::ArrayBundle::write() Unable to write to file  - Internal subroutine call returned Error
20230522 132033.791 ERROR            PET000 ESMCI_ArrayBundle_F.C:436 c_esmc_arraybundlewrite() Unable to write to file  - Internal subroutine call returned Error
20230522 132033.791 ERROR            PET000 ESMF_ArrayBundle.F90:3976 ESMF_ArrayBundleWrite() Unable to write to file  - Internal subroutine call returned Error
20230522 132033.791 ERROR            PET000 module_fcst_grid_comp.F90:1638 Unable to write to file  - Passing error in return code
20230522 132033.791 ERROR            PET000 module_fcst_grid_comp.F90:298 Unable to write to file  - Passing error in return code
20230522 132033.791 ERROR            PET000 module_fcst_grid_comp.F90:865 Unable to write to file  - Passing error in return code
20230522 132033.791 ERROR            PET000 fv3_cap.F90:396 Unable to write to file  - Passing error in return code
20230522 132033.791 ERROR            PET000 ATM:src/addon/NUOPC/src/NUOPC_ModelBase.F90:700 Unable to write to file  - Passing error in return code
20230522 132033.791 ERROR            PET000 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:2577 Unable to write to file  - Phase 'IPDvXp01' Initialize for modelComp 2: ATM did not return ESMF_SUCCESS
20230522 132033.791 ERROR            PET000 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:1286 Unable to write to file  - Passing error in return code
20230522 132033.792 ERROR            PET000 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:457 Unable to write to file  - Passing error in return code
20230522 132033.792 ERROR            PET000 UFS.F90:386 Unable to write to file  - Aborting UFS
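
The trace above reads from the innermost call outward: the first ERROR entry comes from the ESMF I/O layer, and the later entries only pass the return code up through module_fcst_grid_comp.F90, fv3_cap.F90 and the driver. A minimal Python sketch for pulling that first entry out of a PET log follows. The line layout is inferred from the entries quoted in this issue, and the helper name `first_error` is hypothetical, not part of ESMF or the UFS tooling.

```python
import re

# One PET log entry as quoted in this issue: date, time, severity, PET id, text.
# (Layout inferred from the sample lines; other ESMF builds may differ.)
LOG_LINE = re.compile(
    r"^(?P<date>\d{8}) (?P<time>\d{6}\.\d{3}) (?P<level>[A-Z]+)\s+"
    r"(?P<pet>PET\d+) (?P<text>.*)$"
)


def first_error(log_text):
    """Return (pet, location, detail) for the first ERROR entry, or None."""
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match is None or match["level"] != "ERROR":
            continue
        # The text has the shape "<location and summary>  - <detail>".
        head, _, detail = match["text"].partition("  - ")
        # For ESMF-internal entries the first token is "file:line".
        location = head.split()[0]
        return match["pet"], location, detail.strip()
    return None
```

Applied to the log above, this would point at the `ESMCI_IO.C:542` entry and its "tile count of 6 != 1" detail rather than at the final "Aborting UFS" line.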

CMEPS can write the fields it receives to the mediator history files, but it is sometimes useful to confirm that the fields imported by CMEPS are identical to those exported by FV3. This is no longer possible.

To Reproduce:

Run any of the global coupled configurations using the RT system. Use the run directory and set DumpFields = true in the ATM configuration attributes in nems.configure.
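
A minimal Python sketch of that edit, assuming the usual nems.configure layout in which a component's attributes sit between an `ATM_attributes::` line and a closing `::` line. The helper name `set_dumpfields` is hypothetical and is not part of the RT system.

```python
import re


def set_dumpfields(config_text, component="ATM", value="true"):
    """Return config_text with DumpFields set in <component>_attributes."""
    out = []
    in_block = False   # inside the target attribute block
    found = False      # an existing DumpFields line was rewritten
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped == f"{component}_attributes::":
            in_block, found = True, False
            out.append(line)
        elif in_block and stripped == "::":
            # End of block: add the attribute if it was not present.
            if not found:
                out.append(f"  DumpFields = {value}")
            in_block = False
            out.append(line)
        elif in_block and re.match(r"dumpfields\s*=", stripped, re.IGNORECASE):
            indent = line[: len(line) - len(line.lstrip())]
            out.append(f"{indent}DumpFields = {value}")
            found = True
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Only the named component's block is touched, so the attribute blocks of the other components keep their own DumpFields setting.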

uturuncoglu commented 8 months ago

@DeniseWorthen Is this also failing with ESMF 8.5.0? @billsacks knows that part of the code better since he implemented multi-tile I/O support, and he could have some ideas.

billsacks commented 8 months ago

This should work with recent versions of ESMF - or at least, recent versions of ESMF shouldn't give this particular error. Multi-tile I/O support was introduced in ESMF 8.4.0 with some limitations; some of these limitations were addressed in 8.5.0 and additional limitations were addressed in 8.6.0.

uturuncoglu commented 8 months ago

@billsacks Thanks. That is really helpful. @DeniseWorthen It would be nice to test this again when the new spack-stack (1.6.0, https://github.com/ufs-community/ufs-weather-model/issues/2036) is available with ESMF 8.6.0. If we still see the issue, we could try to fix it. Anyway, let me know what you think.

DeniseWorthen commented 8 months ago

Thanks. My understanding was the way we were doing multi-tile output needed to be re-worked, so that the State_RWFields_tiles would either not be used, or would be re-factored now that the multi-tile output I/O was enabled.

junwang-noaa commented 8 months ago

@DeniseWorthen May I ask if further code updates are required with ESMF 8.6.0?

DeniseWorthen commented 8 months ago

@junwang-noaa No, I think we need updates on the FV3 side, in the code that tries to use the multi-tile I/O.

DusanJovic-NOAA commented 8 months ago

I ran the cpld_control_p8 test with dumpfields set to true, and I see the following error:

20240131 183007.780 ERROR            PET000 ESMCI_IO_Handler.C:550 ESMCI::IO_Handler::getFilename() Wrong data value  - For multi-tile IO, the specified file name must have exactly one occurrence of '*', which will be replaced by the tile number. Filename <diagnostic_FV3_fcstGrid1.nc> has 0 occurrences.                                                                          
20240131 183007.781 ERROR            PET000 ESMCI_PIO_Handler.C:1323 ESMCI::PIO_Handler::openOneTileF Wrong data value  - Internal subroutine call returned Error
20240131 183007.781 ERROR            PET000 ESMCI_IO_Handler.C:744 ESMCI::IO_Handler::open() Wrong data value  - - Error opening file
20240131 183007.781 ERROR            PET000 ESMCI_IO.C:825 ESMCI::IO::open() Wrong data value  - Internal subroutine call returned Error
20240131 183007.781 ERROR            PET000 ESMCI_IO.C:469 ESMCI::IO::write() Wrong data value  - Internal subroutine call returned Error
20240131 183007.781 ERROR            PET000 ESMCI_ArrayBundle.C:496 ESMCI::ArrayBundle::write() Unable to write to file  - Internal subroutine call returned Error
20240131 183007.781 ERROR            PET000 ESMCI_ArrayBundle_F.C:445 c_esmc_arraybundlewrite() Unable to write to file  - Internal subroutine call returned Error
20240131 183007.781 ERROR            PET000 ESMF_ArrayBundle.F90:3957 ESMF_ArrayBundleWrite() Unable to write to file  - Internal subroutine call returned Error
20240131 183007.781 ERROR            PET000 module_fcst_grid_comp.F90:1596 Unable to write to file  - Passing error in return code
20240131 183007.781 ERROR            PET000 module_fcst_grid_comp.F90:298 Unable to write to file  - Passing error in return code
20240131 183007.781 ERROR            PET000 module_fcst_grid_comp.F90:865 Unable to write to file  - Passing error in return code
20240131 183007.781 ERROR            PET000 fv3_cap.F90:397 Unable to write to file  - Passing error in return code
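
The first entry in this log states the naming rule for multi-tile I/O: the file name must contain exactly one '*', which is replaced by the tile number. A minimal Python sketch of that rule follows. The function name `tile_filenames`, the `.tile*` placement in the example, and numbering tiles from 1 are assumptions of this sketch and are not taken from the ESMF source.

```python
def tile_filenames(pattern, tile_count):
    """Expand a multi-tile file name pattern into one name per tile."""
    stars = pattern.count("*")
    if stars != 1:
        # Mirrors the check reported by ESMCI::IO_Handler::getFilename().
        raise ValueError(
            f"multi-tile file name must contain exactly one '*', "
            f"but {pattern!r} has {stars}"
        )
    # Tiles are numbered from 1 here (an assumption of this sketch).
    return [pattern.replace("*", str(tile)) for tile in range(1, tile_count + 1)]
```

For example, `tile_filenames("diagnostic_FV3_fcstGrid1.tile*.nc", 6)` yields six names, while the original `diagnostic_FV3_fcstGrid1.nc` is rejected because it has no '*'.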

I added a single '*' to the file name and the code passed that point in module_fcst_grid_comp.F90 but is now crashing with the following error in module_cap_cpl.F90:

20240131 184220.046 ERROR            PET000 ESMFIO.F90:515 ESMFIO_FieldAccess() Operation not yet supported  - Only 2D fields are supported.                                                 
20240131 184220.047 ERROR            PET000 ESMFIO.F90:369 ESMFIO_Write() Operation not yet supported  - Internal subroutine call returned Error
20240131 184220.047 ERROR            PET000 module_cap_cpl.F90:155 Operation not yet supported  - Passing error in return code
20240131 184220.047 ERROR            PET000 module_cap_cpl.F90:59 Operation not yet supported  - Passing error in return code

Where is ESMFIO_Write defined? Is it part of the ESMF API? I cannot find a description of it in the ESMF documentation.

DeniseWorthen commented 8 months ago

@DusanJovic-NOAA I can see it in the ESMF code here: src/Superstructure/IOAPI/interface/ESMFIO.F90

uturuncoglu commented 8 months ago

@DusanJovic-NOAA I don't have all the details, but as far as I know those calls are only used internally by ESMF, so they are not exposed to the user. They are called internally when you call FieldWrite etc. (any user-facing call that writes a Field or FieldBundle); ESMF creates the ESMF I/O object to use the PIO capability. Anyway, @billsacks could add more here since he extended the multi-tile I/O support on the ESMF side.

DusanJovic-NOAA commented 8 months ago

Okay, thanks. Let me try to use FieldWrite instead.

uturuncoglu commented 8 months ago

@DusanJovic-NOAA As far as I know, the Array write calls also use the same underlying I/O infrastructure. So if ArrayWrite etc. is failing, there could be a bug on the ESMF side. There could also be some limitations on writing fields on the ESMF side that I don't know about. Again, @billsacks might have more information. Please open a support ticket if you think this is a bug.

billsacks commented 8 months ago

Interesting. This was a learning experience for me. It looks like ESMFIO.F90 is an entirely different, undocumented I/O interface that Raffaele Montuoro wrote in 2018 to enable I/O of multi-tile Fields. Unlike most ESMF I/O, this does not go through PIO, but instead calls into netcdf directly. I haven't read through it carefully, but my guess is that the functionality of this module may now be superseded by the multi-tile I/O work I did for FieldWrite, FieldBundleWrite, etc.

DusanJovic-NOAA commented 7 months ago

I tried to convert the diagnose_cplFields routine (actually State_RWFields_tiles that is called by diagnose_cplFields) to use FieldBundleWrite instead of ESMFIO_Write, and now I see the following error:

20240209 191247.823 ERROR            PET000 ESMCI_IO.C:1201 ESMCI::IO::redist_arraycreate1de Operation not yet supported  - Tile count != 1 is not supported
20240209 191247.823 ERROR            PET000 ESMCI_IO.C:591 ESMCI::IO::write() Operation not yet supported  - Internal subroutine call returned Error
20240209 191247.827 ERROR            PET000 ESMCI_IO.C:494 ESMCI::IO::write() Operation not yet supported  - Internal subroutine call returned Error
20240209 191247.827 ERROR            PET000 ESMCI_IO_F.C:171 c_esmc_iowrite() Unable to write to file  - Internal subroutine call returned Error
20240209 191247.827 ERROR            PET000 ESMF_IO.F90:523 ESMF_IOAddArray() Unable to write to file  - Internal subroutine call returned Error
20240209 191247.827 ERROR            PET000 ESMF_FieldBundle.F90:18015 ESMF_FieldBundleWrite() Unable to write to file  - Internal subroutine call returned Error
20240209 191247.827 ERROR            PET000 module_cap_cpl.F90:160 Unable to write to file  - Passing error in return code
20240209 191247.827 ERROR            PET000 module_cap_cpl.F90:62 Unable to write to file  - Passing error in return code

billsacks commented 7 months ago

It looks like you are running into a limitation with multi-tile I/O that was removed in the ESMF 8.6.0 release: prior to 8.6.0, multi-tile I/O only worked on Arrays / Fields with 1 DE per PET. Based on this error, it seems like you have Fields that use a decomposition with multiple DEs per PET (or possibly 0 DEs per PET). Solutions to this would be to either update to the 8.6.0 release (or the soon-upcoming 8.6.1 release that will have some other patches wanted by the UFS) or, if feasible, change the decomposition of these fields to always use the default of 1 DE per PET.
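
The version constraints mentioned in this thread can be summarized in a small check. This is a minimal Python sketch, assuming only what is stated here: multi-tile I/O arrived in ESMF 8.4.0, and before 8.6.0 it required exactly 1 DE per PET. The function names are hypothetical, and the check does not capture any other limitations that individual releases may have.

```python
def parse_version(version):
    """Turn a version string such as '8.6.0' into a 3-tuple of ints."""
    parts = [int(part) for part in version.split(".")]
    # Pad short strings such as '8.6' so tuple comparison behaves as expected.
    return tuple((parts + [0, 0, 0])[:3])


def multitile_write_supported(esmf_version, des_per_pet=1):
    """Sketch of the multi-tile I/O constraints described in this thread."""
    version = parse_version(esmf_version)
    if version < (8, 4, 0):
        return False             # multi-tile I/O was introduced in 8.4.0
    if version < (8, 6, 0):
        return des_per_pet == 1  # before 8.6.0: exactly 1 DE per PET
    return True                  # 8.6.0 removed the 1-DE-per-PET limitation
```

With these rules, `multitile_write_supported("8.5.0", des_per_pet=2)` is false, which matches the "Tile count != 1 is not supported" failure above, while the same call with "8.6.0" is true.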

DusanJovic-NOAA commented 7 months ago

Thanks @billsacks, I'll try to build the model with ESMF 8.6.0.

DusanJovic-NOAA commented 7 months ago

I updated the diagnose_cplFields routine in FV3 to use ESMF_FieldBundleWrite. I can now write the coupling fields on 6 tiles with ESMF v8.6.0. The code is in this branch:

https://github.com/DusanJovic-NOAA/fv3atm/tree/dump_cpl_fields

It can be tested using the corresponding ufs-weather-model branch:

https://github.com/DusanJovic-NOAA/ufs-weather-model/tree/dump_cpl_fields

I temporarily updated ESMF to version 8.6.0 from spack-stack 1.6.0 on Hera.

BrianCurtis-NOAA commented 7 months ago

When this makes it to PR form, make sure to add a dependency on ESMF 8.6.0

junwang-noaa commented 3 months ago

@DusanJovic-NOAA Now that we have ESMF 8.6.0 on all platforms, would you please test again to see if this issue is resolved? Thanks.