uturuncoglu / RegESM

Regional Earth System Model
MIT License

problem in field time stamp information in three component case (atm+ocn+rtm) #11

Closed uturuncoglu closed 9 years ago

uturuncoglu commented 9 years ago

Ticket: #3613711 (ESMF Support)

The ExecuteRouteHandle method of the Connector does not work as expected. The OCN component interacts with both ATM (fast time step, every hour) and RTM (slow time step, every day), and the OCN import state includes all the fields coming from both ATM (e.g. net heat flux) and RTM (e.g. river discharge).

When the code passes through the ExecuteRouteHandle method for the ATM-OCN direction, it updates the time stamps of all the fields (both net heat flux and river discharge) in the field bundle using the "NUOPC_FieldBundleUpdateTime" call (in NUOPC_Connector.F90, around line 1762). The problem is that if a field (e.g. river discharge) does not belong to the components that are interacting (here ATM-OCN), the code sets its time stamp to all zeros, and only "net heat flux" gets the correct time stamp information.

Then the second ExecuteRouteHandle method runs for the RTM-OCN direction and follows the same logic. It updates the time stamps of the fields stored in the OCN import state: it writes zeros to "net heat flux" (which was correct after the previous step) and the correct time stamp to "river discharge" (which was wrong after the previous step). So, at the end of this process, only "river discharge" has the correct time stamp information, because when a component interacts with more than one component, the last interaction (in this case RTM to OCN) overwrites the time stamps. The OCN component's CheckImport then fails because of the wrong time stamp assigned to the "net heat flux" field.
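The overwrite sequence described above can be illustrated with a small toy model. This is only a sketch of the reported symptom, not ESMF or NUOPC code: the function names `execute_connector` and `check_import`, the field names, and the `ZERO_STAMP` tuple are all hypothetical stand-ins chosen for illustration.

```python
# Toy model of the symptom described above; this is NOT ESMF/NUOPC code.
# All names here are hypothetical and exist only for illustration.
ZERO_STAMP = (0, 0, 0, 0, 0, 0)  # stand-in for a time stamp "full of zeros"


def execute_connector(import_state, connected_fields, stamp):
    """Mimic the reported behaviour: every field in the destination state is
    re-stamped, but only the connected fields receive a valid time stamp."""
    for name in import_state:
        import_state[name] = stamp if name in connected_fields else ZERO_STAMP


def check_import(import_state, expected_stamp):
    """Return the names of fields whose time stamp differs from the expected one."""
    return [name for name, s in import_state.items() if s != expected_stamp]


now = (2015, 1, 1, 0, 0, 0)  # arbitrary example time
ocn_import = {"net_heat_flux": ZERO_STAMP, "river_discharge": ZERO_STAMP}

execute_connector(ocn_import, {"net_heat_flux"}, now)    # ATM -> OCN
after_atm = dict(ocn_import)                             # snapshot for inspection
execute_connector(ocn_import, {"river_discharge"}, now)  # RTM -> OCN
failed = check_import(ocn_import, now)                   # fields that would be rejected
```

In this toy model the second connector call undoes the valid stamp written by the first, so `failed` ends up listing the "net heat flux" field, which mirrors the CheckImport failure reported here.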

I know that it is hard to explain, but this situation appears only in concurrent execution and not in sequential execution. It could be related to the NUOPC_FieldBundleUpdateTime call

   call NUOPC_FieldBundleUpdateTime(genIS%wrap%srcFields, &
                                    genIS%wrap%dstFields, rc=rc)
   if (ESMF_LogFoundError(rcToCheck=rc, msg=ESMF_LOGERR_PASSTHRU,    &
       line=__LINE__, file=FILENAME)) return

because it updates the destination field's time stamp information only on the PETs that export the data, and on the rest of the PETs the time stamp will be zeros.

It seems that there is a "race condition" on the "time stamp" attribute of the exchange field. A possible solution is to add extra code that restores the overwritten time stamp information after the "ExecuteRouteHandle" execution, but I am not sure. I might have a design problem, and maybe there is no need to store all fields in the import state, but I am not sure about that either. Anyway, if you have any suggestions, just let me know.

PS: ESMF version is esmf-6.3.0rp1.

uturuncoglu commented 9 years ago

Response from Gerhard:

You are describing the issue very well, and I think I completely understand the symptoms you are seeing. I am a bit puzzled right now as to why the FieldBundle on the export side of the ATM->OCN and RTM->OCN Connectors would contain Fields that are not part of the particular connection. Even if the OCN importState contains a superset of Fields, each Connector instance should only interact with the Fields that it is connecting to. I wonder if there was a bug in this back in 630rp1, or if there is still something strange going on that we are not aware of. I will construct a test case and look into it.

uturuncoglu commented 9 years ago

Response from Gerhard:

I just added a v7 version of the AtmOcnRtmTwoTimescalesProto. Following the nomenclature to indicate v7 upgraded protos, it is called

v7-AtmOcnRtmTwoTimescalesProto

You can access all of the current protos at http://svn.code.sf.net/p/esmfcontrib/svn/NUOPC/trunk/

You will see that I modified this version of the prototype code to run the ATM, OCN, and RTM components on different petLists. There are 2 fields being transferred ATM->OCN, and 1 field RTM->OCN. I see the expected behavior during the time stepping: the one field that comes through RTM->OCN only moves in time on the slower outer loop, but stays at the outer loop time while the inner faster loop runs. During the inner faster loop the ATM->OCN fields move forward as expected.
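The expected time stamp behavior described above can be sketched with a toy nested loop. This is not the prototype's code: the field names and the `run` function are hypothetical, and the 1-hour and 24-hour coupling intervals are assumptions borrowed from the original report, not from the prototype itself.

```python
# Toy model of two-timescale time stepping; this is NOT the prototype's code.
# Intervals are assumptions taken from the original report (hourly and daily).
FAST_STEP_H = 1    # assumed ATM -> OCN coupling interval, in hours
SLOW_STEP_H = 24   # assumed RTM -> OCN coupling interval, in hours


def run(total_hours):
    """Record the time stamp (in hours) of each OCN import field at every
    fast step. Field names are hypothetical."""
    stamps = {"atm_field_1": 0, "atm_field_2": 0, "rtm_field": 0}
    history = []
    for outer in range(0, total_hours, SLOW_STEP_H):
        # RTM -> OCN field is stamped once per slow outer step.
        stamps["rtm_field"] = outer
        for inner in range(outer, outer + SLOW_STEP_H, FAST_STEP_H):
            # ATM -> OCN fields are stamped on every fast inner step.
            stamps["atm_field_1"] = inner
            stamps["atm_field_2"] = inner
            history.append(dict(stamps))
    return history


history = run(48)  # two slow steps, 48 fast steps
```

Inspecting `history` shows the RTM field holding the outer-loop time while the two ATM fields advance every hour, which is the pattern reported for the v7 prototype.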

Please run the new proto with a v7 ESMF snapshot (I recommend using ESMF_7_0_0_beta_snapshot_38) and see if you find the same. Maybe I still don't have all the details right that you were describing?

uturuncoglu commented 9 years ago

It is tested and works without any problem, so I am closing issue #11.