schism-dev / schism-esmf

Earth System Modeling Framework cap for SCHISM

Decomposition in mesh leaves "holes" in simulation #3

Closed by platipodium 2 years ago

platipodium commented 2 years ago

We obtain zero-valued spots when interpolating with the bilinear method onto the SCHISM domain (here from the CoastalApp atmesh component). The result is good with nearest-neighbour interpolation.

image001

platipodium commented 2 years ago

My guess is that, because bilinear interpolation must take the target (SCHISM) grid points into consideration when mapping from the source (atmesh), we are missing the (ghost) points at the edge of the decomposed domains that would be needed for interpolation.

So if we added those ghost cells back to the grid, we might be able to do bilinear interpolation. …

josephzhang8 commented 2 years ago

There are up to two issues here. I'm 100% sure the boundary issue is not related to domain decomposition, because using 1 core also shows the same problem.

For the interior issue, I added a halo exchange on the SCHISM side, but that didn't fix it.

-Joseph

platipodium commented 2 years ago

Just to record a patch that was sent by Bob Oehmke to identify a possible mesh ownership problem:

I think it would be good to get a bit more information about what's going on. I've attached a version of ESMCI_WMat.C that writes some debug output when this error occurs. Would you mind putting it into /src/Infrastructure/Mesh/src/Regridding, recompiling, and rerunning? The output should show up in standard out with the prefix DEBUG.

ESMCI_WMat.c.gz

platipodium commented 2 years ago

Moving the email conversation here:

Domain decomposition across 2 PETs for SCHISM:

image001

I think the holes are not likely due to unmapped nodes, because other interpolation methods did not have the same problem (but we'll double-check). There may be up to two issues. The first is near the boundary. You can see from the plot below that even a single-core run had problems with bilinear near the boundary. The meshes from the atmospheric and ocean components are identical in this case, so I'm thinking this may be an underflow issue, i.e. the code may think one node is slightly outside the other mesh.

The second issue is the internal holes shown below from a multi-core run. Most of those nodes do not seem to be halo or interface nodes. Again I suspect underflow here. Even if ESMF uses real*8 to transport arrays, is it possible that somehow one PET thinks a node is slightly off an element? Changing to nearest_stod fixed both issues.

In MATLAB one can specify both interpolation and extrapolation methods to take care of this type of situation. I'm wondering whether ESMF allows that?

Bob's answer: "You shouldn't need a halo region for a mesh. If you create a mesh with pieces on different PETs, but the pieces are connected together via sharing nodes (like I described before), then the bilinear shouldn't see a gap between the PETs. Is that where you are seeing the unmapped points?"

platipodium commented 2 years ago

We implemented the following in the coupling configuration:

ATM -> OCN :remapmethod=bilinear:extrapMethod=nearest_stod

This fixes the "holes" in the multi-core setting:

image002

However, in the inlet itself, zeros still appear...
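
For context, that connection line sits inside the run sequence of the NEMS-style configuration file; a minimal sketch of such a run sequence is shown below. The 3600 s coupling step and the two-component sequence are assumptions for illustration, not our actual setup.

runSeq::
  @3600
    ATM -> OCN :remapmethod=bilinear:extrapMethod=nearest_stod
    ATM
    OCN
  @
::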

platipodium commented 2 years ago

That's odd. I could maybe understand the boundary, but for the interior elements we use a tolerance so that we don't see that kind of issue where points fall in between. If these meshes are identical, we have code that first searches for exactly matching points to ensure that an identity matrix is returned, so it's doubly strange that you'd get missing points. Let me think about what could be causing that and get back to you.

However, in the meantime, you can use an extrapolation method in addition to the interpolation method with ESMF regridding. In the ESMF_FieldRegridStore() call you just specify it using the extrapMethod argument. You can find a list of options starting here: http://earthsystemmodeling.org/docs/nightly/develop/ESMF_refdoc/node5.html#SECTION050121100000000000000, but probably the easiest to use is just extrapMethod=ESMF_EXTRAPMETHOD_NEAREST_STOD.
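
For illustration, an ESMF_FieldRegridStore() call combining both methods might look roughly like the sketch below; the wrapper routine and the variable names srcField, dstField and routeHandle are assumptions, not the cap's actual code.

subroutine storeRegridWithExtrap(srcField, dstField, routeHandle, rc)
  use esmf
  implicit none

  type(ESMF_Field), intent(inout)       :: srcField, dstField
  type(ESMF_RouteHandle), intent(inout) :: routeHandle
  integer, intent(out)                  :: rc

  ! Bilinear interpolation, with nearest source-to-destination extrapolation
  ! for destination points that bilinear leaves unmapped
  call ESMF_FieldRegridStore(srcField=srcField, dstField=dstField, &
    routehandle=routeHandle, &
    regridmethod=ESMF_REGRIDMETHOD_BILINEAR, &
    extrapMethod=ESMF_EXTRAPMETHOD_NEAREST_STOD, &
    unmappedaction=ESMF_UNMAPPEDACTION_IGNORE, rc=rc)
end subroutine storeRegridWithExtrap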

platipodium commented 2 years ago

We have tried the interpolation-extrapolation combination in the NEMS runtime configuration, and it worked. We used bilinear for interpolation and nearest_stod for extrapolation. No more holes inside or near the boundary.

To answer your question: I said the two meshes are identical, but that was not 100% correct. There are very slight differences in the coordinates for whatever reason, and it didn't help that both are in lon/lat.

platipodium commented 2 years ago

We now store the owned/foreign node index information in an internal state.

type type_InternalStateStruct
  ! Store the number of and indices in the 1:np resident nodes
  integer(ESMF_KIND_I4) :: numOwnedNodes, numForeignNodes
  integer(ESMF_KIND_I4), pointer :: ownedNodeIds(:) => null()
  integer(ESMF_KIND_I4), pointer :: foreignNodeIds(:) => null()
end type type_InternalStateStruct

type type_InternalState
  type(type_InternalStateStruct), pointer :: wrap
end type type_InternalState

However, when accessing the created state, we obtain:

20220224 123134.443 ERROR            PET2 ESMCI_FTable.C:579 esmf_gridcompgetinternalstate   Failure  - Internal subroutine call returned Error
20220224 123134.443 ERROR            PET2 schism_nuopc_cap.F90:396 InitializeRealize   Failure  - SCHISM subroutine call returned error
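
For reference, ESMF internal states follow a set/get pattern roughly like the sketch below (a minimal sketch with assumed phase placement, not the cap's actual code). The error above may indicate that ESMF_GridCompGetInternalState is called before the state has been set on that PET, or that the wrapped pointer was never allocated.

subroutine handleInternalState(comp, rc)
  use esmf
  implicit none

  type(ESMF_GridComp), intent(inout) :: comp
  integer, intent(out)               :: rc

  ! type_InternalState is the wrapper type defined above
  type(type_InternalState) :: internalState
  integer                  :: stat

  ! Create and register the state, e.g. in an early initialize phase
  allocate(internalState%wrap, stat=stat)
  if (stat /= 0) then
    rc = ESMF_FAILURE
    return
  end if
  call ESMF_GridCompSetInternalState(comp, internalState, rc)

  ! Retrieve it later on the same component, e.g. in InitializeRealize
  nullify(internalState%wrap)
  call ESMF_GridCompGetInternalState(comp, internalState, rc)
end subroutine handleInternalState
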
platipodium commented 2 years ago

I reduced the routine addSchismMesh to a bare minimum:

subroutine addSchismMesh(comp, rc)
  use esmf, only: ESMF_GridComp, ESMF_RC_ARG_BAD
  implicit none

  type(ESMF_GridComp), intent(inout) :: comp
  integer, intent(out)               :: rc

  rc = ESMF_RC_ARG_BAD
end subroutine addSchismMesh

With call addSchismMesh(comp, rc) it is still not executed properly.

?????????
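
One way to check whether the routine is reached at all, and how its return code propagates, is to wrap the call site in the usual ESMF error check. This is a sketch of such a call site, with localrc and the message text as assumptions:

  integer :: localrc

  call addSchismMesh(comp, localrc)
  ! With the stub above, this should log ESMF_RC_ARG_BAD and bail out;
  ! if nothing is logged, the call site is never reached
  if (ESMF_LogFoundError(localrc, msg='addSchismMesh failed', &
    rcToReturn=rc)) return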

platipodium commented 2 years ago

The latest code updates using the internalState seem to correctly assign owned/foreign nodes (here for Test_QuarterAnnulus).

On 1 core:

20220302 125841.628 WARNING          PET0     schism created mesh with "130 owned nodes and 108 owned elements

So we have 130 nodes and 108 elements

On three cores:

PET0.ESMF_LogFile:20220302 125212.297 WARNING          PET0     schism created mesh from "051 resident nodes and 036 resident elements in SCHISM
PET0.ESMF_LogFile:20220302 125212.297 WARNING          PET0     schism created mesh with "051 owned nodes and 036 owned elements
PET1.ESMF_LogFile:20220302 125212.297 WARNING          PET1     schism created mesh from "052 resident nodes and 037 resident elements in SCHISM
PET1.ESMF_LogFile:20220302 125212.297 WARNING          PET1     schism created mesh with "044 owned nodes and 037 owned elements
PET2.ESMF_LogFile:20220302 125212.297 WARNING          PET2     schism created mesh from "049 resident nodes and 035 resident elements in SCHISM
PET2.ESMF_LogFile:20220302 125212.297 WARNING          PET2     schism created mesh with "035 owned nodes and 035 owned elements

which sums to 130 nodes and 108 elements

And on 7 cores:

PET0.ESMF_LogFile:20220302 125704.157 WARNING          PET0     schism created mesh with "025 owned nodes and 015 owned elements
PET1.ESMF_LogFile:20220302 125704.157 WARNING          PET1     schism created mesh with "019 owned nodes and 015 owned elements
PET2.ESMF_LogFile:20220302 125704.157 WARNING          PET2     schism created mesh with "019 owned nodes and 016 owned elements
PET3.ESMF_LogFile:20220302 125704.157 WARNING          PET3     schism created mesh with "017 owned nodes and 016 owned elements
PET4.ESMF_LogFile:20220302 125704.157 WARNING          PET4     schism created mesh with "020 owned nodes and 015 owned elements
PET5.ESMF_LogFile:20220302 125704.157 WARNING          PET5     schism created mesh with "019 owned nodes and 015 owned elements
PET6.ESMF_LogFile:20220302 125704.157 WARNING          PET6     schism created mesh with "011 owned nodes and 016 owned elements

which sums to 130 nodes and 108 elements
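
A quick way to automate this consistency check, instead of summing log lines by hand, is to reduce the owned-node count over all PETs; below is a minimal sketch, assuming numOwnedNodes is taken from the internal state introduced above (not the cap's actual code).

subroutine checkGlobalOwnedNodes(numOwnedNodes, rc)
  use esmf
  implicit none

  integer(ESMF_KIND_I4), intent(in) :: numOwnedNodes
  integer, intent(out)              :: rc

  type(ESMF_VM)              :: vm
  integer(ESMF_KIND_I4)      :: localCount(1), globalCount(1)
  character(len=ESMF_MAXSTR) :: message

  call ESMF_VMGetCurrent(vm, rc=rc)
  localCount(1) = numOwnedNodes
  ! Every node must be owned by exactly one PET, so the sum over all PETs
  ! should equal the global node count (130 for Test_QuarterAnnulus)
  call ESMF_VMAllReduce(vm, localCount, globalCount, 1, &
    reduceflag=ESMF_REDUCE_SUM, rc=rc)
  write(message, '(A,I0)') 'global number of owned nodes: ', globalCount(1)
  call ESMF_LogWrite(trim(message), ESMF_LOGMSG_INFO, rc=rc)
end subroutine checkGlobalOwnedNodes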

josephzhang8 commented 2 years ago

On March 2, Carsten and Joseph worked on adding foreign nodes to the Advance stage. The results from multi-core runs are now mostly consistent with single-core runs, except right at the interface nodes, as shown below.

image

platipodium commented 2 years ago

Adding a FieldHalo exchange does not lead to an improvement. The next step is to try np -> npa in addSchismMesh.
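
For reference, the generic ESMF field-halo pattern referred to here is roughly the following sketch (with assumed variable names, not the cap's actual code):

subroutine haloField(field, rc)
  use esmf
  implicit none

  type(ESMF_Field), intent(inout) :: field
  integer, intent(out)            :: rc

  type(ESMF_RouteHandle) :: haloHandle

  ! Precompute the halo communication pattern once ...
  call ESMF_FieldHaloStore(field, routehandle=haloHandle, rc=rc)
  ! ... then update the non-owned (halo) points from their owning PETs
  call ESMF_FieldHalo(field, routehandle=haloHandle, rc=rc)
  call ESMF_FieldHaloRelease(haloHandle, rc=rc)
end subroutine haloField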

josephzhang8 commented 2 years ago

Update on March 8, 2022: we added ghost nodes; the issue persists, but at least the 0 values at the interface nodes have changed.

image

platipodium commented 2 years ago

A comparison between including ghost nodes (up to npa) and excluding them (up to np) shows that including them seems necessary, despite Bob's comment that it should not be needed:

With ghosts:

Bildschirmfoto 2022-03-10 um 10 55 15

Without ghosts:

Bildschirmfoto 2022-03-10 um 10 50 50
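
As a generic sketch of how node ownership can be filled when ghost nodes are included: np, npa and iplg follow SCHISM's naming for the resident node count, the augmented (resident plus ghost) node count, and the local-to-global node map; ghostOwner is a hypothetical input, and this is not the cap's actual code.

subroutine fillNodeOwnership(np, npa, iplg, ghostOwner, localPet, nodeIds, nodeOwners)
  use esmf, only: ESMF_KIND_I4
  implicit none

  integer, intent(in)  :: np              ! number of resident nodes
  integer, intent(in)  :: npa             ! resident + ghost nodes
  integer, intent(in)  :: iplg(npa)       ! local-to-global node index map
  integer, intent(in)  :: ghostOwner(npa) ! owning PET of each ghost node
  integer, intent(in)  :: localPet
  integer(ESMF_KIND_I4), intent(out) :: nodeIds(npa), nodeOwners(npa)

  integer :: i

  do i = 1, npa
    nodeIds(i) = iplg(i)
    if (i <= np) then
      nodeOwners(i) = localPet       ! resident nodes belong to this PET
    else
      nodeOwners(i) = ghostOwner(i)  ! ghost nodes are owned by another PET
    end if
  end do
end subroutine fillNodeOwnership
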
platipodium commented 2 years ago

This seems to be fixed by taking out the assignment to foreign nodes.

Bildschirmfoto 2022-03-10 um 11 27 33
josephzhang8 commented 2 years ago

After adding the wind fields in the same style as Carsten did, the wind results are now identical across different numbers of cores for SCHISM, as shown below.

image