ufs-community / ufs-weather-model

UFS Weather Model
Other
130 stars 238 forks source link

Bring exchange grid support to UFS #1157

Closed uturuncoglu closed 2 years ago

uturuncoglu commented 2 years ago

PR Checklist

Instructions: All subsequent sections of text should be filled in as appropriate.

The information provided below allows the code managers to understand the changes relevant to this PR, whether those changes are in the ufs-weather-model repository or in a subcomponent repository. Ufs-weather-model code managers will use the information provided to add any applicable labels, assign reviewers and place it in the Commit Queue. Once the PR is in the Commit Queue, it is the PR owner's responsiblity to keep the PR up-to-date with the develop branch of ufs-weather-model.

Description

The PR aims to bring exchange grid capability, which is provided by CMEPS mediator, to the UFS weather model. The PR includes following major changes in the model;

Issue(s) addressed

Link the issues to be closed with this PR, whether in this repository, or in another repository. (Remember, issues must always be created before starting work on a PR branch!)

Testing

How were these changes tested? What compilers / HPCs was it tested with? Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) Have regression tests and unit tests (utests) been run? On which platforms and with which compilers? (Note that unit tests can only be run on tier-1 platforms)

The model is tested on NCAR's Cheyenne platform with intel compiler. All regression tests are passing at this point.

Dependencies

If testing this branch requires non-default branches in other repositories, list them. Those branches should have matching names (ideally).

Do PRs in upstream repositories need to be merged first? If so add the "waiting for other repos" label and list the upstream PRs

uturuncoglu commented 2 years ago

I updated the model and sub-modules to the recent versions and try to address the issues. Let me know if you need any other changes to move forward this PR.

uturuncoglu commented 2 years ago

@junwang-noaa This PR is ready for review. I also run full test suite again and all of them passing on Cheyyene.

junwang-noaa commented 2 years ago

ORT test issues: 1) debug test is fixed. 2). Still working on threading test. 3) next is the restart test.

uturuncoglu commented 2 years ago

@junwang-noaa I think we figure out the issue with the ORT threading test. After careful examination, I tracked it to the remapping in CMEPS aoflux code that involves exchange grid. The roundoff difference in the remap call (due to ordering of operations under different PET configurations and ordering or corner points in ESMF Mesh) is causing answer change in the model since threading test uses less processor on mediator and FV3 components and the mapping is responsible to transfer data back and forth between component grids and exchange grid. This changes the forcing and slightly modified forcing goes to CCPP and causes change in the answer. So, at this point using different number of core in mediator and FV3 prevents to reproduce the results in ORT threading test when exchange grid involves. If I calculate the fluxes on agrid or ogrid the CCPP reproduces the results even with the different number of PETs assigned to mediator and FV3. So, this is specific to exchange grid (xgrid).

If I update the ORT threading test for cpld_control_c96_noaero_p8_xgrid and provide same number of processor to mediator and FV3 as baseline (*_std_base) then the ORT threading test is passing also for exchange grid with OMP_NUM_THREADS=2.

So, there is nothing needs to be done in CCPP side or CMEPS CCPP host model at this point. I also deactivate the reading initial condition in the CCPP host model and that fixed the issue with other ORT tests (dcp and fez tests). We could look at initialization later. So, at this point all ORT tests pass (with the mods that I did for the threading). I also run full RT suite on Cheyenne and all tests passes except xgrid one which has no baseline yet. Anyway, I think that PR is ready to merge in after review from side.

climbfuji commented 2 years ago

Wow, great detective work, Ufuk!

On May 26, 2022, at 9:41 AM, Ufuk Turunçoğlu @.***> wrote:

@junwang-noaa https://github.com/junwang-noaa I think we figure out the issue with the ORT threading test. After careful examination, I tracked it to the remapping in CMEPS aoflux code that involves exchange grid. The roundoff difference in the remap call (due to ordering of operations under different PET configurations and ordering or corner points in ESMF Mesh) is causing answer change in the model since threading test uses less processor on mediator and FV3 components and the mapping is responsible to transfer data back and forth between component grids and exchange grid. This changes the forcing and slightly modified forcing goes to CCPP and causes change in the answer. So, at this point using different number of core in mediator and FV3 prevents to reproduce the results in ORT threading test when exchange grid involves. If I calculate the fluxes on agrid or ogrid the CCPP reproduces the results even with the different number of PETs assigned to mediator and FV3. So, this is specific to exchange grid (xgrid).

If I update the ORT threading test for cpld_control_c96_noaero_p8_xgrid and provide same number of processor to mediator and FV3 as baseline (*_std_base) then the ORT threading test is passing also for exchange grid with OMP_NUM_THREADS=2.

So, there is nothing needs to be done in CCPP side or CMEPS CCPP host model at this point. I also deactivate the reading initial condition in the CCPP host model and that fixed the issue with other ORT tests (dcp and fez tests). We could look at initialization later. So, at this point all ORT tests pass (with the mods that I did for the threading). I also run full RT suite on Cheyenne and all tests passes except xgrid one which has no baseline yet. Anyway, I think that PR is ready to merge in after review from side.

— Reply to this email directly, view it on GitHub https://github.com/ufs-community/ufs-weather-model/pull/1157#issuecomment-1138715067, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB5C2RJBJWSVY4SCZAVNLODVL6LSZANCNFSM5SUEAEIQ. You are receiving this because you were mentioned.

uturuncoglu commented 2 years ago

@climbfuji Thanks. I think ORT tests bring extra complexity to the PRs but it is good to have them since we fixed lots of things in the PR with the help of them.

uturuncoglu commented 2 years ago

@junwang-noaa the code is updated and also I updated the PR description to reflect newly added options to enable I/O capability such as reading and writing restart files.

uturuncoglu commented 2 years ago

@junwang-noaa i will change the aoflux_grid = 'xgrid' to aoflux_grid = 'agrid' in the exchange grid RT. I have already tested on Orion and with agrid we could reproduce the results. Do you also want to change the name of the RT, since it has xgrid suffix?

junwang-noaa commented 2 years ago

Yes, please. Thanks

On Fri, Jun 3, 2022 at 12:36 PM Ufuk Turunçoğlu @.***> wrote:

@junwang-noaa https://github.com/junwang-noaa i will change the aoflux_grid = 'xgrid' to aoflux_grid = 'agrid' in the exchange grid RT. I have already tested on Orion and with agrid we could reproduce the results. Do you also want to change the name of the RT, since it has xgrid suffix?

— Reply to this email directly, view it on GitHub https://github.com/ufs-community/ufs-weather-model/pull/1157#issuecomment-1146160639, or unsubscribe https://github.com/notifications/unsubscribe-auth/AI7D6TNLZTXYRNZWFF63BALVNIX67ANCNFSM5SUEAEIQ . You are receiving this because you were mentioned.Message ID: @.***>

uturuncoglu commented 2 years ago

@junwang-noaa I updated the RT test to use agrid. I also rename the test itself. I also run the test and confirm it is working except failing with the missing baseline. So, let me know if you need any other change.

jkbk2004 commented 2 years ago

@uturuncoglu A PR is put on hold in the commit ques. It allows us to work on this PR, I guess, Scope of change is a bit big but no baseline change. I think we can start testing RTs. @junwang-noaa Without further requirement of the ORT on this PR, we can submit the auto-RTs after sync to the head of develop branch.

uturuncoglu commented 2 years ago

@jkbk2004 That is great. If you need anything from my side just let me know. Thanks

jkbk2004 commented 2 years ago

@jkbk2004 That is great. If you need anything from my side just let me know. Thanks

@uturuncoglu Ok! can you sync your branches up to the head of develop branches: ufs-wm and fv3atm?

uturuncoglu commented 2 years ago

@jkbk2004 sure. I am working on it.

uturuncoglu commented 2 years ago

@jkbk2004 it seems that the name of cpld_control_c96_noaero_p8 is changed to cpld_control_noaero_p8. Since I was referencing cpld_control_c96_noaero_p8 in new RT which is named cpld_control_c96_noaero_p8_agrid, it think it would be nice to adjust also its name. Let me know what do you think?

uturuncoglu commented 2 years ago

@jkbk2004 I updated the name of the test and testing RT again before pushing changes to be sure that update does not break anything.

uturuncoglu commented 2 years ago

@jkbk2004 It is done.

github-actions[bot] commented 2 years ago

@uturuncoglu please bring these up to date with respective authoritative repositories

github-actions[bot] commented 2 years ago

@uturuncoglu please bring these up to date with respective authoritative repositories

BrianCurtis-NOAA commented 2 years ago

Automated RT Failure Notification Machine: cheyenne Compiler: gnu Job: RT [RT] Repo location: /glade/scratch/epicufsrt/autort/tests/auto/pr/900766075/20220606171513/ufs-weather-model Please make changes and add the following label back: cheyenne-gnu-RT

jkbk2004 commented 2 years ago

@uturuncoglu emc/develop cmeps branch is behind master to escomp. git action repo check failure seems to be due to that. Let me talk to emc side.

junwang-noaa commented 2 years ago

@uturuncoglu In ufs-weather-model, all the changes need to be made to repositories/branches listed in the .gitsubmodule. the CMEPS is currently pointing to the noaa-emc fork emc/develop branch:

https://github.com/ufs-community/ufs-weather-model/blob/develop/.gitmodules#L19

However your cmeps PR is to commit changes from your branch to ESCOMP master, which is different from the latest noaa-emc emc/develop branch used in ufs-weather-model. If you want to commit your branch to ESCOMP master, please work with ESCOMP code managers to commit your CMEPS first, then make a CMEPS PR to bring the ESCOMP master to noaa-emc emc/develop branch, and link the PR in this ufs-weather-model PR. Otherwise, the CI will fail as your CMEPS is not based on current emc/develop branch. On the other hand, you can have a PR submitted to the noaa-emc emc/develop branch and we commit it along with other subcomponents PRs in this ufs-weather-model PR. After that you make a separate PR to bring changes from emc/develop to ESCOMP master branch as Denise has been doing. Please let us know which way you'd like to do. Currently since CI test failed with revision check, we can't continue with RT.

uturuncoglu commented 2 years ago

@jkbk2004 I'll soon merge CMEPS and create another PR to update NOAA EMC fork. Thanks for your help

jkbk2004 commented 2 years ago

@jkbk2004 I'll soon merge CMEPS and create another PR to update NOAA EMC fork. Thanks for your help

@uturuncoglu Sure!

uturuncoglu commented 2 years ago

@jkbk2004 I merge the PR in ESCOMP. The new PR in NOAA/EMC side is in here https://github.com/NOAA-EMC/CMEPS/pull/66

uturuncoglu commented 2 years ago

@jkbk2004 i think I need to use ESCOMP master in my UFS Pr. Right? I am not too familiar to the process. Sorry

jkbk2004 commented 2 years ago

@jkbk2004 i think I need to use ESCOMP master in my UFS Pr. Right? I am not too familiar to the process. Sorry

@uturuncoglu I add reviewers. For simplicity, if we can update emc/develop cmeps to the master, it will be good, right? Then we can break into two PRs to finish adding new features. Let's see how the reviewers want to go.

junwang-noaa commented 2 years ago

@uturuncoglu You need to use the ESCOMP master branch in your UFS WM branch and in .gitmodules. When the CMEPS PR is committed, we will point back to the noaa-emc emc/develop branch in UFS.

uturuncoglu commented 2 years ago

@junwang-noaa @jkbk2004 Okay. I understand. I'll let you know when it is ready. Thanks

jkbk2004 commented 2 years ago

@junwang-noaa @jkbk2004 Okay. I understand. I'll let you know when it is ready. Thanks

@junwang-noaa emc/develop cmeps branch still 90 commits behind ESCOMP:master. will syncing emc/develop branch first help out?

junwang-noaa commented 2 years ago

@jkbk2004 My understanding is that Ufuk already tested with the ESCOMP CMEPS updates along with the exchanges grid code changes and showed that the updates do not impact the current RT baseline results, we only need to add the new test to the baseline. @uturuncoglu Please confirm.

uturuncoglu commented 2 years ago

@junwang-noaa @jkbk2004 Yes, it is already tested but i think the PR will be tested automatically throughly the actions on different platforms. Right?

jkbk2004 commented 2 years ago

@uturuncoglu Yes, we can give a try with auto-RT on one machine. Do you want to use ESCOMP master branch in your UFS WM fork branch?

uturuncoglu commented 2 years ago

@jkbk2004 It is ready. Now points ESCOMP master.

jkbk2004 commented 2 years ago

@uturuncoglu sure! let me add cheyenne-gnu RT first.

github-actions[bot] commented 2 years ago

@uturuncoglu please bring these up to date with respective authoritative repositories

uturuncoglu commented 2 years ago

@jkbk2004 I am not sure why still complaining about CMEPS. It is pointing the ESCOMP master at this point.

jkbk2004 commented 2 years ago

@junwang-noaa We need to fetch emc/develop cmeps fork branch up to escomp master. git repo check complains about that. @uturuncoglu Your feature already in the escomp master, right?

jkbk2004 commented 2 years ago

@uturuncoglu git repo check complains emc fork branch is behind to master.

uturuncoglu commented 2 years ago

@jkbk2004 yes. that is right.

junwang-noaa commented 2 years ago

@uturuncoglu Please create a branch from noaa-emc cmeps emc/develop branch, then bring in the changes from ESCOMP maser branch. After that make a PR from your cmeps branch to noaa-emc emc/develop branch and use that branch in your ufs-weather-model branch. I think the ESCOMP cmeps master branch has a different history/revision from the noaa-emc emc/develop branch, it can't be merged directory.

uturuncoglu commented 2 years ago

@junwang-noaa Okay. Sure. I think I need to point that brach in the UFS level rather than pointing ESCOMP master. Right?

junwang-noaa commented 2 years ago

Yes

uturuncoglu commented 2 years ago

@junwang-noaa This issue is I have already forked ESCOMP/CMEPS and I could not fork NOAA-EMC/CMEPS under same account. Do you know any workaround for it. I could rename the ESCOMP/CMEPS but I think it will not work since I tried before for another project. I could create another GitHub account and make in there. Any suggestions?

rsdunlapiv commented 2 years ago

@uturuncoglu can you make the branch directly in noaa-emc/cmeps? @junwang-noaa can Ufuk be granted write permissions to that repo?

ghost commented 2 years ago

@rsdunlapiv Yes. that will solve the issue.

DusanJovic-NOAA commented 2 years ago

@uturuncoglu can you make the branch directly in noaa-emc/cmeps? @junwang-noaa can Ufuk be granted write permissions to that repo?

We typically do not allow people to create branches under EMC owned repos other than main branch and release branches. All development work should be done in personal forks.

ghost commented 2 years ago

@DusanJovic-NOAA yes. totally understandable but this seems to be limitation of GitHub. Once we create the brach and point it, you could remove me from the list.

junwang-noaa commented 2 years ago

@uturuncoglu Your don't need to fork from EMC repo, you can:

1) checkout your cmeps repo 2) set up an emc remote and checkout the emc/develop branch 3) create a branch from theemc/develop branch and push the branch to your cmeps repo.

ghost commented 2 years ago

@junwang-noaa okay. let me try