Closed uturuncoglu closed 2 years ago
I updated the model and sub-modules to the recent versions and try to address the issues. Let me know if you need any other changes to move forward this PR.
@junwang-noaa This PR is ready for review. I also run full test suite again and all of them passing on Cheyyene.
ORT test issues: 1) debug test is fixed. 2). Still working on threading test. 3) next is the restart test.
@junwang-noaa I think we figure out the issue with the ORT threading test. After careful examination, I tracked it to the remapping in CMEPS aoflux code that involves exchange grid. The roundoff difference in the remap call (due to ordering of operations under different PET configurations and ordering or corner points in ESMF Mesh) is causing answer change in the model since threading test uses less processor on mediator and FV3 components and the mapping is responsible to transfer data back and forth between component grids and exchange grid. This changes the forcing and slightly modified forcing goes to CCPP and causes change in the answer. So, at this point using different number of core in mediator and FV3 prevents to reproduce the results in ORT threading test when exchange grid involves. If I calculate the fluxes on agrid or ogrid the CCPP reproduces the results even with the different number of PETs assigned to mediator and FV3. So, this is specific to exchange grid (xgrid).
If I update the ORT threading test for cpld_control_c96_noaero_p8_xgrid and provide same number of processor to mediator and FV3 as baseline (*_std_base) then the ORT threading test is passing also for exchange grid with OMP_NUM_THREADS=2.
So, there is nothing needs to be done in CCPP side or CMEPS CCPP host model at this point. I also deactivate the reading initial condition in the CCPP host model and that fixed the issue with other ORT tests (dcp and fez tests). We could look at initialization later. So, at this point all ORT tests pass (with the mods that I did for the threading). I also run full RT suite on Cheyenne and all tests passes except xgrid one which has no baseline yet. Anyway, I think that PR is ready to merge in after review from side.
Wow, great detective work, Ufuk!
On May 26, 2022, at 9:41 AM, Ufuk Turunçoğlu @.***> wrote:
@junwang-noaa https://github.com/junwang-noaa I think we figure out the issue with the ORT threading test. After careful examination, I tracked it to the remapping in CMEPS aoflux code that involves exchange grid. The roundoff difference in the remap call (due to ordering of operations under different PET configurations and ordering or corner points in ESMF Mesh) is causing answer change in the model since threading test uses less processor on mediator and FV3 components and the mapping is responsible to transfer data back and forth between component grids and exchange grid. This changes the forcing and slightly modified forcing goes to CCPP and causes change in the answer. So, at this point using different number of core in mediator and FV3 prevents to reproduce the results in ORT threading test when exchange grid involves. If I calculate the fluxes on agrid or ogrid the CCPP reproduces the results even with the different number of PETs assigned to mediator and FV3. So, this is specific to exchange grid (xgrid).
If I update the ORT threading test for cpld_control_c96_noaero_p8_xgrid and provide same number of processor to mediator and FV3 as baseline (*_std_base) then the ORT threading test is passing also for exchange grid with OMP_NUM_THREADS=2.
So, there is nothing needs to be done in CCPP side or CMEPS CCPP host model at this point. I also deactivate the reading initial condition in the CCPP host model and that fixed the issue with other ORT tests (dcp and fez tests). We could look at initialization later. So, at this point all ORT tests pass (with the mods that I did for the threading). I also run full RT suite on Cheyenne and all tests passes except xgrid one which has no baseline yet. Anyway, I think that PR is ready to merge in after review from side.
— Reply to this email directly, view it on GitHub https://github.com/ufs-community/ufs-weather-model/pull/1157#issuecomment-1138715067, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB5C2RJBJWSVY4SCZAVNLODVL6LSZANCNFSM5SUEAEIQ. You are receiving this because you were mentioned.
@climbfuji Thanks. I think ORT tests bring extra complexity to the PRs but it is good to have them since we fixed lots of things in the PR with the help of them.
@junwang-noaa the code is updated and also I updated the PR description to reflect newly added options to enable I/O capability such as reading and writing restart files.
@junwang-noaa i will change the aoflux_grid = 'xgrid'
to aoflux_grid = 'agrid'
in the exchange grid RT. I have already tested on Orion and with agrid
we could reproduce the results. Do you also want to change the name of the RT, since it has xgrid
suffix?
Yes, please. Thanks
On Fri, Jun 3, 2022 at 12:36 PM Ufuk Turunçoğlu @.***> wrote:
@junwang-noaa https://github.com/junwang-noaa i will change the aoflux_grid = 'xgrid' to aoflux_grid = 'agrid' in the exchange grid RT. I have already tested on Orion and with agrid we could reproduce the results. Do you also want to change the name of the RT, since it has xgrid suffix?
— Reply to this email directly, view it on GitHub https://github.com/ufs-community/ufs-weather-model/pull/1157#issuecomment-1146160639, or unsubscribe https://github.com/notifications/unsubscribe-auth/AI7D6TNLZTXYRNZWFF63BALVNIX67ANCNFSM5SUEAEIQ . You are receiving this because you were mentioned.Message ID: @.***>
@junwang-noaa I updated the RT test to use agrid
. I also rename the test itself. I also run the test and confirm it is working except failing with the missing baseline. So, let me know if you need any other change.
@uturuncoglu A PR is put on hold in the commit ques. It allows us to work on this PR, I guess, Scope of change is a bit big but no baseline change. I think we can start testing RTs. @junwang-noaa Without further requirement of the ORT on this PR, we can submit the auto-RTs after sync to the head of develop branch.
@jkbk2004 That is great. If you need anything from my side just let me know. Thanks
@jkbk2004 That is great. If you need anything from my side just let me know. Thanks
@uturuncoglu Ok! can you sync your branches up to the head of develop branches: ufs-wm and fv3atm?
@jkbk2004 sure. I am working on it.
@jkbk2004 it seems that the name of cpld_control_c96_noaero_p8 is changed to cpld_control_noaero_p8. Since I was referencing cpld_control_c96_noaero_p8 in new RT which is named cpld_control_c96_noaero_p8_agrid, it think it would be nice to adjust also its name. Let me know what do you think?
@jkbk2004 I updated the name of the test and testing RT again before pushing changes to be sure that update does not break anything.
@jkbk2004 It is done.
@uturuncoglu please bring these up to date with respective authoritative repositories
@uturuncoglu please bring these up to date with respective authoritative repositories
Automated RT Failure Notification Machine: cheyenne Compiler: gnu Job: RT [RT] Repo location: /glade/scratch/epicufsrt/autort/tests/auto/pr/900766075/20220606171513/ufs-weather-model Please make changes and add the following label back: cheyenne-gnu-RT
@uturuncoglu emc/develop cmeps branch is behind master to escomp. git action repo check failure seems to be due to that. Let me talk to emc side.
@uturuncoglu In ufs-weather-model, all the changes need to be made to repositories/branches listed in the .gitsubmodule. the CMEPS is currently pointing to the noaa-emc fork emc/develop branch:
https://github.com/ufs-community/ufs-weather-model/blob/develop/.gitmodules#L19
However your cmeps PR is to commit changes from your branch to ESCOMP master, which is different from the latest noaa-emc emc/develop branch used in ufs-weather-model. If you want to commit your branch to ESCOMP master, please work with ESCOMP code managers to commit your CMEPS first, then make a CMEPS PR to bring the ESCOMP master to noaa-emc emc/develop branch, and link the PR in this ufs-weather-model PR. Otherwise, the CI will fail as your CMEPS is not based on current emc/develop branch. On the other hand, you can have a PR submitted to the noaa-emc emc/develop branch and we commit it along with other subcomponents PRs in this ufs-weather-model PR. After that you make a separate PR to bring changes from emc/develop to ESCOMP master branch as Denise has been doing. Please let us know which way you'd like to do. Currently since CI test failed with revision check, we can't continue with RT.
@jkbk2004 I'll soon merge CMEPS and create another PR to update NOAA EMC fork. Thanks for your help
@jkbk2004 I'll soon merge CMEPS and create another PR to update NOAA EMC fork. Thanks for your help
@uturuncoglu Sure!
@jkbk2004 I merge the PR in ESCOMP. The new PR in NOAA/EMC side is in here https://github.com/NOAA-EMC/CMEPS/pull/66
@jkbk2004 i think I need to use ESCOMP master in my UFS Pr. Right? I am not too familiar to the process. Sorry
@jkbk2004 i think I need to use ESCOMP master in my UFS Pr. Right? I am not too familiar to the process. Sorry
@uturuncoglu I add reviewers. For simplicity, if we can update emc/develop cmeps to the master, it will be good, right? Then we can break into two PRs to finish adding new features. Let's see how the reviewers want to go.
@uturuncoglu You need to use the ESCOMP master branch in your UFS WM branch and in .gitmodules. When the CMEPS PR is committed, we will point back to the noaa-emc emc/develop branch in UFS.
@junwang-noaa @jkbk2004 Okay. I understand. I'll let you know when it is ready. Thanks
@junwang-noaa @jkbk2004 Okay. I understand. I'll let you know when it is ready. Thanks
@junwang-noaa emc/develop cmeps branch still 90 commits behind ESCOMP:master. will syncing emc/develop branch first help out?
@jkbk2004 My understanding is that Ufuk already tested with the ESCOMP CMEPS updates along with the exchanges grid code changes and showed that the updates do not impact the current RT baseline results, we only need to add the new test to the baseline. @uturuncoglu Please confirm.
@junwang-noaa @jkbk2004 Yes, it is already tested but i think the PR will be tested automatically throughly the actions on different platforms. Right?
@uturuncoglu Yes, we can give a try with auto-RT on one machine. Do you want to use ESCOMP master branch in your UFS WM fork branch?
@jkbk2004 It is ready. Now points ESCOMP master.
@uturuncoglu sure! let me add cheyenne-gnu RT first.
@uturuncoglu please bring these up to date with respective authoritative repositories
@jkbk2004 I am not sure why still complaining about CMEPS. It is pointing the ESCOMP master at this point.
@junwang-noaa We need to fetch emc/develop cmeps fork branch up to escomp master. git repo check complains about that. @uturuncoglu Your feature already in the escomp master, right?
@uturuncoglu git repo check complains emc fork branch is behind to master.
@jkbk2004 yes. that is right.
@uturuncoglu Please create a branch from noaa-emc cmeps emc/develop branch, then bring in the changes from ESCOMP maser branch. After that make a PR from your cmeps branch to noaa-emc emc/develop branch and use that branch in your ufs-weather-model branch. I think the ESCOMP cmeps master branch has a different history/revision from the noaa-emc emc/develop branch, it can't be merged directory.
@junwang-noaa Okay. Sure. I think I need to point that brach in the UFS level rather than pointing ESCOMP master. Right?
Yes
@junwang-noaa This issue is I have already forked ESCOMP/CMEPS and I could not fork NOAA-EMC/CMEPS under same account. Do you know any workaround for it. I could rename the ESCOMP/CMEPS but I think it will not work since I tried before for another project. I could create another GitHub account and make in there. Any suggestions?
@uturuncoglu can you make the branch directly in noaa-emc/cmeps? @junwang-noaa can Ufuk be granted write permissions to that repo?
@rsdunlapiv Yes. that will solve the issue.
@uturuncoglu can you make the branch directly in noaa-emc/cmeps? @junwang-noaa can Ufuk be granted write permissions to that repo?
We typically do not allow people to create branches under EMC owned repos other than main branch and release branches. All development work should be done in personal forks.
@DusanJovic-NOAA yes. totally understandable but this seems to be limitation of GitHub. Once we create the brach and point it, you could remove me from the list.
@uturuncoglu Your don't need to fork from EMC repo, you can:
1) checkout your cmeps repo 2) set up an emc remote and checkout the emc/develop branch 3) create a branch from theemc/develop branch and push the branch to your cmeps repo.
@junwang-noaa okay. let me try
PR Checklist
[x] This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.
[x] This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR
[x] An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes in contained in the PR. The Issue and the dependent sub-component PR are specified below.
[ ] Results for one or more of the regression tests change and the reasons for the changes are understood and explained below.
[ ] New or updated input data is required by this PR. If checked, please work with the code managers to update input data sets on all platforms.
Instructions: All subsequent sections of text should be filled in as appropriate.
The information provided below allows the code managers to understand the changes relevant to this PR, whether those changes are in the ufs-weather-model repository or in a subcomponent repository. Ufs-weather-model code managers will use the information provided to add any applicable labels, assign reviewers and place it in the Commit Queue. Once the PR is in the Commit Queue, it is the PR owner's responsiblity to keep the PR up-to-date with the develop branch of ufs-weather-model.
Description
The PR aims to bring exchange grid capability, which is provided by CMEPS mediator, to the UFS weather model. The PR includes following major changes in the model;
nems.configure
and needs to be placedMED_attributes::
section.nems_frac_aoflux
, It allows to pass mediator calculated atmosphere-ocean fluxes to FV3nems_frac_aoflux_sbs
, This mode is just for side-by-side comparisons of fluxes calculated on FV3/CCPP and CMEPS/CCPP. In this mode, the fluxes are calculated under mediator but not sent to FV3. So, the results of fully coupled model has no answer change since the fluxes does not received by FV3 but the mediator history files will include also fluxes calculated under mediator.nems.configure
and default values works but it might be required in the near future when external land component brings in.true
.1
.true
.2
.true
.false
) or a warmstart/restart (true
). The default value istrue
.true
.true
.true
.start_type = 'continue'
), then CCPP host model uses internally calculated file name.INPUT/sfc_data.tile
. Note that there is no need to specify tile no.ccpp_ini_mosaic_file
. The default value isINPUT/
3,8
.input.nml
option (use_med_flux
) is added togfs_physics_nml
group. The default value is .false. and the FV3/CCPP Physics will no accept any flux from external sources such as CMEPS. This allow to preserve the existing functionality in the FV3 side.cpld_control_c96_noaero_p8_xgrid
) is also implemented to test the exchange grid functionality. The baseline needs to be placed into the common input and baseline directory. There is no need to have separate input file for this test since it is just a derivative ofcpld_control_c96_noaero_p8
.Issue(s) addressed
Link the issues to be closed with this PR, whether in this repository, or in another repository. (Remember, issues must always be created before starting work on a PR branch!)
Testing
How were these changes tested? What compilers / HPCs was it tested with? Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) Have regression tests and unit tests (utests) been run? On which platforms and with which compilers? (Note that unit tests can only be run on tier-1 platforms)
The model is tested on NCAR's Cheyenne platform with intel compiler. All regression tests are passing at this point.
Dependencies
If testing this branch requires non-default branches in other repositories, list them. Those branches should have matching names (ideally).
Do PRs in upstream repositories need to be merged first? If so add the "waiting for other repos" label and list the upstream PRs