Closed gustavo-marques closed 1 year ago
Merging #1610 (87cf00b) into main (afb3a44) will decrease coverage by
0.27%
. The diff coverage is16.66%
.:exclamation: Current head 87cf00b differs from pull request most recent head 3d07e5b. Consider uploading reports for the commit 3d07e5b to get more accurate results
@@ Coverage Diff @@
## main #1610 +/- ##
==========================================
- Coverage 38.17% 37.91% -0.27%
==========================================
Files 269 269
Lines 76503 77155 +652
Branches 14064 14164 +100
==========================================
+ Hits 29205 29252 +47
- Misses 42051 42625 +574
- Partials 5247 5278 +31
Files | Coverage Δ | |
---|---|---|
...ig_src/drivers/solo_driver/MOM_surface_forcing.F90 | 24.04% <100.00%> (ø) |
|
src/core/MOM.F90 | 51.36% <100.00%> (ø) |
|
src/core/MOM_forcing_type.F90 | 42.24% <ø> (+0.27%) |
:arrow_up: |
src/core/MOM_unit_tests.F90 | 15.00% <ø> (ø) |
|
src/core/MOM_variables.F90 | 48.41% <ø> (+1.44%) |
:arrow_up: |
src/diagnostics/MOM_obsolete_params.F90 | 73.78% <100.00%> (+0.25%) |
:arrow_up: |
...c/parameterizations/vertical/MOM_set_viscosity.F90 | 60.98% <ø> (ø) |
|
src/tracer/MOM_tracer_types.F90 | 0.00% <ø> (ø) |
|
...nfig_src/infra/FMS1/MOM_ensemble_manager_infra.F90 | 30.00% <75.00%> (+2.22%) |
:arrow_up: |
src/framework/MOM_get_input.F90 | 62.16% <0.00%> (ø) |
|
... and 13 more |
... and 1 file with indirect coverage changes
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
What version of FMS is everyone using? I'm trying 2023.02 for Theresa's CIOD and it is forcing me to use the intel compiler instead of gnu. Assuming I did the git merge correctly, this PR is giving me:
use fms_io_mod, only : fms_io_set_filename_appendix=>set_filename_appendix
------------------------^
compilation aborted for //import/c1/AKWATERS/kate/ESMG/ESMG-configs/src/MOM6/config_src/infra/FMS2/MOM_ensemble_manager_infra.F90 (code 1)
What version of FMS is everyone using? I'm trying 2023.02 for Theresa's CIOD and it is forcing me to use the intel compiler instead of gnu. Assuming I did the git merge correctly, this PR is giving me:
use fms_io_mod, only : fms_io_set_filename_appendix=>set_filename_appendix ------------------------^ compilation aborted for //import/c1/AKWATERS/kate/ESMG/ESMG-configs/src/MOM6/config_src/infra/FMS2/MOM_ensemble_manager_infra.F90 (code 1)
There should no longer be any references to fms_io
in the source code.
@gustavo-marques I believe that this may need to be patched.
Could this be an error?
Thanks for catching this. These lines were added in this PR: https://github.com/NCAR/MOM6/pull/241 I will check with @alperaltuntas and report back.
Please let us know if Alper's patch fixed this issue.
I still got the same error. You took out the line using the function, but you still need to get rid of this line:
use fms_io_mod, only : fms_io_set_filename_appendix=>set_filename_appendix
Without this line, it compiles. Some tests are successful, some not, but they do run when compiled for debugging which is annoying. A typical error from say MOM6-examples/ocean_only/single_column/EPBL:
FATAL: write_energy : Excessive energy per unit mass forced model termination.
Image PC Routine Line Source
MOM6 00000000018514E7 Unknown Unknown Unknown
MOM6 00000000012A20F6 mpp_mod_mp_mpp_er 72 mpp_util_mpi.inc
MOM6 0000000000C88B5F mom_error_handler 154 MOM_error_handler.F90
MOM6 000000000041CBAC mom_sum_output_mp 901 MOM_sum_output.F90
MOM6 00000000010CE52E mom_mp_step_mom_ 1016 MOM.F90
MOM6 0000000000699337 MAIN__ 484 MOM_driver.F90
MOM6 00000000004132E2 Unknown Unknown Unknown
libc-2.28.so 00007FDAC7519D85 __libc_start_main Unknown Unknown
MOM6 00000000004131EE Unknown Unknown Unknown
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Compiler: intel/2022a
I still got the same error. You took out the line using the function, but you still need to get rid of this line:
Oops, I'll correct that.
As for the other error (Excessive energy per unit mass forced model termination
), I was unable to reproduce it with intel/2022.1 and FMS 2023.02 on cheyenne, and with the latest MOM6-examples configuration. Which version of single_column/EPBL run are you running? Have you tried another compiler?
Since it runs in debug mode, I'm willing to chalk it up as a compiler issue. All the ESMG tests run.
I don't have other compilers to try on "new chinook" with my current software versions. I'll approve assuming that line will get taken out.
It worked fine on R&D machine (b4b results) but I am seeing different results on EMC operational machine. I am doing debug work and will provide some clue soon.
h-point: c= 3602765 Before steps forces%ustar
Switching to ifort 2023.1 fixed this for me (and broke something else).
@gustavo-marques and @alperaltuntas some updating from UFS: I tried bisect and made some testing, what I can confirm is the last commit that works for UFS is commit d4aa10857b7679701fdbd44ea67adc6213c0800c Author: Ian Grooms ian.grooms@colorado.edu Date: Tue Aug 29 09:39:55 2023 -0600
Add Leith+E (#251)
* Add Leith+E
and problem starts from: d7415177e47bcfd00a6c46db8184d7f982211f76 Merge: 92756fe84 d4aa10857 Author: Gustavo Marques gmarques@ucar.edu Date: Wed Aug 30 14:45:49 2023 -0600
Merge branch 'dev/ncar' into fpmix_draft_15August2023
our regression system only allows each user to run one regression test at the same time, thus it is a slow process for me to provide this clue to you.
This PR passed our regression, but I suspect we will need a bit of time to manually review the changes.
I am also a bit concerned about the issues raised by @kshedstrom, although I'm not yet sure which are related to this PR.
@jiandewang, thanks for doing these tests and narrowing down where the change is coming from. I will look into this tomorrow and get back to you.
@jiandewang, thanks for doing these tests and narrowing down where the change is coming from. I will look into this tomorrow and get back to you.
@gustavo-marques one more clue for you: UFS quarter degree runs are fine on wcoss2. The half and 1x1 degree runs all failed to retain baseline results. We follow closely to the MOM_input in MOM_examples for all these 3 resolutions, so the issue must be some parameter related but I can't figure out which one it is.
@marshallward; regression tests are failing because KhTr_u
, KhTr_v
, and KhTr_h
were changed from 2D (lat/lon) to 3D (lat/lon/depth).
@jiandewang; we are still unable to find where the issue is coming from. Can you please send us the MOM_parameters_doc.short for the passing (1/4 deg) and failing (1/2 deg)?
@jiandewang; we are still unable to find where the issue is coming from. Can you please send us the MOM_parameters_doc.short for the passing (1/4 deg) and failing (1/2 deg)?
see https://www.emc.ncep.noaa.gov/gc_wmb/wd20xw/JD/MOM_parameter_doc.short.tar-txt, remove "txt" before untar it
I have tried to look at this, and the only thing that seems to make sense is a regression in forces%ustar
. Problems don't appear to become serious until the ePBL solver, but the differing term, Kd_ePBL
, depends on ustar
.
The only idea I have at the moment is that the inclusion of forces%omega_w2x
is causing some kind of pointer corruption, or is shuffling things around in memory in a way that has discovered a hidden bug. But that feels like a "blame the compiler" excuse, not a true explanation.
My best suggestion for now is to completely remove all of the content related to omega_w2x
and see if it helps to fix the problem. If so, then I recommend we remove all of this work and address it in a separate PR.
@marshallward: thanks for the feedback.
@jiandewang: I commented out all omega_w2x
entries in my last commit. Can you please run your tests again and let us know if the issue is fixed?
@gustavo-marques: with your latest commit, all UFS jobs retain current baseline now.
@gustavo-marques Did you want to clean up these changes? Or did you want to leave them in so that we can revisit this problem?
This is now the second mysterious computational ~issue~ regression detected by WCOSS, we may need to start doing some work to figure out what could be happening here.
@marshallward, our computers are down this week. I suggest we take this PR as is and we (NCAR) will revisit this issue in a future PR.
@gustavo-marques Hope you don't mind, I went ahead and merged this for you.
Thanks to everyone, this one was particularly tricky.
@marshallward Thanks for merging it
This PR includes the following:
Features
New parameters
Bug fixes & documentation improvements
Code improvements and cleanup
https://github.com/NCAR/MOM6/pull/225 (@gustavo-marques).
Obsolete MCT cap
Contributors