mom-ocean / MOM6

Modular Ocean Model
Other
185 stars 232 forks source link

GFDL to main, 2023-11-13 #1616

Closed marshallward closed 9 months ago

marshallward commented 1 year ago

This PR contains many new features, bugfixes, and overall changes to the MOM6 codebase. In particular, the model now includes a full non-Boussinesq solver alongside the existing Boussinesq and "semi-non-Boussinesq" modes. Other major features include updates to the ice shelf model, improved particle tracer support, improvements to the computational performance of the Zanna-Bolton parameterization, fixes to the internal tide energetics, brine plume mixing, and self-attraction and loading.

It also includes the usual bundle of bug fixes and code refactors.

Non-Boussinesq

Ice Shelf

Particle Tracers

ZB2020

Internal Tide

Brine Plume Mixing

Self-attraction and Loading

Additional Features

Bugfix

Refactor

Testing

Build

Misc

Contributors

codecov[bot] commented 1 year ago

Codecov Report

Attention: Patch coverage is 24.13651% with 1845 lines in your changes are missing coverage. Please review.

Project coverage is 37.45%. Comparing base (dbda49c) to head (6d7c00a). Report is 15 commits behind head on main.

Files Patch % Lines
src/parameterizations/lateral/MOM_Zanna_Bolton.F90 1.07% 277 Missing :warning:
...c/parameterizations/lateral/MOM_internal_tides.F90 0.00% 237 Missing :warning:
src/diagnostics/MOM_wave_speed.F90 22.68% 173 Missing and 11 partials :warning:
src/core/MOM_interface_heights.F90 13.72% 130 Missing and 2 partials :warning:
src/core/MOM_forcing_type.F90 18.04% 99 Missing and 10 partials :warning:
src/core/MOM.F90 39.43% 61 Missing and 25 partials :warning:
src/ice_shelf/MOM_ice_shelf.F90 0.00% 75 Missing :warning:
src/core/MOM_barotropic.F90 56.75% 46 Missing and 18 partials :warning:
src/core/MOM_open_boundary.F90 33.33% 53 Missing and 7 partials :warning:
src/ALE/MOM_ALE.F90 40.65% 45 Missing and 9 partials :warning:
... and 41 more
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1616 +/- ## ========================================== - Coverage 37.91% 37.45% -0.46% ========================================== Files 269 270 +1 Lines 77155 79765 +2610 Branches 14164 14833 +669 ========================================== + Hits 29252 29878 +626 - Misses 42625 44350 +1725 - Partials 5278 5537 +259 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

jiandewang commented 1 year ago

@marshallward a new variable "p_surf_EOS" appeared in the restart file in one of our runs (super coarse resolution and simple settng for our script testing). All others (1x1, 05x05 and quarter degree) don't contain this variable. Which parameter is associated with this ?

klindsay28 commented 1 year ago

I notice that in src/tracer/MOM_CFC_cap.F90, the argument conversion=GV%Rho0*US%R_to_kg_m3 was added to the register_diag_field call for the diagnostics cfc11 and cfc12. The field being passed to post_data for these diagnostics is the product of the tracer concentrations and the value being passed to this conversion argument. Should this multiplication have been removed in the post_data call?

It looks like this was done properly in https://github.com/NOAA-GFDL/MOM6/commit/d16f343e42615d09761dc50aaca268b60a7ca504, but got messed up with the merge of main to dev/gfdl at https://github.com/NOAA-GFDL/MOM6/commit/e5b64f91cc6a86d88c5ba34974e39b30697ec660.

marshallward commented 1 year ago

@klindsay28 Yes, I bungled this part of the merge. Good thing that you caught it.

https://github.com/NOAA-GFDL/MOM6/blob/cce4b3da5e26b3ee2cae6e683988410340abad1b/src/tracer/MOM_CFC_cap.F90#L407-L408

I will put in a fix to remove those units from the post_data call.

jiandewang commented 1 year ago

@marshallward the wcoss2 non-reproducable issue happened again. I will do some debug test and provide some clue for you.

marshallward commented 1 year ago

@marshallward a new variable "p_surf_EOS" appeared in the restart file in one of our runs (super coarse resolution and simple settng for our script testing). All others (1x1, 05x05 and quarter degree) don't contain this variable. Which parameter is associated with this ?

This is a new restart variable, it only appears if there is a surface pressure (tv%psurf). It's required to work out the non-Boussinesq densities, but @Hallberg-NOAA can say more.

(Commit: 11c3f56435)

Re: WCOSS2 regression, I'll wait to hear back from you. We might need a deep dive to figure out why this keeps happening.

jiandewang commented 1 year ago

@marshallward a new variable "p_surf_EOS" appeared in the restart file in one of our runs (super coarse resolution and simple settng for our script testing). All others (1x1, 05x05 and quarter degree) don't contain this variable. Which parameter is associated with this ?

This is a new restart variable, it only appears if there is a surface pressure (tv%psurf). It's required to work out the non-Boussinesq densities, but @Hallberg-NOAA can say more.

(Commit: 11c3f56)

@marshallward this solved my puzzle as we use its default (True) in super coarse setting while it's False in others. A related byproduct is found here: https://github.com/NOAA-GFDL/MOM6-examples/blob/dev/gfdl/ice_ocean_SIS2/OM4_025/MOM_input#L57 https://github.com/NOAA-GFDL/MOM6-examples/blob/dev/gfdl/ice_ocean_SIS2/OM4_05/MOM_input#L57 the default should be True Re: WCOSS2 regression, I'll wait to hear back from you. We might need a deep dive to figure out why this keeps happening.

jiandewang commented 1 year ago

@marshallward I put run log file on gaea: /lustre/f2/dev/ncep/Jiande.Wang/For-Marshall. Note on wcoss2 we have 2 jobs that are compiled under DEBUG mode and they retain the results.

marshallward commented 1 year ago

Looks like the first diff is in vertvisc_coef:

4540c4574
< v-point: c= 182747883 v vertvisc_coef a_[uv]
---
> v-point: c= 182747884 v vertvisc_coef a_[uv]
4634a4669,4672

So only a single bit has changed (and presumably propagated to the future changes).

There are many changes in find_coupling_coef but they mostly involve removal of dimensional coefficients. Their removal should not normally affect the computation. Perhaps it could affect the manner of optimization, but I would not think so.

In one case, these arguments were reversed:

@@ -2042,9 +2096,9 @@ subroutine find_coupling_coef(a_cpl, hvel, do_i, h_harm, bbl_thick, kv_bbl, z_i,
       ! These expressions assume that Kv_tot(i,nz+1) = CS%Kv, consistent with
       ! the suppression of turbulent mixing by the presence of a solid boundary.
       if (dhc < bbl_thick(i)) then
-        a_cpl(i,nz+1) = kv_bbl(i) / (I_amax*kv_bbl(i) + (dhc+h_neglect)*GV%H_to_Z)
+        a_cpl(i,nz+1) = kv_bbl(i) / ((dhc+h_neglect) + I_amax*kv_bbl(i))
       else
-        a_cpl(i,nz+1) = kv_bbl(i) / (I_amax*kv_bbl(i) + (bbl_thick(i)+h_neglect)*GV%H_to_Z)
+        a_cpl(i,nz+1) = kv_bbl(i) / ((bbl_thick(i)+h_neglect) + I_amax*kv_bbl(i))
       endif
     endif ; enddo
     do K=nz,2,-1 ; do i=is,ie ; if (do_i(i)) then

I can't imagine this changing answers (assuming FMA is turned off), but maybe a compiler responds differently?

There is also a change from forces%ustar to an explicit Ustar_2d argument. This somewhat resembles the issue from the NCAR PR. Maybe there is some memory issue with forces.

I'm sure I've asked all of this before, but could you remind me about WCOSS2? Is it using the REPRO flags from a mkmf template? The -O level would be most important here. Also ifort --version and lscpu | grep "Model name" might help.

jiandewang commented 1 year ago

@marshallward we are using -O2 jiande.wang@clogin05:~> ifort --version ifort (IFORT) 19.1.3.304 20200925 Copyright (C) 1985-2020 Intel Corporation. All rights reserved.

jiande.wang@clogin05:~> lscpu |grep "Model name" Model name: AMD EPYC 7702 64-Core Processor

-g -traceback -i4 -r8 -fno-alias -auto -safe-cray-ptr -ftz -assume byterecl -sox -O2 -debug minimal -fp-model

marshallward commented 1 year ago

-g -traceback -i4 -r8 -fno-alias -auto -safe-cray-ptr -ftz -assume byterecl -sox -O2 -debug minimal -fp-model

-fp-model source?

jiandewang commented 1 year ago

-g -traceback -i4 -r8 -fno-alias -auto -safe-cray-ptr -ftz -assume byterecl -sox -O2 -debug minimal -fp-model source -module mod -c /lfs/h2/emc/couple/noscrub/Jiande.Wang/MOM6-update/20231113/ufs-weather-model-BL/MOM6-interface/MOM6/config_src/infra/FMS2/MOM_couplertype_infra.F90 -o CMakeFiles/mom6_obj.dir/MOM6/config_src/infra/FMS2/MOM_couplertype_infra.F90.o

jiandewang commented 1 year ago

-g -traceback -i4 -r8 -fno-alias -auto -safe-cray-ptr -ftz -assume byterecl -sox -O2 -debug minimal -fp-model

-fp-model source?

I put compile log at /lustre/f2/dev/ncep/Jiande.Wang/For-Marshall. see compile.txt. It's hard to read but you can jump to line 900.

kshedstrom commented 1 year ago

The non-Boussinesq changes caused me an issue address with https://github.com/NOAA-GFDL/MOM6/pull/520

sanAkel commented 1 year ago

The non-Boussinesq changes caused me an issue address with https://github.com/NOAA-GFDL/MOM6/pull/520.

@kshedstrom ⬆️ URL does not exist!

kshedstrom commented 1 year ago

Ack, fixed it.

adcroft commented 1 year ago

@jiandewang Is there any chance we (you) could do a git bisect to find the specific commit that causes the divergence? Our PR has a linear history so it should be definitive. Without knowing the commit (and without access to the machine), we are essentially speculating about the problem.

jiandewang commented 1 year ago

@jiandewang Is there any chance we (you) could do a git bisect to find the specific commit that causes the divergence? Our PR has a linear history so it should be definitive. Without knowing the commit (and without access to the machine), we are essentially speculating about the problem.

yes that's what in my mind. Let me have a short chat with Marshall.

Hallberg-NOAA commented 1 year ago

The new p_surf_EOS restart variable is required so that the potential energy reported at the start of a run-segment in non-Boussinesq mode matches the energy written out at the end of the previous segment. It is being saved whenever USE_PSURF_IN_EOS = True so that any expressions that use the equation of state during the ocean model initialization before the coupler has had a chance to provide the surface pressure for the next timestep will be correct. In the case of a non-Boussinesq model, the specific volume is needed in the calculation of APE, but it is easy to envision other (Boussinesq) quantities for which density needs to be derived from the temperature, salinity and layer thickness fields during initialization, and this extra surface pressure field is the one piece that is missing. At no point would adding this new restart field break reproducibility across restarts for any case that worked previously.

However, if people do have a problem with this speculatively added restart variable, we could easily add to the logical test before the call register_restart_field(CS%tv%p_surf, "p_surf_EOS", ...) statement so that it is only written in the non-Boussinesq case where it is essential.

jiandewang commented 1 year ago

The new p_surf_EOS restart variable is required so that the potential energy reported at the start of a run-segment in non-Boussinesq mode matches the energy written out at the end of the previous segment. It is being saved whenever USE_PSURF_IN_EOS = True so that any expressions that use the equation of state during the ocean model initialization before the coupler has had a chance to provide the surface pressure for the next timestep will be correct. In the case of a non-Boussinesq model, the specific volume is needed in the calculation of APE, but it is easy to envision other (Boussinesq) quantities for which density needs to be derived from the temperature, salinity and layer thickness fields during initialization, and this extra surface pressure field is the one piece that is missing. At no point would adding this new restart field break reproducibility across restarts for any case that worked previously.

However, if people do have a problem with this speculatively added restart variable, we could easily add to the logical test before the call register_restart_field(CS%tv%p_surf, "p_surf_EOS", ...) statement so that it is only written in the non-Boussinesq case where it is essential.

@Hallberg-NOAA this is fine for us as what I need to know is the reason for this new variable that appeared in one of our test (and you and Marshall have explained to me clearly).

jiandewang commented 1 year ago

@jiandewang Is there any chance we (you) could do a git bisect to find the specific commit that causes the divergence? Our PR has a linear history so it should be definitive. Without knowing the commit (and without access to the machine), we are essentially speculating about the problem.

yes that's what in my mind. Let me have a short chat with Marshall.

that's what I did in previous NCAR PR. This time things are a bit trick as there are several new files being added at certain stage (we use cmake which requires to list every files) and there is a parameter change (DEFAULT_2018_ANSWERS) at certain stage.

Hallberg-NOAA commented 1 year ago

@jiandewang Is there any chance we (you) could do a git bisect to find the specific commit that causes the divergence? Our PR has a linear history so it should be definitive. Without knowing the commit (and without access to the machine), we are essentially speculating about the problem.

yes that's what in my mind. Let me have a short chat with Marshall.

that's what I did in previous NCAR PR. This time things are a bit trick as there are several new files being added at certain stage (we use cmake which requires to list every files) and there is a parameter change (DEFAULT_2018_ANSWERS) at certain stage.

If you explicitly setting DEFAULT_2018_ANSWERS in your tests, you should be able to get the same effect for every commit in this PR chain by setting DEFAULT_ANSWER_DATE to the right value (20181231 or 20190101) instead.

alperaltuntas commented 1 year ago

Just to confirm, DEFAULT_2018_ANSWERS = Falseand DEFAULT_ANSWER_DATE = 20190101 are equivalent, right?

Hallberg-NOAA commented 1 year ago

Just to confirm, DEFAULT_2018_ANSWERS = Falseand DEFAULT_ANSWER_DATE = 20190101 are equivalent, right?

A better explicit value might be to set DEFAULT_ANSWER_DATE = 99991231, which is the same as always using the latest code options unless they are explicitly set for individual parameters. However, if you know a specific date for the code version you want to reproduce, you could use that date for the default_answer date (e.g, if you have code from February 1st, 2023, you might want to set DEFAULT_ANSWER_DATE = 20230201). I think that we this scheme in detail when we started the obsoleting process for the ..._2018_ANSWERS and replacing them with ANSWER_DATE parameters at a MOM6 dev call in the summer of 2022.

Regardless, we have been logging the ANSWER_DATE parameters in the MOM_parameter_doc files for over a year now, and these are what you need to reproduce to recreate a specific set of answers.

alperaltuntas commented 1 year ago

@gustavo-marques, we are getting b4b results except for the salt_flux_added field in the h.native stream: Previously, we were getting all 0s on ocean points, but now we are getting all NaNs everywhere. All other streams and fields are identical. Is this expected or should we further investigate?

gustavo-marques commented 1 year ago

@gustavo-marques, we are getting b4b results except for the salt_flux_added field in the h.native stream: Previously, we were getting all 0s on ocean points, but now we are getting all NaNs everywhere. All other streams and fields are identical. Is this expected or should we further investigate?

@alperaltuntas, thanks for running the tests and reporting this. I never look at salt_flux_added only salt_flux and salt_flux_in. I do not think this should hold this PR, but we definitely should investigate this further. Perhaps salt_flux_added is not working properly within the NUOPC cap.

klindsay28 commented 1 year ago

It looks like https://github.com/NOAA-GFDL/MOM6/pull/401 added a partial implementation of a new diagnostic salt_left_behind, and overwrote the value of handles%id_saltFluxAdded, causing the diagnostic salt_flux_added to never get values posted to it.

2 possible approaches to correcting this are

  1. Remove the added call to register_diag_field for salt_left_behind which overwrites handles%id_saltFluxAdded, i.e., remove the partial implementation of the diagnostic salt_left_behind, or
  2. correct and complete the implementation of the diagnostic salt_left_behind.

I suggest that @kshedstrom, as the author of https://github.com/NOAA-GFDL/MOM6/pull/401, should chime in.

kshedstrom commented 1 year ago

I need to look into this. I meant to add a new variable to pass from SIS2 to MOM6 via the coupler, not to break anything else.

kshedstrom commented 1 year ago

Please check out NOAA-GFDL/MOM6#526

alperaltuntas commented 1 year ago

Please check out NOAA-GFDL/MOM6#526

Thanks, @kshedstrom. This fixes our isssue (though it appears to have failed the CI tests)

kshedstrom commented 1 year ago

Yeah, I know it failed, but I have no idea why. Sigh.

marshallward commented 1 year ago

Not sure where that phantom FPE came from in MacOS, but it's passing now.

@klindsay28 @alperaltuntas Do you want this merged into the PR to main? (I will need to do a final regression test)

gustavo-marques commented 1 year ago

I am getting the following error when running in DEBUG mode with Intel:

image

It seems related to the introduction of pointer int_tide_CSp in https://github.com/NOAA-GFDL/MOM6/commit/1d35fa1c2f984b94041029f6857e6dfaa410198f

sanAkel commented 1 year ago

I am getting the following error when running in DEBUG mode with Intel:

image

It seems related to the introduction of pointer int_tide_CSp in NOAA-GFDL@1d35fa1

We have run into this sort of problem before (though I'm yet to test this PR!).

@gustavo-marques what are your compiler flags in debug mode?

jiandewang commented 1 year ago

@gustavo-marques it passed in UFS with debug mode, our flag is -g -traceback -i4 -r8 -fno-alias -auto -safe-cray-ptr -ftz -assume byterecl -sox -O0 -check -check noarg_temp_created -check nopointer -fpe0 -ftrapuv -init=snan,arrays -module mod -c

gustavo-marques commented 1 year ago

Here is our flags: -qno-opt-dynamic-align -convert big_endian -assume byterecl -ftz -traceback -assume realloc_lhs -fp-model source -O2 -debug minimal -no-fma -qopt-report -march=core-avx2

marshallward commented 1 year ago

CS%int_tides_CSp is allocated when CS%adiabatic is false (although there is at least one extra function hop).

https://github.com/NOAA-GFDL/MOM6/blob/f4c95ec4b81e2958c151be101d2b15b0de28f910/src/core/MOM.F90#L2835-L2837

although it does take some time to get there:

https://github.com/NOAA-GFDL/MOM6/blob/f4c95ec4b81e2958c151be101d2b15b0de28f910/src/parameterizations/lateral/MOM_internal_tides.F90#L2438

The else-block here can only be entered when diabatic:

https://github.com/NOAA-GFDL/MOM6/blob/f4c95ec4b81e2958c151be101d2b15b0de28f910/src/core/MOM.F90#L3215-L3222

Assuming nothing behaves funny, CS%int_tide_CSp should be associated:

https://github.com/NOAA-GFDL/MOM6/blob/f4c95ec4b81e2958c151be101d2b15b0de28f910/src/parameterizations/vertical/MOM_diabatic_driver.F90#L3043

There are far too many pointers for my taste, but at least in principle it ought to be associated, and the error is confusing to me.

Since the error is in the function call, I have to at least ask if the .mod files are all up to date.

Assuming that, what does the debugger say? If it's still null(), then I think we should get feedback from @raphaeldussin.

gustavo-marques commented 12 months ago

Thank you all for the suggestions! @marshallward, the .mod files were not up to date and that's why it was failing. I did a clean build and the error disappeared. I'm sorry for raising a false issue. Regarding Kate's PR, let us merge it with this candidate and we can re-run our tests again. Thanks!

marshallward commented 12 months ago

NOAA-GFDL/MOM6#526 has now been merged into this candidate branch.

marshallward commented 12 months ago

I am also looking into this FPE error on the MacOS tests, it seems to have only recently just started happening. Kate first noticed it in her PR.

For now, it seems that re-running the test seems to work, which we'll have to accept for now. For those getting email updates, I apologize for the unnecessary spam.

alperaltuntas commented 12 months ago

I encountered the same error message @gustavo-marques posted.

CS%int_tides_CSp is allocated when CS%adiabatic is false (although there is at least one extra function hop).

It appears that for CS%int_tides_CSp to be allocated, INTERNAL_TIDES must be set to True as well : https://github.com/NOAA-GFDL/MOM6/blob/f4c95ec4b81e2958c151be101d2b15b0de28f910/src/parameterizations/vertical/MOM_diabatic_driver.F90#L3589C1-L3589C1

While I'm not proposing this as a solution, I tried out removing the requirement for INTERNAL_TIDES to be true for the allocation of CS%int_tides_CSp. This, however, led to another error:

FATAL from PE     2: Unable to find a match for dimension angle for variable IW_energy_mode1 in file ./MOM_IC.nc
marshallward commented 9 months ago

@jiandewang I have finally managed to get access to WCOSS and have completed some run comparisons. I believe the cause is due to fused multiply-adds (FMAs) being enabled on WCOSS.

I discovered that reversing the order of an equation changed answers. This is the diff:

--- a/src/parameterizations/vertical/MOM_vert_friction.F90
+++ b/src/parameterizations/vertical/MOM_vert_friction.F90
@@ -2042,9 +2042,11 @@ subroutine find_coupling_coef(a_cpl, hvel, do_i, h_harm, bbl_thick, kv_bbl, z_i,
       ! These expressions assume that Kv_tot(i,nz+1) = CS%Kv, consistent with
       ! the suppression of turbulent mixing by the presence of a solid boundary.
       if (dhc < bbl_thick(i)) then
-        a_cpl(i,nz+1) = kv_bbl(i) / (I_amax*kv_bbl(i) + (dhc+h_neglect)*GV%H_to_Z)
+        a_cpl(i,nz+1) = kv_bbl(i) / ((dhc+h_neglect)*GV%H_to_Z + I_amax*kv_bbl(i))
       else
-        a_cpl(i,nz+1) = kv_bbl(i) / (I_amax*kv_bbl(i) + (bbl_thick(i)+h_neglect)*GV%H_to_Z)
+        a_cpl(i,nz+1) = kv_bbl(i) / ((bbl_thick(i)+h_neglect)*GV%H_to_Z + I_amax*kv_bbl(i))
       endif
     endif ; enddo
     do K=nz,2,-1 ; do i=is,ie ; if (do_i(i)) then

When I looked at the assembly, I noticed that it was using FMAs to evaluate the expression, which would explain the answer difference.

  113070     .loc    1  2045  is_stmt 1                                       |  113070     .loc    1  2045  is_stmt 1                                      
  113071         vaddsd    %xmm2, %xmm5, %xmm1                           #2045|  113071         vmulsd    %xmm3, %xmm4, %xmm1                           #204
  113073         vmulsd    %xmm9, %xmm1, %xmm2                           #2045|  113073         vaddsd    %xmm2, %xmm5, %xmm2                           #204
  113075         vfmadd231sd %xmm4, %xmm3, %xmm2                         #2045|  113075         vfmadd213sd %xmm1, %xmm9, %xmm2                         #204
  113077         vdivsd    %xmm2, %xmm3, %xmm3                           #2045|  113077         vdivsd    %xmm2, %xmm3, %xmm3                           #204
  113079         vmovsd    %xmm3, -8(%rdi,%r11,8)                        #2045|  113079         vmovsd    %xmm3, -8(%rdi,%r11,8)                        #204
  113081         jmp       ..B8.146      # Prob 100%                     #2045|  113081         jmp       ..B8.146      # Prob 100%                     #204 

Although your template does not turn them on, the Cray wrapper does this implicitly:

$ ftn -craype-verbose
ifort -march=core-avx2 -mtune=core-avx2 

The -march flags will allow FMAs if the hardware supports them. Since fma(a,b,c) is more accurate than a*b + c, it will change answers.

We introduce a lot of changes using dimensional constants, which could easily push the compiler to use or remove existing FMAs. Although most of the time these should not have any effect, there are plenty of situations where it can make a difference.

I repeated the runs with the -no-fma flag, and did not detect any answer changes, so this is almost certainly the problem. But it should also be tested in the UFS suite.


As for how to solve this, I can only see two options:

GFDL does not use FMAs because of this problem, and also suffer from this performance loss.

Neither of these options is good. But I don't think we have a choice. We might need to discuss this at a future MOM meeting.

As for long term solutions, we may need to do more aggressive FMA/no-FMA testing before submitting PRs for review.

awallcraft commented 9 months ago

None of the intel compiler templates at mom-ocean/mkmf/templates include -no-fma. They do typically include -fp-model source and -ftz.

For HYCOM, we always use -fp-model precise -no-fma -ftz, because otherwise we don't get bit for bit reproducability on different numbers of MPI tasks (due to the loop prologue or epilogue after vectorization being different for different tile sizes).

marshallward commented 9 months ago

None of the intel compiler templates at mom-ocean/mkmf/templates include -no-fma. They do typically include -fp-model source and -ftz.

Whether or not the ftn wrapper include -march=core-avx2 will depend on the system. For the systems which have it, we set -march=core-avx-i which overrides -march=core-avx2 and disables FMAs. I looked through our output and did not see any FMA instructions.

(See: https://github.com/mom-ocean/mkmf/blob/3fe7c00f63cd6d07aa10a1c0a02069cc14067c85/templates/ncrc5-intel-classic.mk#L94-L103)

WCOSS was also using -fp-model source so it would seem that it does not inhibit FMAs.

jiandewang commented 9 months ago

yes in UFS we have fp-model but for wcoss2 we also have march=core-avx2 https://github.com/ufs-community/ufs-weather-model/blob/develop/cmake/Intel.cmake#L22-L28

mathomp4 commented 9 months ago

I just took a look at GEOS and we are...confused, I guess. 😄

For GNU, we generally don't specify so I guess GNU can do what it wants. Well, other than some not-well-tested Aggressive build options where FMA was turned off per a suggestion from the mailing list.

For Intel, we do have it on, though it would be interesting to see if it matters. Our Intel flags are a melange of historical things. Essentially we run with:

-O3 -g -march=core-avx2 -fma -align array32byte -traceback -assume realloc_lhs 
-qopt-report0 -align all -fno-alias -fpe3 
-fp-model fast -fp-model source -fp-model consistent -ftz 
-assume noold_maxminloc -diag-disable 10121 -align dcommons -fPIC -qopenmp

so who knows what is actually on and off by the end. But we've found through trial-and-error this gives us okay performance and the reproducibility between chips we want.

sanAkel commented 9 months ago

so who knows what is actually on and off by the end. But we've found through trial-and-error this gives us okay performance and the reproducibility between chips we want.

I'm not aware of the exact criteria for reproducibility between chips of MOM6. Does it include:

  1. Intel
  2. AMD
  3. Apple silicon
  4. ...?
marshallward commented 9 months ago

@mathomp4 with those flags, you are almost certainly using FMAs. I see nothing that would explicitly disable them. I can't speak to the degree of reproducibility, but that is a decision for GEOS.

@sanAkel I don't think we have good information on this topic. We saw answer changes from Intel to AMD, but this turned out to be related to different math libraries, not the chips.

I think we could say a lot more on the topic of reproducibility across hardware and compiler settings, but it might be too much a digression for a PR which has already languished for a very long time. We could start up a new Discussions topic?

sanAkel commented 9 months ago

We could start up a new Discussions topic?

👍

jiandewang commented 9 months ago

in UFS we just added regional coupled run (fv3+MOM6+WAVE) half week ago. Testing from HAFS group (@binli2337) show results changes. John Stephen (@JohnSteffen-NOAA) also reported answer chages in his MOM6 standalone tests. Coupled runs are done on HERA, Bin Li is testing on wcoss2 now . Standalone test is done on ORION (which is similar to HERA). I have put the run log files on GAEA /gpfs/f5/nggps_emc/scratch/Jiande.Wang/For-Marshall/HAFS. Diff on the run log (with DEBUG on) shows difference starts from MOM_initialize_state. Both coupled and standalone runs show similar diff results.