wrf-model / WRF

The official repository for the Weather Research and Forecasting (WRF) model
Other
1.24k stars 677 forks source link

Corrections for tipping bucket and nudging in very long simulations #2063

Open tlspero opened 3 months ago

tlspero commented 3 months ago

Corrections for precipitation tipping bucket and nudging in very long simulations

TYPE: bug fix

KEYWORDS: precipitation, tipping bucket, nudging, spectral nudging, analysis nudging, grid nudging, regional climate, dynamical downscaling, downscaling

SOURCE: Tanya Spero (U.S. EPA)

DESCRIPTION OF CHANGES: Problem: Several processes in WRF are currently triggered at periodic intervals (such as reading/writing files and managing certain bookkeeping processes). Some of those processes had been coded to identify those time triggers by referencing a variable that contains the time elapsed since the model simulation was initialized. That variable, XTIME, is a Fortran single-precision real variable that counts the number of elapsed minutes since initialization. However, single-precision real numbers become imprecise (i.e., cannot accurately resolve "whole" numbers) after they exceed 2^24, which is 16,777,216. In long simulations, that occurs just before 32 years of simulation period.

After that point, XTIME is never an odd number. If an odd value of XTIME is expected, it can be off by +1 or -1. In some places that use XTIME to trigger functions (like tipping the precipitation bucket every 6 hours when BUCKET_MM > 0.0, or reading and computing temporal weights for nudging), those functions occur on the wrong time step.

For a domain with a 60-second (1-minute) time step, the precipitation bucket tips one time step too early, which makes it impossible to back out the correct precipitation totals over the increment that includes the bucket time (and those totals become negative because of the mismatch in temporal accounting). For a run that was initialized at 1978-10-01_00:00:00, the error first occurred at 2010-08-24_20:17:00, and it was realized in the precipitation at all 6-hour increments afterward (starting with 2010-08-25_00:00:00. The issue can be seen in the log file (rsl.out.XXXX) by examining the values in "Max Accum {Resolved,Convective} Precip". At each 00, 06, 12, and 18 UTC time step, the maximum values should be below the value specified in BUCKET_MM in the physics namelist. At the prior time step, the values for the precipitation can exceed the bucket value. When the error occurs, the bucket is tipped too soon, so the maximum values across the domain will be below the bucket value before 00, 06, 12, and 18 UTC. [This is for long runs.]

For the same domain with a 60-second (1-minute) time step, the nudging fields are read one time step too soon, and the weights are recalculated, so the bracketing analyses are incorrect for that time step. The nudging fields are again read on the correct time step, and sometimes read (yet a third time) on the following time step. This error begins predictably at 23 years plus 3.5 months into a simulation, but I do not know the numeric trigger associated with this time point. For a run that was initialized at 1978-10-01_00:00:00, the problem with the nudging first occurred at 2002-01-22_05:59:00. For another run with a different forcing dataset (and one that included leap years) with an initialization at 2023-10-01_00:00:00, the error in nudging first occurred at 2047-01-22_05:59:00, which is the same elapsed time as in the other run. The issue can be seen in the log file (rsl.out.XXXX) by examining the occurrence of "Spectral nudging read in new data". In a continuous run of more than 23 years (which can include restarts, just not reinitializations; compare value of global attribute SIMULATION_START_DATE to global attribute START_DATE), this read can occur too early and more frequently than supported by data.

Solution: The variable, XTIME, was inherited in WRF from MMx models, and it is pervasive in the model. It is impractical to simply convert XTIME to a double-precision variable, and it would be beyond the scope for me to do that to address this problem. There is also limited support for double-precision variables in the ESMF time-keeping functions that were borrowed by WRF. Instead, I arrived at a pragmatic compromise to move my project forward--and hopefully help others who have long, continuous simulations.

To address the issue related to the tipping bucket precipitation, a change was introduced in phys/module_diag_misc.F to use existing variable, CURR_SECS2, instead of XTIME. The variable CURR_SECS2 is used elsewhere in that routine, and it represents the number of seconds that have elapsed since the restart, not since the initialization. Unfortunately, CURR_SECS2 also goes imprecise after about 6.5 months after a restart. I think we "get away with this" because increments of 60 seconds will round appropriately to the 6-hour tipping point, but this imprecision could be an issue for very small time steps using odd numbers (such as 15 seconds or 5 seconds). A better and more robust (longer-term) solution may be to use CURR_SECS2_R8, which is a double-precision version of CURR_SECS2 that I introduced as part of the solution to the nudging. Alternatively, CURR_MINS2 (which is based on CURR_SECS2_R8) could be used here. However, changing all of the instances of CURR_SECS2 to CURR_SECS2_R8 or CURR_MINS2 would require much broader testing than I can do here.

For the spectral nudging, the upstream codes were modified to make CURR_MINS2 available from dyn_em/solve_em.F through to the nudging routines. I introduced CURR_MINS2, which I calculated from a new double-precision variant of CURR_SECS2, and CURR_MINS2 serves as a proxy for XTIME to determine the appropriate times to read the nudging files and recompute the weights between bracketing analyses. I made the analogous changes in the analysis/grid nudging routine for both 3D and surface nudging.

ISSUE: n/a

LIST OF MODIFIED FILES: For tipping bucket: phys/module_diag_misc.F For nudging: dyn_em/module_first_rk_step_part1.F, dyn_em/solve_em.F, phys/module_fdda_psufddagd.F, phys/module_fdda_spnudging.F, phys/module_fddagd_driver.F for completeness: wrftladj/module_first_rk_step_part1_ad.F, wrftladj/module_first_rk_step_part1_tl.F, wrftladj/solve_em_ad.F, wrftladj/solve_em_tl.F

TESTS CONDUCTED:

  1. Do mods fix problem? How can that be demonstrated, and was that test conducted?

The mods correct the problem. The modifications were tested extensively using a long simulation (using a restart at more than 31 years into the simulation) with WRFv4.5.1 and with shorter (3-day) simulations using WRFv4.6. The modifications were further tested in an ongoing simulation that used WRFv4.5.1 and was picked up at year 26. Testing was performed incrementally, first addressing the tipping bucket, then addressing the spectral nudging. Internal test versions of the two versions of WRF (v4.5.1 and v4.6) were laced with print statements to ensure that the values of XTIME, CURR_SECS2, and CURR_MINS2 were reflected and used as expected by the modifications. In the nudging codes, print statements were inserted during testing to ensure that the calculation of the nudging weights was consistent with the expected values and that the reads became restricted to only occur at the intended times. The log files (rsl.out.XXXX) were visually inspected and compared among incremental simulations. The values of "Max Accum {Resolved,Convective} Precip" were examined at time steps immediately preceding the bucket tipping times and afterward. The values for the "Domain average" were compared in the log file. The times that listed "Spectral nudging read" were noted. The wrfout files were visually inspected in ncview to compare fields between simulations. The precipitation (recorded at 5-minute intervals in a non-standard additional output file) was calculated and confirmed to not contain negative values (via a separate QC program) after the modifications were introduced.

The nudging codes did not change in WRF between WRFv4.5.1 and WRFv4.6, so the codes were directly copied from the base internal testing code in WRFv4.5.1 to WRFv4.6 for final testing. Some changes were introduced in the other codes, so changes were manually transferred between those codes.

The nudging changes were not explicitly tested in the analysis/grid nudging and surface nudging codes, but the algorithms in those codes formed the basis for the original implementation of spectral nudging, so the outcome is likely to be the same. It should be confirmed by any user who wants to make this sort of continuous long run.

This may need to be tested in long runs with small, odd-numbered time steps (like 5 seconds or 15 seconds).

There may be other places in WRF that rely on XTIME for incremental time-related triggers, so a comprehensive search of the code is recommended.

  1. Are the Jenkins tests all passing? I did not conduct the Jenkins testing.

RELEASE NOTE: Corrected algorithms in the tipping bucket for precipitation and in the nudging routines to adjust for imprecision in single-precision real numbers exceeding the resolvable values in long (>23-year) continuous simulations.

weiwangncar commented 3 months ago

@tlspero Thanks for the PR! One of the compile tests failed, and it is for DA/4DVAR. The output is here: output_0.gz

If you need instructions to build WRFDA, see here. My guess is something added to module_first_rt_step_part*.F needs to be added to the counterpart in wrftladj/, the adjoint code.

tlspero commented 2 weeks ago

As suggested by @weiwangncar, I made the analogous changes to the input argument lists for codes in the adjoint model: wrftladj/module_first_rk_step_part1_ad.F and wrftladj/module_first_rk_step_part1_tl.F. In each of those codes, I added "curr_mins2" to the argument list and the declarations, but I did not use "curr_mins2" locally. I compiled the model (WRFv4.6.0), but I did not test the adjoint. I also updated my comprehensive release note information (above) to add those two routines to the list of codes that were modified for this PR; they are "for completeness".

tlspero commented 2 weeks ago

After seeing that one of the automated checks failed, I realized that there were two more routines in the adjoint code that I overlooked: solve_em_ad.F and solve_em_tl.F. I updated those codes, as well. I am also updating the description of the mods in this thread.

tlspero commented 2 weeks ago

@weiwangncar -- Sorry about that. I updated wrftladj/solve_em_ad.F and wrttladj/solve_em_tl.F. This is one of the unfortunate fallacies of inadvertently affecting codes in variants of WRF that are not compiled. Hopefully the mods are complete this time. Thank you for your patience and assistance!

tlspero commented 2 weeks ago

@weiwangncar -- One more time. Ugh. The routine wrftladj/solve_em_ad.F did not have the variable tmpTimeInterval2 defined, unlike wrftladj/solve_em_tl.F and dyn_em/solve_em.F. I added the declaration and the definition. Hopefully this is the last tinkering. (Again, unfortunately, I am not set up to compile the adjoint model; otherwise I would have caught these issues before pushing the commits.) Thank you for your patience.

tlspero commented 2 weeks ago

Well...that was not enough, either. I cannot access the results of the Jenkins testing, other than to see that it failed. I am open to suggestions to cleaning this up. I am assuming the fail is still in the adjoint code, which I am not set up to compile.

weiwangncar commented 2 weeks ago

@tlspero Thanks for trying. I will take another look and let you know.

weiwangncar commented 1 week ago

@tlspero I added a few comments regarding the solve_em_ad and solve_em_tj code. The one that affects compilation may be the one for the call to g_first_rk_step_part1.