Open JeroBnd opened 3 weeks ago
The regression test results:
Test Type             | Expected | Received | Failed
====================================================
Number of Tests       |       23 |       24 |
Number of Builds      |       60 |       57 |
Number of Simulations |      158 |      150 |      0
Number of Comparisons |       95 |       86 |      0
Failed Simulations are:
None
Which comparisons are not bit-for-bit:
None
To help us review this, can you add more explanation of the fix in the description section?
Hello.
The adaptive time step module operates with a precision of 1/100 seconds, resulting in simulation times with the same precision.
The process that determines the dtInterval involves several steps, checking various conditions. One of these conditions is the precision of 1/100 seconds.
When adjusting the time step to boundary conditions (BC) and output times, the algorithm divides a temporary time interval (with 1/100 sec precision) by two and assigns it to dtInterval. When this temporary time interval has an odd value, the precision changes to 1/200 sec, and the next simulation time also has a precision of 1/200 sec.
In the next step, adjacent to the BC or output time, the algorithm truncates the 1/200 sec precision, setting the simulation time to 1/200 sec before the BC or output time (without the mitigation done in #154).
The following dtInterval, which is 1/200 sec, is then truncated to a precision of 1/100 sec, resulting in a dtInterval of 0.
The change in #154 mitigates this effect but does not address the underlying source of the problem.
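The arithmetic described above can be traced with exact fractions. This is only a minimal Python sketch, not the Fortran in adapt_timestep_em.F: the helper names `truncate_to_hundredths` and `steps_toward_target` are hypothetical, and the three-step sequence (halve, truncate, truncate) is a simplified reading of the behavior described in this comment, without the #154 mitigation.

```python
from fractions import Fraction

HUNDREDTH = Fraction(1, 100)  # precision the module is described as working with


def truncate_to_hundredths(t):
    """Drop anything finer than 1/100 s (hypothetical stand-in for the truncation)."""
    return (t // HUNDREDTH) * HUNDREDTH


def steps_toward_target(remaining):
    """Return ([dt1, dt2, dt3], leftover) for a given time remaining to a BC/output time.

    dt1 halves the remaining interval, dt2 and dt3 are the truncated remainders.
    leftover is the time still missing after dt2; if it is nonzero while dt3 is 0,
    the run cannot advance to the target time.
    """
    dt1 = remaining / 2                      # halving an odd number of hundredths gives 1/200 s precision
    remaining -= dt1
    dt2 = truncate_to_hundredths(remaining)  # stops 1/200 s short of the target
    remaining -= dt2
    dt3 = truncate_to_hundredths(remaining)  # 1/200 s truncates to 0
    return [dt1, dt2, dt3], remaining


# 18.01 s remaining: an odd number of hundredths
dts, leftover = steps_toward_target(Fraction(1801, 100))
# dts      == [Fraction(1801, 200), Fraction(9), Fraction(0)]
# leftover == Fraction(1, 200)
```

With an even number of hundredths, for example `Fraction(1800, 100)`, the leftover is 0 after the second step and the problem does not appear.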
@JeroBnd Can you expand IDING SAS? Are you working with Kugler who posted the issue?
I am not working with Kugler. I am from Córdoba, Argentina. IDING SAS is a startup. We provide services to APRHI (Provincial Administration of Water Resources) for reservoir management. In this context, we are operationally running a high-resolution weather forecast ensemble with WRF.
I found this bug while debugging a mistake of my own in namelist.input that did not raise a warning.
There are several things to do in the adapt_timestep module...
@JeroBnd Thanks for the info. I tested one of the cases Kugler had problems with, the em_b_wave case, and your change didn't help. Did you test that case using his namelist.input file?
@weiwangncar, the adapt_timestep module is not prepared to handle this kind of idealized case. The algorithm uses the remaining time until the boundary condition is applied, with a counter that resets when the boundary condition is used.
In this idealized case, the counter starts with a value of 10800 (equivalent to 3 hours), but when the supposed boundary condition should occur, it doesn’t, and the counter continues decreasing into negative values, causing the simulation to terminate prematurely.
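The counter behavior can be illustrated with a toy model. This is a hedged sketch, not the module's code: `counter_history`, its parameters, and the fixed 3600 s step are assumptions chosen only to keep the example short.

```python
def counter_history(start, dt, n_steps, bc_applied):
    """Track a hypothetical 'seconds until next BC' counter over n_steps steps.

    bc_applied=True models a real-data run, where using the BC resets the counter.
    bc_applied=False models the idealized case, where no BC is ever used.
    """
    counter = start
    history = []
    for _ in range(n_steps):
        counter -= dt
        if counter <= 0 and bc_applied:
            counter += start  # BC used: the counter restarts for the next interval
        history.append(counter)
    return history


real_case = counter_history(10800, 3600, 4, bc_applied=True)
# real_case == [7200, 3600, 10800, 7200]

ideal_case = counter_history(10800, 3600, 4, bc_applied=False)
# ideal_case == [7200, 3600, 0, -3600]  (the counter goes negative)
```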
I wrote a small patch to fix this, but the module should be reconsidered. It uses several variables from the namelist.input without properly checking them.
@JeroBnd Did you encounter a problem associated with dt=0? If so, what version of the code is it?
@weiwangncar I am doing one-way nesting using Ndown and adaptive time step. The boundary conditions (BC) are taken from domain 1 to domain 2 every 30 minutes (1800 seconds).
However, when running WRF for domain 2 with a mistake in the namelist.input file, specifically with the variable interval_seconds set to 3600 seconds, one random run out of the 24-member ensemble usually fails with a CFL error and a segmentation fault or stops at NOAH MP.
For this to happen, two conditions must occur simultaneously:
1 - The time interval between the current simulation time and the next BC time is an odd value (resulting in a simulation time precision of 1/200 sec).
2 - The BC time does not match the BC time derived from interval_seconds, which in my case should occur at times ending in 30 minutes, such as 1:30.
When both of these conditions occur, the mitigation algorithm for dt=0 (#154) sets the dtInterval to match the BC time derived from interval_seconds, generating a time interval of 30 minutes, which crashes the run.
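To make the 30-minute jump concrete, here is a small sketch of the mismatch between the two BC schedules. It assumes BC times fall on multiples of the interval counted from the start of the run; `time_to_next_bc` is a hypothetical helper, not a WRF routine.

```python
from fractions import Fraction


def time_to_next_bc(t, bc_interval):
    """Seconds from time t to the next multiple of bc_interval strictly after t."""
    return (t // bc_interval + 1) * bc_interval - t


# Simulation time stuck 1/200 s before the real BC time of 1:30 (5400 s)
t = Fraction(5400) - Fraction(1, 200)

actual = time_to_next_bc(t, 1800)    # BC files really arrive every 1800 s
derived = time_to_next_bc(t, 3600)   # interval_seconds mistakenly set to 3600

# actual  == Fraction(1, 200)          -> 0.005 s to the real BC time
# derived == 1800 + Fraction(1, 200)   -> about 30 minutes to the derived BC time
```

Under these assumptions, a dtInterval matched to the derived BC time is roughly 1800 s, which is consistent with the CFL failure reported above.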
@brianreen We wonder if you could help review this PR? Thanks.
Yes, I will help review this PR.
@JeroBnd
Does this issue only occur when interval_seconds is set incorrectly?
Since I did not think #154 changed code that sets dtInterval, could you clarify what lines of code you are referring to when you say "the mitigation algorithm for dt=0 (#154)"?
Fixed the source of the dt=0 error in the time step adjustment to BC and output times.
TYPE: bug fix
KEYWORDS: time, step, adaptive
SOURCE: Jeronimo Bande (IDING SAS)
DESCRIPTION OF CHANGES: Problem: Adjusting the time step to BC and output times can produce dt = 0.
Solution: What was done algorithmically and in the source code to address the problem?
ISSUE: Fixes #1560
LIST OF MODIFIED FILES: /dyn_em/adapt_timestep_em.F
TESTS CONDUCTED:
RELEASE NOTE: Corrected the adaptive time step adjustment at BC and output times.