wrf-model / WRF

The official repository for the Weather Research and Forecasting (WRF) model

Observation Nudging segfaults #1278

Open Plantain opened 4 years ago

Plantain commented 4 years ago

I'm debugging some very strange crashes at the end of initialization / first timestep with WRF when observation nudging is enabled on a nest, and I'm now confident this is a bug rather than a configuration issue. The crash is sensitive to the number of MPI processes, numtiles, and OpenMP, which points to some kind of tiling/communication/MPI issue.

MPI processes = 1, numtiles = 2: Success
MPI processes = 2, numtiles = 1: Success
MPI processes = 2, numtiles = 2: Crash

Building with a debug build points to a crash coming from module_dm.f90:5527:

    DO N = 1, NSTA
      ERRF(1,IFULL_BUFFER(N)) = FULL_BUFFER(N)
    END DO

Running with a debugger, with nsta = 1034, both IFULL_BUFFER and FULL_BUFFER appear to be populated for only the first 27 values; subsequent values look like uninitialized memory. So for N >= 28 it crashes on the first read of an IFULL_BUFFER value > 1034.
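The failure mode described above can be illustrated with a small sketch. It is written in Python rather than WRF's Fortran, and the function name `first_bad_index` and the sample data are hypothetical; it only shows how a bounds check on the index buffer, run before the scatter loop, would pinpoint where the valid entries end.

```python
def first_bad_index(ifull_buffer, nsta):
    """Return the 1-based position N of the first entry that is not a
    valid station index in 1..nsta, or None if every entry is valid.

    This mirrors the Fortran loop
        ERRF(1, IFULL_BUFFER(N)) = FULL_BUFFER(N)
    which writes out of bounds as soon as IFULL_BUFFER(N) is outside 1..nsta.
    """
    for n, idx in enumerate(ifull_buffer[:nsta], start=1):
        if not 1 <= idx <= nsta:
            return n
    return None


# Hypothetical data: 27 valid entries followed by garbage, as in the report.
nsta = 1034
valid = list(range(1, 28))
garbage = [987654321] * (nsta - 27)
print(first_bad_index(valid + garbage, nsta))  # prints 28
```

An equivalent check placed in the Fortran code just before the loop would be one way to confirm whether the buffers are under-filled on entry, as opposed to being corrupted later.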

This is all with a dm build of WRF. With a dm+sm build the crashes do not occur. Running with nudging on only the outer domain works fine.

I've reached the limit of my understanding of this code and MPI in WRF, so I'm making this issue in hopes someone else can pick it up, or provide some ideas so I can take this further. I can supply a full set of input data, but I think just the OBS_DOMAIN*01 files and obs_nudge_opt = 1,1 on any nested domain should do.

I don't seem to be the only person who has had similar problems; see, e.g., https://forum.mmm.ucar.edu/phpBB3/viewtopic.php?f=45&t=5391

obs.zip logs.zip

davegill commented 4 years ago

@Plantain What is the value of the namelist variable that is used to allocate space for those arrays? The default looks to be about 150000.

&fdda
 max_obs = ???

Plantain commented 4 years ago

100000. It doesn't appear to be overflowing that.

On Sun, 30 Aug 2020, 9:51 pm Dave Gill, notifications@github.com wrote:

> @Plantain What is the value of your namelist variable that is used to allocate space for those arrays? The default looks to be about 150000.
>
> &fdda
>  max_obs = ???

Plantain commented 4 years ago

&fdda
 obs_nudge_opt = 1,1
 max_obs = 100000
 fdda_start = 0,0
 fdda_end = 360,360
 obs_twindo = 0.5,0.5
 obs_ionf = 10
/
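As a sanity check on the question of whether max_obs is being overflowed, the record above can be compared against the station count seen in the debugger. The following is a minimal Python sketch, not part of WRF: `parse_flat_namelist` is a hypothetical helper that handles only simple `key = value` records whose values contain no spaces, and nsta = 1034 is taken from the debugger session described earlier in the thread.

```python
import re


def parse_flat_namelist(text):
    """Parse one simple `&group key = v1,v2 ... /` record.

    Returns (group_name, {key: [values as strings]}). This is not a full
    Fortran namelist parser: values must not contain spaces or '/'.
    """
    match = re.match(r"\s*&(\w+)\s+(.*?)\s*/", text, flags=re.S)
    if match is None:
        raise ValueError("no '&group ... /' record found")
    group, body = match.group(1), match.group(2)
    pairs = re.findall(r"(\w+)\s*=\s*(\S+)", body)
    return group, {key: value.split(",") for key, value in pairs}


record = """&fdda
 obs_nudge_opt = 1,1
 max_obs = 100000
 fdda_start = 0,0
 fdda_end = 360,360
 obs_twindo = 0.5,0.5
 obs_ionf = 10
/"""

group, fdda = parse_flat_namelist(record)
max_obs = int(fdda["max_obs"][0])
nsta = 1034  # station count reported from the debugger session
print(group, max_obs, nsta <= max_obs)  # prints: fdda 100000 True
```

With these values the station count is well below max_obs, which is consistent with the statement that the allocation is not being overflowed.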