wrf-model / WRF

The official repository for the Weather Research and Forecasting (WRF) model
Other
1.19k stars 663 forks source link

Segmentation fault using random_perturb in V4.3 #1545

Open pedro-jm opened 2 years ago

pedro-jm commented 2 years ago

Describe the bug Segmentation fault very early in a WRF simulation when the option random_perturb = 1 is selected.

To Reproduce Steps to reproduce the behavior:

  1. Use compiler and version gnu/9.1.0 configure with option -D
  2. Use namelist option random_perturb = 1
  3. Output is a segmentation fault in rsl.error.0000 resulting from the call to subroutine in module_stoch.F: CALL UPDATE_STOCH( & SPFORCS,SPFORCC,SP_AMP,ALPH_RAND, & restart,iseedarr,seed_dim, & ids, ide, jds, jde, kds, kde, & ims, ime, jms, jme, kms, kme, & grid%i_start(ij), grid%i_end(ij), grid%j_start(ij), grid%j_end(ij), kts, kte

Then input values grid%[i,j]_[start, end] are unrealistic. The grid%num_tiles has a very large value. Probably if these values are fixed the code would work again.

Additional context

! Initialize RAND_PERTURB (4) IF (grid%rand_perturb_on==1) then if ((.not.config_flags%restart) .or. (.not.config_flags%hrrr_cycling)) then call rand_seed (config_flags, grid%ISEED_RAND_PERT, grid%iseedarr_rand_pert , 1, config_flags%seed_dim) endif call SETUP_RAND_PERTURB('R', & grid%rand_pert_vertstruc,config_flags%restart, & grid%SP_AMP, & grid%SPFORCC,grid%SPFORCS,grid%ALPH_RAND, & grid%VERTSTRUCC,grid%VERTSTRUCS,grid%VERTAMPT, & grid%KMINFORCT,grid%KMAXFORCT, & grid%LMINFORCT,grid%LMAXFORCT, & grid%KMAXFORCTH,grid%LMAXFORCTH, & grid%time_step,grid%DX,grid%DY, & grid%gridpt_stddev_rand_pert, & grid%lengthscale_rand_pert, & grid%timescale_rand_pert, & grid%TOT_BACKSCAT_PSI,grid%ZTAU_PSI, & grid%REXPONENT_PSI, & ids, ide, jds, jde, kds, kde, & ims, ime, jms, jme, kms, kme, & its, ite, jts, jte, kts, kte )

! if (.not.config_flags%restart) then ! spin up ! do k = 1,10 ! CALL RAND_PERT_UPDATE(grid,'R', & ! grid%SPFORCS,grid%SPFORCC, & ! grid%SP_AMP,grid%ALPH_RAND, & ! ips, ipe, jps, jpe, kps, kpe, & ! ids, ide, jds, jde, kds, kde, & ! ims, ime, jms, jme, kms, kme, & ! kts, kte, & ! imsx,imex,jmsx,jmex,kmsx,kmex, & ! ipsx,ipex,jpsx,jpex,kpsx,kpex, & ! imsy,imey,jmsy,jmey,kmsy,kmey, & ! ipsy,ipey,jpsy,jpey,kpsy,kpey, & ! grid%num_stoch_levels,grid%num_stoch_levels, & ! grid%num_stoch_levels,grid%num_stoch_levels, & ! config_flags%restart, grid%iseedarr_rand_pert, & ! config_flags%seed_dim, & ! grid%DX,grid%DY,grid%rand_pert_vertstruc, & ! grid%RAND_PERT, & ! grid%gridpt_stddev_rand_pert, & ! grid%lengthscale_rand_pert, & ! grid%VERTSTRUCC,grid%VERTSTRUCS,grid%VERTAMPT ) ! enddo ! ENDIF !rand_perturb_on ENDIF

smileMchen commented 2 years ago

@dudhia @davegill @pedro-jm

Jimy, Dave, Pedro, Dave and I run the debug option of WRF with rand_perturb being turned on. We looked at various variables inside dyn_em/module_stoch.F and found that the model crashed because some variables are divided by zero. Below is the details.

In the following piece of code:

577       ZGAMMAN=4.0*ZGAMMAN !account for all quadrants, although only one is Filled
 578       if (variable == 'W') then
 579          ZCONSTF0=SQRT(ALPH*TOT_BACKSCAT/(float(itime_step)*ZSIGMA2*ZGAMMAN))/(2*RPI)
 580       elseif  (variable == 'T') then
 581          ZCONSTF0=SQRT(T0*ALPH*TOT_BACKSCAT/(float(itime_step)*cp*ZSIGMA2*ZGAMMAN))
 582       elseif ((variable == 'P') .or. (variable == 'R') .or.  (variable == 'S') .or. (variable == 'O') .or.  (var     iable == 'Q     ')) then
 585          ZCONSTF0= gridpt_stddev_rand_perturb*sqrt((1.-phi**2)/(2.*ZGAMMAN))
 586       endif

ZGAMMAN = 0, which leads to model crash

Tracing back to the calculation of ZGAMMAN, we have the codes:

 560         if ((IK>0).or.(IL>0)) then
 561           if (variable == 'W') then
 562             ZCHI(IL+1,IK+1)=((IK/RY*IK/RY)+(IL/RX*IL/RX))**(REXPONENT/2.)  ! SKEBS :U
 563             ZGAMMAN= ZGAMMAN + ((IK/RY*IK/RY)+(IL/RX*IL/RX))**(REXPONENT+1)
 564           else if (variable == 'T') then
 565             ZCHI(IL+1,IK+1)=((IK/RY*IK/RY)+(IL/RX*IL/RX))**(REXPONENT/2.)  ! SKEBS :T
 566             ZGAMMAN= ZGAMMAN + ((IK/RY*IK/RY)+(IL/RX*IL/RX))**(REXPONENT)
 567           else if ((variable == 'P') .or. (variable == 'R') .or.  (variable == 'S') .or. (variable == 'O') .or.  (varii
     able == 'Q     ')) then
 568             ZCHI(IL+1,IK+1)=exp(  -2*RPI**2*kappat*((IK/RY*IK/RY)+(IL/RX*IL/RX)) ) !SPPT
 569             ZGAMMAN = ZGAMMAN + exp(  -4*RPI**2*kappat*((IK/RY*IK/RY)+(IL/RX*IL/RX)) ) !SPPT
 571           endif
 572         endif

Where

kappat= L_rand_perturb**2

The large default value (500000) of namelist option "L_rand_perturb" results in extremely large value of kappat, which leads to zero value of ZGAMMAN (calculated in line 569 shown above).

Based on above analysis, we believe that nothing is wrong with the code. The issue is caused by unreasonable value of L_rand_perturb in namelist. When we set L_rand_perturb =10000, the code runs fine.

We need to contact the developer and ask what reasonable value to specify for L_rand_perturb.

davegill commented 2 years ago

@smileMchen Ming, Excellent! May we close this issue, and handle the run-time details via email?

smileMchen commented 2 years ago

Based on new tests and communications with Jimy and Dave:

For a smaller model domain (79 x 147 grids, grid interval =0.45km) (1) When L_rand_perturb =500000.0, RAND_PERT is initially 0 and later remains a constant value of 0.09 This indicates that even if the model is able to run, stochastic perturbation didn't work as expected (2) When L_rand_perturb = 10000.0, RAND_PERT shows reasonable value and the stochastic perturbation works fine.

For a larger model domain (415 x 325, grid interval =15km) (1) When L_rand_perturb =500000.0, the RAND_PERT pattern is: image

(2) When L_rand_perturb =100000.0, the RAND_PERT displays the pattern below: image

Apparently the value of L_rand_perturb should be specified based on domain size.

smileMchen commented 2 years ago

The code works just fine. If everyone agrees, I think we can close this issue.