wrf-model / WRF

The official repository for the Weather Research and Forecasting (WRF) model

WRF_v4.3 out-of-the-box doesn't compile in vortex-following mode #1521

Closed JS-WRF-SBM closed 3 years ago

JS-WRF-SBM commented 3 years ago

Describe the bug The vortex-following compilation mode doesn't compile using Intel 2019_u3. Attached: Intelu3_configure_wrf.txt, Intelu3_log_cmp_153.txt

To Reproduce Steps to reproduce the behavior:

  1. Use compiler and version: 'Intel, 2019.3.199' (see the attached configure.wrf)
  2. Download WRF_v4.3 out-of-the-box code
  3. Output is: see the attached output log_cmp_153

Expected behavior The code is expected to compile using option: 15 (Intel dampar) + 3 (vortex following)

Screenshots See the attached output log

Attachments See the attached output log

Additional context No PNetCDF/HDF5 libraries were used/linked. The code tries to compile with the NETCDF_classic = 1 option

davegill commented 3 years ago

Part of the regression testing is a moving nest. We'll check it out.

JS-WRF-SBM commented 3 years ago

An update:

On a different computer, with Intel 2018_u5:

  1. With the PNetCDF/HDF5 libraries linked, but the WRF configuration limited to NETCDF_classic=1, the code doesn't compile.
  2. With the PNetCDF/HDF5 libraries linked and the standard WRF configuration using HDF5 compression, the code compiles (see attached).

    I'm wondering if the code can be expected to use the classic mode libs, even if PNetCDF/HDF5 libs are linked ... ?

Intel_2018u5_log_cmp_151.txt Intel_2018u5_configure_wrf.txt

davegill commented 3 years ago

With Intel 19.0.5, I tried building the WRF v4.3 code for a moving nest, using NETCDF4 + HDF5, but forcing the classic netcdf option to be used. The job failed similarly to yours.

davegill commented 3 years ago

When I tried removing the netcdf classic option, it still failed. When I tried without the moving nest, it seemed to build.

JS-WRF-SBM commented 3 years ago

> When I tried removing the netcdf classic option, still a failure. When I tried no moving nest, then it seemed to be able to build.

Thanks for double-checking! Indeed, option 15+1 seems to build OK.

davegill commented 3 years ago

@weiwangncar @dudhia Folks, Intel moving nest is broken. The moving nest is tested in the regression suite, so it works with GNU.

The PR is #1391 "height-level diagnostic fixes: add p_zl, use ln(p) interp for pressure"

The hash is 3a2e4e81139f2e8c

Here is the error, immediately after the registry:

/bin/sh: line 1: 62601 Segmentation fault      (core dumped) tools/registry -DEM_CORE=1 -DNMM_CORE=0 -DNMM_MAX_DIM=2600 -DDA_CORE=0 -DWRFPLUS=0 -DIWORDSIZE=4 -DDWORDSIZE=8 -DRWORDSIZE=4 -DLWORDSIZE=4 -DNONSTANDARD_SYSTEM_FUNC -DWRF_USE_CLM -DUSE_NETCDF4_FEATURES -DWRFIO_NCD_LARGE_FILE_SUPPORT -DRPC_TYPES=1 -DDM_PARALLEL -DNETCDF -DLANDREAD_STUB=1 -DMOVE_NESTS -DVORTEX_CENTER -DUSE_ALLOCATABLES -Dwrfmodel -DGRIB1 -DINTIO -DKEEP_INT_AROUND -DLIMIT_ARGS -DBUILD_RRTMG_FAST=0 -DBUILD_RRTMK=0 -DBUILD_SBM_FAST=1 -DSHOW_ALL_VARS_USED=0 -DCONFIG_BUF_LEN=65536 -DMAX_DOMAINS_F=21 -DMAX_HISTORY=25 -DNMM_NEST=0 -DNEW_BDYS Registry/Registry
Makefile:167: recipe for target 'module_state_description.F' failed

Here are the diffs that cause the problem. I do not see anything in them that should cause a build error.

diff --git a/Registry/registry.diags b/Registry/registry.diags
index 573b2b4f..83c63e4b 100644
--- a/Registry/registry.diags
+++ b/Registry/registry.diags
@@ -69,8 +69,9 @@ state    real   ght_zl i{nz}j   misc    1  Z   h{22}  "GHT_ZL"  "Height level da
 state    real   s_zl   i{nz}j   misc    1  Z   h{22}  "S_ZL"    "Height level data, Speed"                 "m s-1"
 state    real   td_zl  i{nz}j   misc    1  Z   h{22}  "TD_ZL"   "Height level data, Dew point temperature" "K"
 state    real   q_zl   i{nz}j   misc    1  Z   h{22}  "Q_ZL"    "Height level data, Mixing ratio"          "kg/kg"
+state    real   p_zl   i{nz}j   misc    1  Z   h{22}  "P_ZL"    "Height level data, Air Pressure"          "Pa"

 #  Package declarations

 package   skip_z_diags      z_lev_diags==0     -        -
-package        z_diags      z_lev_diags==1     -        state:z_zl,u_zl,v_zl,t_zl,rh_zl,ght_zl,s_zl,td_zl,q_zl
+package        z_diags      z_lev_diags==1     -        state:z_zl,u_zl,v_zl,t_zl,rh_zl,ght_zl,s_zl,td_zl,q_zl,p_zl
diff --git a/dyn_em/start_em.F b/dyn_em/start_em.F
index 08f6f39b..5178797e 100644
--- a/dyn_em/start_em.F
+++ b/dyn_em/start_em.F
@@ -2053,6 +2053,7 @@ DEALLOCATE(z_at_q)
                  ,s_zl  = grid%s_zl             &  
                  ,td_zl = grid%td_zl            &
                  ,q_zl = grid%q_zl              &
+                 ,p_zl = grid%p_zl              &
                  !  Dimension arguments
                  ,IDS=ids,IDE=ide, JDS=jds,JDE=jde, KDS=kds,KDE=kde    &
                  ,IMS=ims,IME=ime, JMS=jms,JME=jme, KMS=kms,KME=kme    &
diff --git a/phys/module_diag_zld.F b/phys/module_diag_zld.F
index f4cb94fa..6156ecbb 100644
--- a/phys/module_diag_zld.F
+++ b/phys/module_diag_zld.F
@@ -17,7 +17,7 @@ CONTAINS
                     use_tot_or_hyd_p,extrap_below_grnd,missing,     &  
                     num_z_levels,max_z_levels,z_levels,             &
                     z_zl,u_zl,v_zl,t_zl,rh_zl,ght_zl,s_zl,td_zl,    &
-                    q_zl,                                           &
+                    q_zl,p_zl,                                      &
                     ids,ide, jds,jde, kds,kde,                      &
                     ims,ime, jms,jme, kms,kme,                      &
                     its,ite, jts,jte, kts,kte                       )
@@ -44,7 +44,7 @@ CONTAINS
       !  Output variables

       REAL   , INTENT(  OUT) ,  DIMENSION(num_z_levels)                     :: z_zl
-      REAL   , INTENT(  OUT) ,  DIMENSION(ims:ime , num_z_levels , jms:jme) :: u_zl,v_zl,t_zl,rh_zl,ght_zl,s_zl,td_zl,q_zl
+      REAL   , INTENT(  OUT) ,  DIMENSION(ims:ime , num_z_levels , jms:jme) :: u_zl,v_zl,t_zl,rh_zl,ght_zl,s_zl,td_zl,q_zl,p_zl

       !  Local variables

@@ -82,6 +82,7 @@ CONTAINS
                ght_zl(i,kz,j) = missing
                s_zl  (i,kz,j) = missing
                td_zl (i,kz,j) = missing
+               p_zl  (i,kz,j) = missing
             END DO
          END DO
       END DO
@@ -116,11 +117,16 @@ CONTAINS

                      pu = pp(i,ke+1,j)+pb(i,ke+1,j) 
                      pd = pp(i,ke  ,j)+pb(i,ke  ,j)
-                     pm = ( pu * (zm-zd) + pd * (zu-zm) ) / (zu-zd)

                      !  Found trapping height: up, middle, down.
                      !  We are doing first order interpolation.  
                      !  Now we just put in a list of diagnostics for this level.
+
+                     !  0. Pressure (Pa)
+                     !  Note that it is ln(p) that varies linearly with height, not p itself
+                     
+                     pm = exp( ( log(pu) * (zm-zd) + log(pd) * (zu-zm) ) / (zu-zd) )
+                     p_zl(i,kz,j) = pm

                      !  1. Temperature (K)

diff --git a/phys/module_diagnostics_driver.F b/phys/module_diagnostics_driver.F
index 9b45e20d..d70f8456 100644
--- a/phys/module_diagnostics_driver.F
+++ b/phys/module_diagnostics_driver.F
@@ -911,6 +911,7 @@ CONTAINS
                       ,s_zl  = grid%s_zl                                    &
                       ,td_zl = grid%td_zl                                   &
                       ,q_zl = grid%q_zl                                     &
+                      ,p_zl = grid%p_zl                                     &
                !  Dimension arguments
                       ,IDS=ids,IDE=ide, JDS=jds,JDE=jde, KDS=kds,KDE=kde    &
                       ,IMS=ims,IME=ime, JMS=jms,JME=jme, KMS=kms,KME=kme    &
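
For reference, the change to `pm` in the `phys/module_diag_zld.F` hunk above can be written out as follows. This is a sketch based only on the diff; the weight w is shorthand introduced here, not a variable in the code, and it assumes the target height zm lies between zd and zu.

```latex
% w is notation introduced for this note; it does not appear in module_diag_zld.F
w = \frac{z_m - z_d}{z_u - z_d}, \qquad 0 \le w \le 1

% Removed line: interpolation linear in p
p_m^{\mathrm{old}} = w\,p_u + (1 - w)\,p_d

% Added line: interpolation linear in ln(p)
\ln p_m = w \ln p_u + (1 - w) \ln p_d
\quad\Longrightarrow\quad
p_m = p_u^{\,w}\, p_d^{\,1-w}
```

The new value is a weighted geometric mean of the two pressures, so it is never larger than the old weighted arithmetic mean. As an illustration with made-up numbers, halfway between pd = 90000 Pa and pu = 80000 Pa (w = 0.5) the old formula gives 85000 Pa and the new one gives about 84853 Pa.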

JS-WRF-SBM commented 3 years ago

@davegill Hi Dave, any update on this bug?

davegill commented 3 years ago

@JS-WRF-SBM @weiwangncar @dudhia Folks, I have a work-around, but still no clean solution.

For the work-around, if I move the "include registry.diags" line to the end of the list in the Registry/registry.em_shared_collection file, then the code compiles with Intel with the moving nest option.

diff --git a/Registry/registry.em_shared_collection b/Registry/registry.em_shared_collection
index aa9c318f..3a633e8b 100644
--- a/Registry/registry.em_shared_collection
+++ b/Registry/registry.em_shared_collection
@@ -19,7 +19,6 @@ include registry.ssib
 include registry.noahmp
 include registry.sbm
 include registry.polrad
-include registry.diags
 include registry.afwa
 include registry.rasm_diag
 include registry.elec
@@ -28,3 +27,4 @@ include registry.hyb_coord
 include registry.new3d_wif
 include registry.trad_fields
 include registry.solar_fields
+include registry.diags
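
The re-ordering in the diff above could also be scripted. The following is a minimal sketch, not part of WRF: `move_include_to_end` is a hypothetical helper name, and it assumes the line reads exactly `include registry.diags`, with no trailing whitespace or comment.

```shell
# Hypothetical helper (not part of WRF): move one "include <name>" line
# to the end of a registry collection file.
# Usage: move_include_to_end <collection-file> <registry-file-name>
move_include_to_end() {
    file=$1
    name=$2

    # Refuse to touch the file unless the exact include line is present.
    grep -qxF "include $name" "$file" || {
        echo "no line 'include $name' in $file" >&2
        return 1
    }

    tmp=$(mktemp) || return 1
    # Copy every other line in its original order, then append the include.
    grep -vxF "include $name" "$file" > "$tmp"
    printf 'include %s\n' "$name" >> "$tmp"

    # Overwrite in place so the original file's permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example, run from the top of the WRF source tree:
#   move_include_to_end Registry/registry.em_shared_collection registry.diags
```

After re-ordering, a clean rebuild (for example `./clean -a`, then configure and compile again) is probably needed so that the registry program is re-run on the modified file.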

It is important to note that there is NOTHING wrong with the PR that causes the troubles! A number of modifications have been tried to isolate the trouble. Mods that do not make any difference:

  1. Changing the compiler optimization levels.
  2. Changing versions of the Intel compiler.
  3. Removing packaging for the offending variables.
  4. Re-ordering the variables in the height-level diags registry.
  5. Cleaning up the registry warnings (cupflag and lake2d are logicals, and cannot be shifted).
  6. Changing the name of potential_t to something shorter.

Status of other attempts:

  1. If I cherry pick the PR that breaks the code (SHA 3a2e4e811), and add it to earlier releases, that modified code fails to build.
  2. If I remove the code that constitutes PR #1391 from the top of the repo, then the code successfully builds.
  3. Using the gdb utility, there seems to be a memory problem. The error is reproducible, but moves around when printf commands are introduced in the registry program.

weiwangncar commented 3 years ago

Note that Jacob recently said that the fix didn't help him.

JS-WRF-SBM commented 3 years ago

> Note that Jacob recently said that the fix didn't help him.

Hi @weiwangncar, thanks for pushing forward on this, but let me be more clear. There were 2 problems which apparently weren't related (but could be related in a broader sense):

Hope it's more clear now. Jacob