ufs-community / ufs-mrweather-app

UFS Medium-Range Weather Application

CHGRES error to process GFS data #6

Closed uturuncoglu closed 4 years ago

uturuncoglu commented 4 years ago

The CHGRES packaged with NCEPLIBS (github.com:NOAA-EMC/UFS_UTILS.git, @8b8db58, Nov 11 19:08:52 2019) triggers an error when it tries to process GFS data.

 - CALL FieldScatter FOR INPUT GRID TEMPERATURE.
 - FATAL ERROR: READING TEMPERATURE RECORD.
 - IOSTAT IS:          -31
MPT ERROR: Rank 0(g:0) received signal SIGSEGV(11).
        Process ID: 43767, Host: r12i0n32, Program: /glade/scratch/turuncu/ufs-mrweather-app-workflow/bld/chgres_cube.exe
        MPT Version: HPE MPT 2.19  02/23/19 05:30:09

and the error trace is

MPT:     rc=<error reading variable: Cannot access memory at address 0x0>,
MPT:     .tmp.STRING.len_V$7=14336928)
MPT:     at /glade/work/turuncu/UFS/NCEP_LIBS_ALL/UFS_UTILS/sorc/chgres_cube.fd/utils.f90:11
MPT: #8  0x00000000004590fa in input_data::read_input_atm_gaussian_file (localpet=0)
MPT:     at /glade/work/turuncu/UFS/NCEP_LIBS_ALL/UFS_UTILS/sorc/chgres_cube.fd/input_data.F90:1254
MPT: #9  0x0000000000435a5f in input_data::read_input_atm_data (localpet=0)
MPT:     at /glade/work/turuncu/UFS/NCEP_LIBS_ALL/UFS_UTILS/sorc/chgres_cube.fd/input_data.F90:147
MPT: #10 0x000000000041092a in atmosphere::atmosphere_driver (localpet=0)
MPT:     at /glade/work/turuncu/UFS/NCEP_LIBS_ALL/UFS_UTILS/sorc/chgres_cube.fd/atmosphere.F90:145
MPT: #11 0x0000000000435839 in chgres ()
MPT:     at /glade/work/turuncu/UFS/NCEP_LIBS_ALL/UFS_UTILS/sorc/chgres_cube.fd/chgres.F90:78

The same data and namelist can be processed with the external CHGRES installation used in the prototype system. The last commit in that version is from George Gayno on Fri Sep 6 10:29:16 2019, and its hash is @947145c.

I also tried installing NCEPLIBS with Intel 18.0.5 and the system-provided NetCDF (4.7.1), but it did not help.

uturuncoglu commented 4 years ago

The top-level hash for NCEPLIBS is e9131cc in this case.

arunchawla-NOAA commented 4 years ago

@climbfuji and @mark-a-potts this seems to be an issue with the wrong NCEPLIBS being ported. Can you confirm that chgres and NCEP Post still work when the NCEPLIBS super project is used to build on Cheyenne?

arunchawla-NOAA commented 4 years ago

@uturuncoglu can you provide the files that you are using on Cheyenne to @climbfuji and @mark-a-potts?

GeorgeGayno-NOAA commented 4 years ago

I was able to reproduce the error on Hera using chgres_cube from the spack-build branch at commit ea4bf9c. When I swap out the nemsio library associated with the branch with the 'official' Hera version, chgres_cube runs without error. So the nemsio library is the likely culprit.

uturuncoglu commented 4 years ago

This is my config file,

&config
  atm_files_input_grid = "gfs.t00z.atmanl.nemsio"
  convert_atm = .true.
  convert_nst = .true.
  convert_sfc = .true.
  cycle_day = 9
  cycle_hour = 0
  cycle_mon = 9
  data_dir_input_grid = "/glade/scratch/turuncu/fv3gfs/chgres/gfs.20190909"
  fix_dir_target_grid = "/glade/scratch/turuncu/fv3gfs/chgres/fix_sfc"
  input_type = "gaussian"
  mosaic_file_target_grid = "INPUT/C96_mosaic.nc"
  orog_dir_target_grid = "INPUT"
  orog_files_target_grid = "oro_data.tile1.nc", "oro_data.tile2.nc",
      "oro_data.tile3.nc", "oro_data.tile4.nc", "oro_data.tile5.nc",
      "oro_data.tile6.nc"
  sfc_files_input_grid = "gfs.t00z.sfcanl.nemsio"
  tracers = "sphum", "liq_wat", "o3mr", "ice_wat", "rainwat", "snowwat",
      "graupel"
  tracers_input = "spfh", "clwmr", "o3mr", "icmr", "rwmr", "snmr", "grle"
  vcoord_file_target_grid = "/glade/scratch/turuncu/fv3gfs/chgres/global_hyblev.l65.txt"
/

uturuncoglu commented 4 years ago

@GeorgeGayno-NOAA That was fast. It is good to know that you could reproduce the error. I also have a problem with NCEP Post, which could be related to this. What do you think? I'll create another issue for it, but maybe it is better to wait until this issue is fixed.

GeorgeGayno-NOAA commented 4 years ago

I don't know anything about NCEP post.

uturuncoglu commented 4 years ago

It is still failing when I try to process the data with chgres built with the NCEPLIBS that points to ufs_release_v1.0.

mark-a-potts commented 4 years ago

I updated NCEPLIBS-nemsio today to something that will hopefully fix this problem. Did you pull the latest version (with submodules) of ufs_release_v1.0 before you tested today?

arunchawla-NOAA commented 4 years ago

Ufuk

If the library has already been built then this is probably not building again. Can you purge all your nceplibs builds and ensure that everything gets built again?
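
As a rough illustration of what purging a build could look like, here is a minimal sketch. The function name `purge_build_dir` is hypothetical, and the default directory name `build-all` is only an assumption borrowed from the install commands posted in this thread. Because those commands install into `$PWD/install` inside the build directory, removing the build directory also removes the previously installed libraries.

```shell
# Sketch only: wipe a previous build tree so that nothing stale is reused.
# purge_build_dir is a hypothetical helper, not part of NCEPLIBS.
purge_build_dir() {
    # Default to "build-all" only when no argument is given at all.
    build_dir="${1-build-all}"

    # Refuse obviously dangerous targets.
    if [ -z "$build_dir" ] || [ "$build_dir" = "/" ]; then
        echo "refusing to remove '$build_dir'" >&2
        return 1
    fi

    # Remove the old tree (objects, CMake cache, install prefix) and recreate it empty.
    rm -rf -- "$build_dir"
    mkdir -p -- "$build_dir"
}

# Possible usage from the top of the NCEPLIBS clone:
#   purge_build_dir build-all && cd build-all
#   cmake -DMPITYPE=mpt -DCMAKE_INSTALL_PREFIX=$PWD/install ..
#   make -j 20
```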

uturuncoglu commented 4 years ago

@mark-a-potts yes, I updated the code. I am checking out the origin/full-stack branch, and the hash that I used is d2fab36.

@arunchawla-NOAA I did a fresh install in this case.

@GeorgeGayno-NOAA was able to reproduce the issue. @GeorgeGayno-NOAA did you test it with the latest version? Is it working in your case?

Hang-Lei-NOAA commented 4 years ago

I further matched the scripts and code of ufs_release_v1.0. It now fully matches the official version installed on WCOSS and Hera. I tested it with the CMake build. You may use https://github.com/NOAA-EMC/NCEPLIBS-nemsio/tree/ufs_release_v1.0 to solve the problem.

uturuncoglu commented 4 years ago

I am installing NCEPLIBS with the following commands:

# clone NCEPLIBS
$ git clone https://github.com/NOAA-EMC/NCEPLIBS.git NCEP_LIBS_ALL

# checkout full-stack branch and update
$ cd NCEP_LIBS_ALL 
$ git checkout origin/full-stack
$ git submodule init
$ git submodule sync
$ git submodule update --recursive
$ git submodule foreach git submodule init
$ git submodule foreach git submodule sync
$ git submodule foreach git submodule update

# load modules used in model
# on Cheyenne:
$ module purge
$ module load ncarenv/1.2 intel/19.0.2 mkl mpt/2.19 ncarcompilers/0.5.0
$ module load cmake
$ module use /glade/work/turuncu/PROGS/modulefiles/esmfpkgs/intel/19.0.2
$ module load esmf-8.0.0-ncdfio-mpt-O

# define required environment variables
# on Cheyenne:
$ export ESMF_LIB=/glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default/libesmf_fullylinked.so
$ export ESMF_INC=/glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/mod/modO/Linux.intel.64.mpt.default

# create installation directory
$ mkdir build-all
$ cd build-all

# build libraries
$ cmake -DMPITYPE=mpt -DCMAKE_INSTALL_PREFIX=$PWD/install ..
$ make -j 20

@mark-a-potts @GeorgeGayno-NOAA do you think that installing the libraries this way would solve the CHGRES issue? I'll wait for your response before trying again.

arunchawla-NOAA commented 4 years ago

@GeorgeGayno-NOAA told me that this works for him. George, can you confirm?

Hang-Lei-NOAA commented 4 years ago

You have to check out the ufs_release_v1.0 branch to get the update.

Or you have to change the nemsio library to the source at https://github.com/NOAA-EMC/NCEPLIBS-nemsio/tree/ufs_release_v1.0

In develop, it still points to the old one. I do not have the right to change this repo. I will ask Mark to give me write access to make this change for develop.
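
A minimal sketch of switching an existing clone to the release branch and refreshing its submodules might look like the following. The helper name `switch_branch_and_update` is hypothetical, and the sketch assumes the clone has a remote called `origin` that carries the branch. The final `git submodule status` call prints the commit each submodule is pinned to, which is one way to check that the nemsio submodule actually moved.

```shell
# Sketch only: move an existing clone to a given branch and bring every
# submodule in line with what that branch records.
# switch_branch_and_update is a hypothetical helper name.
switch_branch_and_update() {
    repo_dir="$1"
    branch="$2"

    # Get the latest refs, then switch to the requested branch.
    git -C "$repo_dir" fetch origin || return 1
    git -C "$repo_dir" checkout "$branch" || return 1

    # Re-read submodule URLs from .gitmodules, then check out the pinned commits.
    git -C "$repo_dir" submodule sync --recursive || return 1
    git -C "$repo_dir" submodule update --init --recursive || return 1

    # Show which commit each submodule now points to.
    git -C "$repo_dir" submodule status --recursive
}

# Possible usage:
#   switch_branch_and_update NCEP_LIBS_ALL ufs_release_v1.0
```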

uturuncoglu commented 4 years ago

@Hang-Lei-NOAA Thanks for the information. It is great to know that it works for @GeorgeGayno-NOAA. I'll try it and let you know.

GeorgeGayno-NOAA commented 4 years ago

I tried Mark's branch (b76f56a) on Hera using my canned case. It works.

uturuncoglu commented 4 years ago

I am getting the following error when I try to use the ufs_release_v1.0 branch:

-- The C compiler identification is Intel 19.0.0.20190117
-- The CXX compiler identification is Intel 19.0.0.20190117
-- Check for working C compiler: /glade/u/apps/ch/opt/ncarcompilers/0.5.0/intel/19.0.2/icc
-- Check for working C compiler: /glade/u/apps/ch/opt/ncarcompilers/0.5.0/intel/19.0.2/icc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /glade/u/apps/ch/opt/ncarcompilers/0.5.0/intel/19.0.2/icpc
-- Check for working CXX compiler: /glade/u/apps/ch/opt/ncarcompilers/0.5.0/intel/19.0.2/icpc -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- The Fortran compiler identification is Intel 19.0.0.20190117
-- Check for working Fortran compiler: /glade/u/apps/ch/opt/ncarcompilers/0.5.0/intel/19.0.2/ifort
-- Check for working Fortran compiler: /glade/u/apps/ch/opt/ncarcompilers/0.5.0/intel/19.0.2/ifort  -- works
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Checking whether /glade/u/apps/ch/opt/ncarcompilers/0.5.0/intel/19.0.2/ifort supports Fortran 90
-- Checking whether /glade/u/apps/ch/opt/ncarcompilers/0.5.0/intel/19.0.2/ifort supports Fortran 90 -- yes
-- Found OpenMP_C: -qopenmp (found version "5.0") 
-- Found OpenMP_CXX: -qopenmp (found version "5.0") 
-- Found OpenMP_Fortran: -qopenmp (found version "5.0") 
-- Found OpenMP: TRUE (found version "5.0")  
HEY esmf inc is /glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/mod/modO/Linux.intel.64.mpt.default
-- Could NOT find HDF5 (missing: HDF5_LIBRARIES HDF5_INCLUDE_DIRS HDF5_HL_LIBRARIES C HL) (found version "")
HEY netcdf_dir is 
CMake Error at cmake/Modules/FindNetCDF.cmake:91 (message):

           Cannot find NETCDF!!!!

Call Stack (most recent call first):
  CMakeLists.txt:115 (find_package)

-- Configuring incomplete, errors occurred!
See also "/glade/work/turuncu/UFS/NCEP_LIBS_ALL.dec18/build-all/CMakeFiles/CMakeOutput.log".

I could provide external HDF5 and NetCDF modules, but this was working before, so you might want to look at the issue.
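
For anyone hitting the same message, a quick pre-flight check before running cmake could look like the sketch below. It is only a sketch: the helper name `find_netcdf_prefix` is hypothetical, and whether FindNetCDF.cmake in NCEPLIBS actually consults the `NETCDF` environment variable is an assumption that is not confirmed in this thread. `nc-config --prefix` is the standard NetCDF utility call that prints the install prefix.

```shell
# Sketch only: report where a NetCDF installation appears to live.
# find_netcdf_prefix is a hypothetical helper; the use of the NETCDF
# environment variable is an assumption about the build setup.
find_netcdf_prefix() {
    if [ -n "${NETCDF:-}" ]; then
        # An explicitly exported location wins.
        printf '%s\n' "$NETCDF"
    elif command -v nc-config >/dev/null 2>&1; then
        # Otherwise ask the NetCDF utility that ships with the library.
        nc-config --prefix
    else
        echo "no NetCDF found: NETCDF is unset and nc-config is not on PATH" >&2
        return 1
    fi
}

# Possible usage before configuring:
#   find_netcdf_prefix || echo "load or provide a NetCDF module first"
```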

mark-a-potts commented 4 years ago

Hi Ufuk,

What modules did you have loaded when you ran the cmake command, and what was that command?

-M

uturuncoglu commented 4 years ago

I used the following commands to load the modules,

module purge
module load ncarenv/1.2 intel/19.0.2 mkl mpt/2.19 ncarcompilers/0.5.0
module load cmake
module use /glade/work/turuncu/PROGS/modulefiles/esmfpkgs/intel/19.0.2
module load esmf-8.0.0-ncdfio-mpt-O

set these variables to use the external ESMF installation,

export ESMF_LIB=/glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default/libesmf_fullylinked.so
export ESMF_INC=/glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/mod/modO/Linux.intel.64.mpt.default

and used the following cmake command to install,

mkdir build-all
cd build-all
cmake -DMPITYPE=mpt -DCMAKE_INSTALL_PREFIX=$PWD/install ..

mark-a-potts commented 4 years ago

I think this was happening because the cmake submodule was not pointing to the latest version. I have updated it in the ufs_release_v1.0 branch of NCEPLIBS, which is the branch we will be using for the release, and it builds for me with the modules and environment variables you were using.

-Mark

uturuncoglu commented 4 years ago

@mark-a-potts I was able to install without any problem. I'll test chgres with this version and get back to you soon.

uturuncoglu commented 4 years ago

@mark-a-potts @GeorgeGayno-NOAA If I use https://ftp.emc.ncep.noaa.gov/EIB/UFS/global/fix/fix_fv3_gmted2010.v20191213/C96/C96_mosaic.nc as an input to CHGRES, I get the following error:

 - FATAL ERROR: CANT DETERMINE CRES FROM MOSAIC FILE.
 - IOSTAT IS:            1

Here is my namelist,

&config
  atm_files_input_grid = "gfs.t00z.atmanl.nemsio"
  convert_atm = .true.
  convert_nst = .true.
  convert_sfc = .true.
  cycle_day = 9
  cycle_hour = 0
  cycle_mon = 9
  data_dir_input_grid = "/glade/scratch/turuncu/fv3gfs/chgres/gfs.20190909"
  fix_dir_target_grid = "/glade/scratch/turuncu/fv3gfs/chgres/fix_sfc"
  input_type = "gaussian"
  mosaic_file_target_grid = "INPUT/grid_spec.nc"
  orog_dir_target_grid = "INPUT"
  orog_files_target_grid = "oro_data.tile1.nc", "oro_data.tile2.nc",
      "oro_data.tile3.nc", "oro_data.tile4.nc", "oro_data.tile5.nc",
      "oro_data.tile6.nc"
  sfc_files_input_grid = "gfs.t00z.sfcanl.nemsio"
  tracers = "sphum", "liq_wat", "o3mr", "ice_wat", "rainwat", "snowwat",
      "graupel"
  tracers_input = "spfh", "clwmr", "o3mr", "icmr", "rwmr", "snmr", "grle"
  vcoord_file_target_grid = "/glade/scratch/turuncu/fv3gfs/chgres/global_hyblev.l65.txt"
/

uturuncoglu commented 4 years ago

In this case I am linking C96_mosaic.nc as grid_spec.nc in the INPUT directory.

GeorgeGayno-NOAA commented 4 years ago

The code assumes the name of the mosaic file is: CXX_mosaic.nc
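
To catch a mismatched name before launching a run, a small check along these lines may help. This is only a sketch and not the logic inside chgres_cube: the helper name `cres_from_mosaic` is hypothetical, and it assumes that "CXX" means the letter C followed by the numeric resolution, as in C96_mosaic.nc.

```shell
# Sketch only: verify that a mosaic file name looks like C<res>_mosaic.nc
# and print the numeric resolution. cres_from_mosaic is a hypothetical
# helper, not a reimplementation of the chgres_cube parsing.
cres_from_mosaic() {
    # Strip any leading directories, e.g. INPUT/C96_mosaic.nc -> C96_mosaic.nc
    mosaic_name="${1##*/}"

    # Peel off the expected prefix and suffix to isolate the resolution.
    mosaic_res="${mosaic_name#C}"
    mosaic_res="${mosaic_res%_mosaic.nc}"

    # Reject names that do not have the expected shape at all.
    case "$mosaic_name" in
        C*_mosaic.nc) ;;
        *) mosaic_res="" ;;
    esac

    # Whatever is left must be digits only.
    case "$mosaic_res" in
        ''|*[!0-9]*)
            echo "'$mosaic_name' does not match C<res>_mosaic.nc" >&2
            return 1
            ;;
    esac

    printf '%s\n' "$mosaic_res"
}

# Possible usage:
#   cres_from_mosaic INPUT/C96_mosaic.nc    # accepted
#   cres_from_mosaic INPUT/grid_spec.nc     # rejected with a message on stderr
```

With a check like this, linking the downloaded file into INPUT under its original name (C96_mosaic.nc) and pointing mosaic_file_target_grid at that name would pass, while the grid_spec.nc alias would be flagged.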

uturuncoglu commented 4 years ago

@GeorgeGayno-NOAA Thanks. Okay, I'll try it that way, but the mosaic file is defined in the mosaic_file_target_grid namelist option, and I think there is no need to have the same file in the INPUT directory under two different names (C96_mosaic.nc and grid_spec.nc).

uturuncoglu commented 4 years ago

@GeorgeGayno-NOAA @arunchawla-NOAA chgres is working without any problem, but we need to find a way to maintain its required input files on FTP.

https://github.com/ufs-community/ufs-mrweather-app/issues/15#issuecomment-568113841