@GeorgeGayno-NOAA Have there been any changes in chgres_cube from the v1.0 release to the v1.1 release that would lead it to require more memory when processing GRIB2 data? And do you know if/how the memory requirements for processing GRIB2, NEMSIO, and netCDF formats differ?
The GRIB2 option should not require more memory than before. I can run a GRIB2 case on only one of our WCOSS nodes. The netCDF option will require the most memory of the three. You may be running with more MPI tasks (144 and 216) than you need. That can sometimes lead to an error. Try running with 24 MPI tasks.
@GeorgeGayno-NOAA Let me check by reducing the number of processors for chgres. I'll update you soon.
@GeorgeGayno-NOAA: @uturuncoglu has let us know that he is having trouble finding the right number of processors to run chgres with. Do you have a recommendation for the number of processors that should be used for the various App-supported resolutions C96, C192, C384, and C768? And should the number of processors vary depending on the format of the data that chgres is reading in?
I don't have access to Cheyenne, but I can run some tests on Hera, Jet, or Orion. Is Cheyenne similar to those machines?
Cheyenne has 2x18-core 2.3-GHz Intel Xeon E5-2697V4 (Broadwell) processors and 64 GB of DDR4-2400 memory per node. I believe Hera has more, but the NOAA RDHPC docs aren't working for me at the moment. Here is the Cheyenne documentation: https://www2.cisl.ucar.edu/resources/computational-systems/cheyenne
I ran a test on xJet, which has 64 GB memory per node. Input was 0.5-degree grib2 data. It was mapped to a C768 global uniform grid with 64 atmos. levels. It ran on two nodes with six tasks per node. I don't know how close I was to the memory limit. To be safe (and to reduce wall clock time a bit), you can try three nodes, six tasks per node. The total number of tasks must be a multiple of six per ESMF requirements. You should not have to run chgres with hundreds of tasks.
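To make that layout concrete on Cheyenne, here is a minimal PBS job sketch requesting the suggested 3 nodes with 6 tasks per node (18 total tasks, a multiple of six); the job name, account, queue, wall time, and executable path are placeholders, not values from this thread:

```bash
#!/bin/bash
#PBS -N chgres_cube
#PBS -A <project_account>
#PBS -q regular
#PBS -l walltime=00:30:00
# 3 nodes, 6 MPI ranks per node (Cheyenne nodes have 36 cores and 64 GB each)
#PBS -l select=3:ncpus=36:mpiprocs=6

cd $PBS_O_WORKDIR

# 18 total tasks -- the total must be a multiple of six per the ESMF requirement
mpiexec_mpt ./chgres_cube.exe
```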
@GeorgeGayno-NOAA Thanks. That is really helpful. BTW, how many cores are in each node?
24 cores per node.
@GeorgeGayno-NOAA I could process grib2 (0.5 deg.) C768 using 4 nodes with 6 tasks per node. I'll try the same configuration with nemsio and netCDF input.
@GeorgeGayno-NOAA It seems that the same configuration fails with nemsio. I'll try to increase the number of nodes.
@GeorgeGayno-NOAA I tried 6 nodes and 8 nodes with 6 tasks per node, but it is still failing. If you don't mind, could you test on your side?
I used 6 nodes, 24 processes per node for C768 on Hera. It worked. @uturuncoglu
@panll Thanks. I am not sure why it is not working in my case. I have 64 GB of memory on each node, which is the same as Hera. I'll try to increase the number of cores per node. BTW, I could process netCDF with the 8 nodes, 6 tasks per node combination.
@panll The same configuration (6 nodes, 24 processes per node for C768) fails on Cheyenne.
@panll @GeorgeGayno-NOAA 8 nodes, 24 processes per node for C768 also fails. Any suggestions?
I believe Hera has 96 GB.
I will try a nemsio test today.
I tried a C768 uniform grid, 64 atmos. levels, using nemsio data as input. On Jet (with 64 GB per node), I was able to get it to run using 6 nodes, 6 tasks per node.
@uturuncoglu Do you think it would be good to try 6 nodes, 6 tasks per node for C768 on Cheyenne?
@ligiabernardet I was off in the morning; I am checking now.
@ligiabernardet @GeorgeGayno-NOAA I am still getting an error with those configurations. CHGRES fails with the error:
rc=<error reading variable: Cannot access memory at address 0x0>
I also increased the number of nodes from 6 to 8, but it is still failing. Any suggestions?
@uturuncoglu - @arunchawla-NOAA was asking if we ran chgres outside of CIME on Cheyenne. Is this something you have tried?
@rsdunlapiv No, I did not try it, but I could check. BTW, I could process GRIB2 without any problem with the 6x6 combination under CIME.
@rsdunlapiv I ran chgres outside of CIME with the 6x6 combination that works on Jet, but it still fails on Cheyenne.
Do you know the exact line where it is failing?
Here is the full trace that I have
MPT: 0x00002afaf83066da in waitpid () from /glade/u/apps/ch/os/lib64/libpthread.so.0
MPT: Missing separate debuginfos, use: zypper install glibc-debuginfo-2.22-49.16.x86_64
MPT: (gdb) #0 0x00002afaf83066da in waitpid ()
MPT: from /glade/u/apps/ch/os/lib64/libpthread.so.0
MPT: #1 0x00002afaf8a45db6 in mpi_sgi_system (
MPT: #2 MPI_SGI_stacktraceback (
MPT: header=header@entry=0x7ffe589d9180 "MPT ERROR: Rank 3(g:3) received signal SIGSEGV(11).\n\tProcess ID: 6043, Host: r9i2n30, Program: /glade/p/ral/jntp/GMTB/tools/NCEPLIBS-ufs-v1.1.0/intel-19.0.5/mpt-2.19/bin/chgres_cube.exe\n\tMPT Version: "...) at sig.c:340
MPT: #3 0x00002afaf8a45fb2 in first_arriver_handler (signo=signo@entry=11,
MPT: stack_trace_sem=stack_trace_sem@entry=0x2afafe620080) at sig.c:489
MPT: #4 0x00002afaf8a4634b in slave_sig_handler (signo=11,
MPT: siginfo=<optimized out>, extra=<optimized out>) at sig.c:564
MPT: #5 <signal handler called>
MPT: #6 0x00002afaf898e9bb in pmpi_abort__ ()
MPT: from /glade/u/apps/ch/opt/mpt/2.19/lib/libmpi.so
MPT: #7 0x00000000005337cd in error_handler (string=...,
MPT: rc=<error reading variable: Cannot access memory at address 0x0>,
MPT: .tmp.STRING.len_V$7=55327936)
MPT: at /glade/p/ral/jntp/GMTB/tools/NCEPLIBS-ufs-v1.1.0/intel-19.0.5/mpt-2.19/src/NCEPLIBS/UFS_UTILS/sorc/chgres_cube.fd/utils.f90:11
MPT: #8 0x00000000004fa084 in program_setup::read_setup_namelist ()
MPT: at /glade/p/ral/jntp/GMTB/tools/NCEPLIBS-ufs-v1.1.0/intel-19.0.5/mpt-2.19/src/NCEPLIBS/UFS_UTILS/sorc/chgres_cube.fd/program_setup.f90:287
MPT: #9 0x0000000000469b49 in chgres ()
MPT: at /glade/p/ral/jntp/GMTB/tools/NCEPLIBS-ufs-v1.1.0/intel-19.0.5/mpt-2.19/src/NCEPLIBS/UFS_UTILS/sorc/chgres_cube.fd/chgres.F90:63
MPT: #10 0x0000000000456aa2 in main ()
MPT: #11 0x00002afaf933c6e5 in __libc_start_main ()
MPT: from /glade/u/apps/ch/os/lib64/libc.so.6
MPT: #12 0x00000000004569a9 in _start () at ../sysdeps/x86_64/start.S:118
MPT: (gdb) A debugging session is active.
MPT:
MPT: Inferior 1 [process 6043] will be detached.
MPT:
MPT: Quit anyway? (y or n) [answered Y; input not from terminal]
MPT: Detaching from program: /proc/6043/exe, process 6043
@GeorgeGayno-NOAA I also submitted a job to the big-memory nodes, which have 109 GB of memory on each node. I'll run with the 6x6 combination and let you know.
@GeorgeGayno-NOAA It is still failing in the same way. It seems that it is not due to a memory limitation.
@GeorgeGayno-NOAA @ligiabernardet Has there been any change to the input data type on the chgres side for nemsio?
ESMF outputs "PET" files. What do they say? Your trace says the failure is in program_setup. That is very early in the processing.
As far as I know, we were using gaussian before for nemsio, but if I look at chgres there are two options: gaussian_nemsio and gfs_gaussian_nemsio.
It says: FATAL ERROR: UNRECOGNIZED INPUT DATA TYPE. So I am setting the input data type as gaussian. I think it was changed in CHGRES when netCDF support was introduced. Am I right?
NEMSIO -> gaussian
netCDF -> gaussian_netcdf
@ligiabernardet I think that it was changed, and it is now gaussian_nemsio. This is very critical information that I didn't know. I have just submitted the job with it set to gaussian_nemsio, and it is working.
OK, I did not realize that either.
@ligiabernardet @arunchawla-NOAA @GeorgeGayno-NOAA @panll I think that when we implement changes like this that are critical for the CIME interface, we need to share that information or at least discuss it. A lack of information exchange like this causes a lot of extra effort for everyone.
@ligiabernardet We also need to update the documentation about it. I'll make the necessary changes on the CIME side and try to run again.
I have a successful run for C768 on Cheyenne with 12 nodes and 3 CPUs each. Here is the directory: /glade/scratch/lpan/09012020/ufs-mrweather-app-workflow.c768/run @uturuncoglu @ligiabernardet
Hallelujah! OK, we will update the documentation wrt input_type gaussian_nemsio.
@panll Thanks. That is great. I could also run with 6 nodes and 6 CPUs each.
Great! @uturuncoglu
@uturuncoglu What are we still missing? Do you have a processor/node number that works for C384 on Cheyenne?
@ligiabernardet I ran the full test suite, and one of the highest-resolution cases with debug failed after producing a couple of hours of output. It seems it is not related to CHGRES, and I am looking into the logs. C384 is working without any problem, at least for grib2. I'll run more tests with other data types for both C384 and C768 and then update the app for you to test. In the current configuration, CHGRES is set up as follows:
C96 - 2 nodes with 6 cores per node
C192 - 2 nodes with 6 cores per node
C384 - 4 nodes with 6 cores per node
C768 - 6 nodes with 6 cores per node
but to be on the safe side, we could increase the number of nodes for C384 and C768. I am also planning to test it on Stampede, but the NCEP libs need to be installed there first. @climbfuji Did you install NCEPLIBS there? Do you have access to Stampede?
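For illustration only, a hedged shell sketch (not the actual CIME code) of how a run script might map the App resolution to the chgres node/task layout listed above; the script and variable names are hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: choose a chgres_cube layout from the target resolution.
# The node counts mirror the current App defaults quoted in the comment above.
res="$1"   # e.g. C96, C192, C384, C768
case "$res" in
  C96|C192) nodes=2 ;;
  C384)     nodes=4 ;;
  C768)     nodes=6 ;;
  *) echo "unknown resolution: $res" >&2; exit 1 ;;
esac
tasks_per_node=6   # keeps the total task count a multiple of six, per the ESMF requirement
echo "chgres_cube layout for $res: $nodes nodes x $tasks_per_node tasks per node"
```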
It says:
FATAL ERROR: UNRECOGNIZED INPUT DATA TYPE.
So I am setting the input data type as gaussian. I think it was changed in CHGRES when netCDF support was introduced. Am I right?
Yes, when chgres was updated for GFS v16 netCDF files, the input_type names were changed. You should use "gaussian_nemsio" for GFS v15 nemsio files and "gaussian_netcdf" for GFS v16 netcdf files.
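For anyone hitting the same UNRECOGNIZED INPUT DATA TYPE failure, a minimal sketch of the relevant namelist setting; the fort.41 file name and the &config group name follow typical chgres_cube run scripts and should be checked against your own setup:

```bash
# Write the chgres_cube config namelist (typically read as ./fort.41).
# GFS v15 nemsio input  -> input_type = "gaussian_nemsio" (formerly "gaussian")
# GFS v16 netCDF input  -> input_type = "gaussian_netcdf"
cat > fort.41 << 'EOF'
&config
  input_type = "gaussian_nemsio"
/
EOF
```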
@GeorgeGayno-NOAA I am testing GNU on Cheyenne, and for C768 CHGRES is failing with the following error:
- CALL FieldCreate FOR INPUT GRID LONGITUDE.
- CALL FieldScatter FOR INPUT GRID LONGITUDE.
- CALL FieldScatter FOR INPUT GRID LONGITUDE.
- CALL FieldScatter FOR INPUT GRID LATITUDE.
- CALL FieldScatter FOR INPUT GRID LATITUDE.
- CALL FieldScatter FOR INPUT GRID LATITUDE.
- CALL FieldScatter FOR INPUT GRID LATITUDE.
#0 0x2adebec4faff in ???
#1 0x2adebf2d79bb in ???
#0 0x2adebec4faff in ???
#1 0x2adebf2d79bb in ???
#0 0x2b2c37867aff in ???
#1 0x2b2c37eef9bb in ???
#0 0x2b2c37867aff in ???
#1 0x2b2c37eef9bb in ???
#0 0x2b2c37867aff in ???
#1 0x2b2c37eef9bb in ???
#0 0x2b2c37867aff in ???
#1 0x2b2c37eef9bb in ???
#0 0x2b2c37867aff in ???
#1 0x2b2c37eef9bb in ???
MPT ERROR: MPI_COMM_WORLD rank 19 has terminated without calling MPI_Finalize()
aborting job
MPT: Received signal 11
I have no trace at this point because I am using @climbfuji's installation. This resolution runs fine with the Intel compiler. If you test it with the GNU compiler, could you let me know the correct combination?
@GeorgeGayno-NOAA Here is an additional log from C384. BTW, not all of the C384 tests failed.
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
solved
@ligiabernardet @climbfuji
I am getting an error on Cheyenne when I try to process the GRIB2 (0.5 deg.) data.
My CHGRES namelist file:
It seems it is a memory issue, and we have failures for all C384 (default 144 cores for CHGRES) and C768 (default 216 cores for CHGRES) cases. Do you have any idea why? It seems that the CHGRES memory requirement is higher than in the previous release, but I am not sure. I could simply update the interface and increase the resources used for those resolutions, but I am not sure whether that is a good idea, and if you remember, we had a similar issue with netCDF input. Let me know what you think.