ufs-community / ufs-mrweather-app

UFS Medium-Range Weather Application

Port to CU summit issues #93

Closed pjpegion closed 4 years ago

pjpegion commented 4 years ago

I am trying to follow the CIME porting directions to build and run on the University of Colorado's Summit platform. Following the directions, I have built NCEPLIBS-external and NCEPLIBS.
I ran into issues configuring CIME, and the model also crashes during initialization.

1- I created a config_machines.xml, but it is not clear what to put in NODENAME_REGEX; I had to hard-code it to the node I am logged into. CIME doesn't seem to handle regular expressions here (I also had this issue on the linux cluster), and another user has a similar problem dealing with CEMS (this is how I learned that I cannot use regular expressions).

2- The model builds and jobs get submitted via CIME. chgres_cube runs and generates the initial conditions, but the model fails during initialization. I'm using the GNU 8.2 compiler, and here are my linked libraries. I have also attached the log file, which has the runtime error.

[pegion@shas0136 run]$ ldd ../bld/ufs.exe
    linux-vdso.so.1 => (0x00007ffdf2bef000)
    libesmf.so => /projects/pegion/NCEPLIBS-external/build-all/install/lib64/libesmf.so (0x00002b7b6c594000)
    librt.so.1 => /lib64/librt.so.1 (0x00002b7b6dfe2000)
    libstdc++.so.6 => /curc/sw/gcc/8.2.0/lib64/libstdc++.so.6 (0x00002b7b6e1ea000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00002b7b6e56e000)
    libnetcdff.so.7 => /projects/pegion/NCEPLIBS-external/build-all/install/lib64/libnetcdff.so.7 (0x00002b7b6e772000)
    libnetcdf.so.15 => /projects/pegion/NCEPLIBS-external/build-all/install/lib64/libnetcdf.so.15 (0x00002b7b6e9f9000)
    libmpifort.so.12 => /projects/pegion/NCEPLIBS-external/build-all/install/lib/libmpifort.so.12 (0x00002b7b6ecee000)
    libmpi.so.12 => /projects/pegion/NCEPLIBS-external/build-all/install/lib/libmpi.so.12 (0x00002b7b6ef26000)
    libgfortran.so.5 => /curc/sw/gcc/8.2.0/lib64/libgfortran.so.5 (0x00002b7b6f474000)
    libm.so.6 => /lib64/libm.so.6 (0x00002b7b6f8e1000)
    libgcc_s.so.1 => /curc/sw/gcc/8.2.0/lib64/libgcc_s.so.1 (0x00002b7b6fbe3000)
    libquadmath.so.0 => /curc/sw/gcc/8.2.0/lib64/libquadmath.so.0 (0x00002b7b6fdfb000)
    libc.so.6 => /lib64/libc.so.6 (0x00002b7b7003b000)
    libmpicxx.so.12 => /projects/pegion/NCEPLIBS-external/build-all/install/lib/libmpicxx.so.12 (0x00002b7b70408000)
    libgomp.so.1 => /curc/sw/gcc/8.2.0/lib64/libgomp.so.1 (0x00002b7b70629000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b7b70857000)
    /lib64/ld-linux-x86-64.so.2 (0x00002b7b6c370000)
    libhdf5_hl.so.100 => /projects/pegion/NCEPLIBS-external/build-all/install/lib/libhdf5_hl.so.100 (0x00002b7b70a73000)
    libhdf5.so.103 => /projects/pegion/NCEPLIBS-external/build-all/install/lib/libhdf5.so.103 (0x00002b7b70c96000)
    libxml2.so.2 => /lib64/libxml2.so.2 (0x00002b7b7123c000)
    libz.so.1 => /lib64/libz.so.1 (0x00002b7b715a6000)
    liblzma.so.5 => /lib64/liblzma.so.5 (0x00002b7b717bc000)

ufs.log.txt

arunchawla-NOAA commented 4 years ago

Phil, what date are you trying to run? Is this the out-of-the-box Dorian case? Will you be able to stage the initial conditions created by CHGRES on Theia so @GeorgeGayno-NOAA can take a look?

@GeorgeGayno-NOAA, according to the ufs.log.txt file that Phil posted, ice_wat and graupel are getting initialized with really large values.

GeorgeGayno-NOAA commented 4 years ago

Are these large values in the coldstart files from chgres? Looking at ufs.log.txt, I see large values at the start of processing. But farther down the log file, I see these fields appear to be reset to zero.

jedwards4b commented 4 years ago

Regular expressions do work in the regex field. You can find a number of examples in config/cesm/machines.

pjpegion commented 4 years ago

@arunchawla-NOAA and @GeorgeGayno-NOAA I believe I'm running the Dorian case. I put the initial conditions on Hera in /scratch2/BMC/gsienkf/Philip.Pegion/ufs-data/summit. I also put the ICs I generated a couple of weeks ago on Cheyenne in /scratch2/BMC/gsienkf/Philip.Pegion/ufs-data/cheyenne

I took a look at the gfsdata files, and they look fine to me. I haven't looked at the sfc* files yet.

pjpegion commented 4 years ago

@jedwards4b what is the trick? I am saying that when I put in regular expressions, I get an error. I also cannot follow the logic for working out NODENAME_REGEX. For example, cheyenne is listed as ".*.?cheyenne\d?.ucar.edu", but when I run hostname I only get cheyenne, and hostname -f gives me cheyenne4.ib0.cheyenne.ucar.edu. Conversely, on my linux cluster, hostname gives apollo1, but I could only get CIME to work if I used apollo in NODENAME_REGEX.

jedwards4b commented 4 years ago

hostname -f is close to what is used. The exact call is Python's socket.getfqdn().

On cheyenne the login nodes are named login\d.cheyenne\d.ucar.edu, but the hostname command gives the ib network name, which is different. Name matching is imperfect, so we also have a --machine option.
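
For anyone debugging a NODENAME_REGEX entry, here is a minimal sketch of the check described above. It assumes the pattern is tested with re.match semantics (anchored at the start of the name, not required to consume all of it), which is an assumption consistent with the apollo/apollo1 observation in this thread rather than something confirmed from the CIME source. The helper names matches_nodename and probe_this_host are hypothetical and are not part of CIME.

```python
import re
import socket


def matches_nodename(regex: str, name: str) -> bool:
    """Return True if `regex` matches at the start of `name`.

    Assumption: matching behaves like re.match (prefix match),
    not re.fullmatch.
    """
    return re.match(regex, name) is not None


def probe_this_host(regex: str) -> bool:
    """Test a candidate regex against this node's fully qualified name."""
    return matches_nodename(regex, socket.getfqdn())


# Pattern and host names quoted in this thread.
CHEYENNE_REGEX = r".*.?cheyenne\d?.ucar.edu"

if __name__ == "__main__":
    # The name to match is what socket.getfqdn() returns on the node.
    print("fqdn on this node:", socket.getfqdn())
    for name in ("cheyenne", "cheyenne4.ib0.cheyenne.ucar.edu"):
        print(name, "->", matches_nodename(CHEYENNE_REGEX, name))
    print("apollo vs apollo1 ->", matches_nodename("apollo", "apollo1"))
```

Running this on a login node prints the name that has to be matched, so a candidate pattern can be tried with probe_this_host before it goes into config_machines.xml. If no pattern can be made to match, the --machine option mentioned above bypasses name matching.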

pjpegion commented 4 years ago

@jedwards4b thanks for the clarification. @arunchawla-NOAA and @GeorgeGayno-NOAA I have the model running now. It was because I was still trying to run nsst with the grib2 input. This should be fixed when I get the latest version of CIME.