wrf-model / WPS

The official repository for the WRF Preprocessing System (WPS)

Geogrid and metgrid fail with 0 exit status #252

Open Peter9192 opened 1 month ago

Peter9192 commented 1 month ago

We're trying to run WPS and WRF in an automated setting. We want the workflow to fail as soon as one of the steps fails. For this, we rely on the exit status of each of the programs.
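
A minimal sketch of such a workflow, assuming Python's standard subprocess module drives the steps. The step list WPS_STEPS and the function run_steps are hypothetical illustrations, not part of WPS. The approach only works if each program reports failure through a non-zero exit status, which is the behaviour this issue is about.

```python
import subprocess
from typing import Sequence

# Hypothetical step list; the paths depend on your own WPS build and run directory.
WPS_STEPS = [
    ["./geogrid.exe"],
    ["./ungrib.exe"],
    ["./metgrid.exe"],
]


def run_steps(steps: Sequence[Sequence[str]]) -> None:
    """Run each command in order and stop at the first non-zero exit status."""
    for cmd in steps:
        # check=True raises subprocess.CalledProcessError when the exit status
        # is non-zero, so no later step is started after a failure.
        # A program that prints an ERROR message but exits 0 is NOT caught here.
        subprocess.run(cmd, check=True)
```

Calling run_steps(WPS_STEPS) stops at a failing step only if that step exits with a non-zero status; a step that prints an error and exits 0 lets the workflow continue.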

We've noticed that under certain conditions, geogrid and metgrid fail with a 0 exit status. This happened when we ran geogrid with the wrong table (GEOGRID.TBL), and again when metgrid didn't find the ./geo_em.d01.nc file.

For example:

+ /home/pkalverla1/wrf-model/WPS/geogrid.exe
Parsed 50 entries in GEOGRID.TBL
Processing domain 1 of 1
ERROR: Could not open /projects/0/prjs0914/wrf-data/default/static/WPS_GEOG/landfire_data/index

Despite this error message, the exit status was 0. Similarly for metgrid later on:

+ /home/pkalverla1/wrf-model/WPS/metgrid.exe
Processing domain 1 of 1
ERROR: Couldn't open file ./geo_em.d01.nc for input.
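
Until geogrid and metgrid report such failures through their exit status, a workflow could fall back on scanning their output as well. A minimal sketch, assuming the ERROR: lines appear on the program's standard output or standard error as in the logs above; the function name run_wps_step is hypothetical.

```python
import subprocess
from typing import Sequence


def run_wps_step(cmd: Sequence[str]) -> str:
    """Run one WPS program, treating a non-zero exit OR an 'ERROR:' line as failure.

    Workaround sketch: geogrid/metgrid may print ERROR and still exit 0,
    so the exit status alone is not a reliable signal.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams; which stream WPS uses is an assumption
        text=True,
    )
    # Echo the program's output so the log is not lost.
    print(result.stdout, end="")
    errors = [
        line for line in result.stdout.splitlines()
        if line.lstrip().startswith("ERROR:")
    ]
    if result.returncode != 0 or errors:
        detail = "; ".join(errors) if errors else "no ERROR line found"
        raise RuntimeError(
            f"{cmd[0]} failed (exit status {result.returncode}): {detail}"
        )
    return result.stdout
```

Scanning for a text marker is fragile (the message format could change), which is why a non-zero exit status from the programs themselves would be the more robust fix.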

It seems the error originates here: https://github.com/wrf-model/WPS/blob/5a2ae63988e632405a4504cfb143ce7f0230a7a0/geogrid/src/source_data_module.F#L866-L878

The ERROR level is handled in mprintf: https://github.com/wrf-model/WPS/blob/5a2ae63988e632405a4504cfb143ce7f0230a7a0/geogrid/src/module_debug.F#L316-L324

I'm not very experienced in writing Fortran code, but I wonder if this could be solved by adding an integer status code to the stop command, or by using error stop instead, as suggested here. If so, I'm happy to open a PR.

weiwangncar commented 1 month ago

@Peter9192 First of all, you're not using the standard dataset we have in the release. In the standard release, we do not have landfire_data. Second, if this dataset is optional for your run, you should set 'optional = yes' in geogrid/GEOGRID.TBL under the landfire section (see other sections as examples), or remove it from the table. For future reports like this, please post them in the support forum at https://forum.mmm.ucar.edu/.

Peter9192 commented 1 month ago

Dear @weiwangncar. Thanks for the reply.

I know this is not the standard dataset, and I know I can fix the issue by using the standard. This is not a troubleshooting request. The example is simply to illustrate a failing use case.

My point is about the behaviour of geogrid and metgrid. They print an error but return a zero exit status. I think they should return a non-zero exit status, since they did not complete successfully. Do you agree?

weiwangncar commented 1 month ago

@Peter9192 I'll let others comment.

Peter9192 commented 1 month ago

Thanks; could you re-open the issue and/or notify others in that case?