Closed vivek-bala closed 8 years ago
In the first CU, the particular module is not found and hence the error is written to STDERR. But execution goes forward with the default python loaded on login (which seems to be sufficient).
In the gromacs CUs, the output from gromacs is written to STDERR whether or not the execution is successful. So without analyzing STDERR on the client side (some text wrangling would be required, possibly to pick up any error code generated or to search for "error"), I am not sure if it's possible to distinguish between the two. Any ideas?
Previously, I tried to stop execution when there was any content in STDERR, but since gromacs (and possibly other kernels) writes its output to STDERR, it wouldn't be correct to stop execution just because there is "some" content in STDERR.
OK, I think the most robust way to handle error cases, rather than checking STDERR (either that it is non-empty, or doing application-specific grepping), is to test the return code from the application. If it is non-zero then the CU should fail.
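To illustrate the point, here is a minimal sketch of judging success by exit code alone. The function name `succeeded` and the two stand-in commands are assumptions made up for this example (they are not part of the existing scripts); the stand-ins use the Python interpreter so the snippet runs without gromacs installed.

```python
import subprocess
import sys


def succeeded(cmd):
    """Run cmd and judge success by its exit code only.

    STDERR is captured but deliberately ignored, since some tools
    (gromacs among them, per this thread) log normal output there.
    """
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.returncode == 0


# Hypothetical stand-in: writes to STDERR but exits with code 0.
noisy_ok = [sys.executable, "-c", "import sys; sys.stderr.write('log output\\n')"]

# Hypothetical stand-in: writes nothing but exits with code 3.
silent_fail = [sys.executable, "-c", "import sys; sys.exit(3)"]
```

With this check, `noisy_ok` counts as a success even though STDERR is non-empty, and `silent_fail` counts as a failure even though STDERR is empty.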
In the cases above, in the preloop CU we are doing the right thing - checking the exit code from spliter.py and exiting with that code.
In the gromacs CUs the radical_pilot_cu_launch_script.sh does the right thing but the run.py wrapper launches grompp, mdrun etc. without checking the error codes, so we should fix that.
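A rough sketch of what that fix could look like, assuming the wrapper runs its steps one after another. This is not the actual run.py; `run_steps` and `example_steps` are hypothetical names, and the real grompp/mdrun command lines are replaced by placeholders because they are not shown in this thread.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order and stop at the first failure.

    Returns the first non-zero exit code, or 0 if every step succeeded.
    """
    for cmd in steps:
        code = subprocess.call(cmd)
        if code != 0:
            sys.stderr.write(
                "step failed with exit code %d: %s\n" % (code, " ".join(cmd))
            )
            return code
    return 0


# Placeholder steps standing in for the real grompp and mdrun invocations.
example_steps = [
    [sys.executable, "-c", "print('stand-in for grompp')"],
    [sys.executable, "-c", "print('stand-in for mdrun')"],
]

# In a wrapper script the result would be propagated to the caller, e.g.:
#     sys.exit(run_steps(example_steps))
```

Exiting with the failing step's code lets the launch script see the failure directly, so nothing downstream needs to parse STDERR.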
I'll be moving away from this wrapper method, so this shouldn't come up once that is done.
Keep the wrapper method as an example. Add the 1sim/1CU method as an example as well. Depending on the number of simulations and the simulation length, the user can choose either method.
From issue #223 :