Open AugustoPeres opened 4 months ago
yeah, we've also had problems with this. As a workaround, we are parsing the stdout and stderr output and have some heuristics that determine if there was an error after all.
For the record, handling this can be even more complicated because the error codes depend on the MPI implementation, too. For instance, on some tests we did with an older schism version (5.9):
openmpi + mpirun -n 8 schism -> error code 0
mpich + mpirun -n 8 schism -> error code 0 or 9 - about 50-50 between them
Now this might be an issue with openmpi/mpich but it could also be an issue of the way schism's MPI code has been implemented. Haven't really looked deeper into it.
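For anyone wanting to try the stdout/stderr workaround mentioned above, here is a minimal Python sketch of the idea. It is not the actual heuristics we use: the regex patterns in `ERROR_PATTERNS` and the helper names `output_indicates_failure` and `run_and_check` are made up for illustration, so you would need to replace the patterns with whatever messages your schism build really prints when it fails.

```python
import re
import subprocess

# Hypothetical patterns (assumption): replace them with the messages that
# your schism build actually writes to stdout/stderr on failure.
ERROR_PATTERNS = [
    re.compile(r"\bABORT\b", re.IGNORECASE),
    re.compile(r"\bERROR\b", re.IGNORECASE),
]


def output_indicates_failure(text):
    """Return True if any of the error patterns occurs in the text."""
    return any(pattern.search(text) for pattern in ERROR_PATTERNS)


def run_and_check(cmd):
    """Run cmd and return True only if it looks like it succeeded.

    A run counts as failed if the exit code is non-zero OR if the
    captured stdout/stderr matches one of the error patterns, since
    the exit code alone cannot be trusted.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        return False
    combined = proc.stdout + "\n" + proc.stderr
    return not output_indicates_failure(combined)
```

You would call it with the same command you normally launch, e.g. `run_and_check(["mpirun", "-n", "8", "schism"])`. Checking the exit code first and the output second means a non-zero code (like the 9 we sometimes got with mpich) is still treated as a failure, while the pattern scan covers the runs that fail but return 0.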
@pmav99, thank you very much for your reply.
We will take a look at how to parse the stdout and stderr to detect failed simulations. Could you share a little bit more on the heuristics that you are using to catch failed simulations?
However, it would be great if this was working out-of-the-box :)
The error says you need to specify # of scribe processes; see online manual.
Hi there,
I have recently started using the schism simulator and noticed that the exit code is zero even when the simulator fails:
Is there any way we can have the exit code reflect the fact that the simulation failed?