radical-cybertools / ExTASY

MDEnsemble

Improved error messages for common failure cases #162

Closed ibethune closed 9 years ago

ibethune commented 9 years ago

This ticket is for a future release, and to stimulate some discussion. May also need some help from the RP or SAGA layers to implement.

In common cases where e.g. the batch system fails to launch the job (for some reason), can we log some helpful error message to STDOUT, rather than just:

Pilot 551bd46af8cdba7d6aaccbb5 has FAILED. Can't recover.

In particular, we can't expect users to read through the log file to discover what went wrong. Also, we should review what output we give in the default usage mode, i.e. just running 'extasy' without EXTASY_DEBUG=True, RADICAL_PILOT_VERBOSE='debug' and SAGA_VERBOSE='debug' set.

andre-merzky commented 9 years ago

At the RP level, the relevant information should be in pilot.stdout, pilot.stderr and/or pilot.log. Note that the last one is a list of entries, so you will want to inspect them individually.

In general I agree that this information should be available without debug output, at least at a level of detail useful to the end user...
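As a rough illustration of the inspection described above, here is a minimal Python sketch that builds a short user-facing message from a failed pilot. It assumes only that the pilot object exposes `stdout`, `stderr` and `log` as mentioned in this comment; the `uid` attribute, the `summarize_pilot_failure` helper and the stand-in object are hypothetical and not taken from the ExTASY or RP code.

```python
from types import SimpleNamespace


def summarize_pilot_failure(pilot, max_log_entries=5):
    """Build a short, user-facing message from a failed pilot's output.

    Hypothetical helper: assumes the pilot exposes 'uid', 'stdout',
    'stderr' and 'log' (a list of entries) as plain attributes.
    """
    lines = [f"Pilot {pilot.uid} has FAILED."]

    # Show stderr first, since it usually carries the batch system's complaint.
    for name in ("stderr", "stdout"):
        text = (getattr(pilot, name, None) or "").strip()
        if text:
            lines.append(f"--- pilot {name} ---")
            lines.append(text)

    # The log is a list of entries, so each one is rendered individually.
    entries = list(getattr(pilot, "log", None) or [])
    if entries:
        lines.append("--- last log entries ---")
        lines.extend(str(entry) for entry in entries[-max_log_entries:])

    return "\n".join(lines)


# Stand-in object for illustration only; not a real RADICAL-Pilot pilot.
fake_pilot = SimpleNamespace(
    uid="551bd46af8cdba7d6aaccbb5",
    stdout="",
    stderr="qsub: budget e280 does not have enough resource",
    log=["submitting job", "NoSuccess: Error running job via 'qsub'"],
)

print(summarize_pilot_failure(fake_pilot))
```

Empty streams are skipped so that the default output stays short; only the last few log entries are shown for the same reason.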

ibethune commented 9 years ago

The two cases that came up yesterday were:

1) User tries to submit to a budget that they don't have access to.
2) Budget is empty.

In both cases, PBS returns some useful text, and an error code:

"NoSuccess: Error running job via 'qsub': qsub: budget e280 does not have enough resource (0.000 kAU remaining, 1.000 kAU required)"

I guess other batch systems do the same. I think it would be good to expose this, in a similar way to how we dump the STDERR for a failing CU to the screen.
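One way to expose the batch system's text could look like the Python sketch below. It assumes the error always has the shape of the PBS example quoted above ("NoSuccess: Error running job via '<command>': <message>"); other batch systems or SAGA adaptors may format it differently, and the `extract_batch_error` helper is hypothetical.

```python
import re

# Assumed shape, taken from the single PBS example in this thread:
#   NoSuccess: Error running job via '<command>': <batch system message>
_SUBMIT_ERROR = re.compile(
    r"NoSuccess: Error running job via '(?P<cmd>[^']+)':\s*(?P<msg>.+)",
    re.DOTALL,
)


def extract_batch_error(text):
    """Return (command, message) if text contains a submission error, else None."""
    match = _SUBMIT_ERROR.search(text)
    if match is None:
        return None
    return match.group("cmd"), match.group("msg").strip()


example = (
    "NoSuccess: Error running job via 'qsub': qsub: budget e280 does not "
    "have enough resource (0.000 kAU remaining, 1.000 kAU required)"
)

result = extract_batch_error(example)
if result is not None:
    cmd, msg = result
    print(f"Job submission via {cmd} failed: {msg}")
```

Returning `None` for unrecognised text lets the caller fall back to the generic "has FAILED" message rather than guessing.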

vivek-bala commented 9 years ago

pilot stderr is also dumped to the screen now.