prominence-eosc / prominence

PROMINENCE server
Apache License 2.0

Retain output from jobs failing due to walltime limit #116

Closed: fcasson closed this issue 4 years ago

fcasson commented 4 years ago

It's easy to underestimate walltime requirements, especially when jobs are deployed across multiple clouds on heterogeneous hardware. It is then frustrating that the job output cannot be retrieved, even though it is often usable, restartable, or salvageable.

Apps can manage their own walltime internally and exit cleanly before the limit, but PROMINENCE could also help apps that don't: it would need to stop the execution itself and upload the tarball just before the walltime limit is reached, or take a snapshot at that point.
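A minimal sketch of the "stop just before the limit, then archive" idea, assuming a POSIX worker and a job that writes everything into a single output directory. `run_with_walltime` and all of its parameters are hypothetical illustrations, not part of PROMINENCE; the upload step is left out because the storage API isn't described here.

```python
import os
import subprocess
import tarfile


def run_with_walltime(cmd, walltime_s, margin_s, grace_s, output_dir, tarball_path):
    """Hypothetical wrapper: run `cmd`, stop it `margin_s` seconds before the
    walltime limit, then archive `output_dir` so partial results survive.

    Returns (exit_code, timed_out).
    """
    deadline = max(walltime_s - margin_s, 0)
    proc = subprocess.Popen(cmd)
    timed_out = False
    try:
        proc.wait(timeout=deadline)
    except subprocess.TimeoutExpired:
        timed_out = True
        # SIGTERM first (on POSIX), giving the app a chance to checkpoint.
        proc.terminate()
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            # The app ignored SIGTERM: force it to stop.
            proc.kill()
            proc.wait()
    # Archive whatever the job produced, whether it finished or was stopped.
    # (Uploading the tarball would happen here; omitted in this sketch.)
    name = os.path.basename(os.path.normpath(output_dir))
    with tarfile.open(tarball_path, "w:gz") as tar:
        tar.add(output_dir, arcname=name)
    return proc.returncode, timed_out
```

The grace period is the design choice that matters: an app that traps SIGTERM can flush a restart file before being killed, while apps that don't still have their partial output archived.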

fcasson commented 4 years ago

Sorry, I just realized this is a duplicate of #84. That solution doesn't seem to be active yet, but it is working (after I fixed a bug in my script).