It's easy to underestimate walltime requirements, especially when jobs are deployed on heterogeneous hardware across multiple clouds. It is then frustrating that the output of a job that hits the walltime limit cannot be retrieved, even though it is often usable, restartable, or salvageable.
Apps can manage their own walltime internally and exit cleanly before the limit, but Prominence could also help those that don't: it would need to either kill the execution and upload the output tarball just before the walltime limit, or take a snapshot just before the limit.
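As a rough illustration of the "kill and upload just before the limit" idea, here is a minimal Python sketch of a wrapper around the job command. It is not Prominence code: the function name `run_with_walltime`, its parameters, and the choice to spend half of the safety margin waiting for a clean exit are all assumptions made for the example. The upload step itself is left out; the sketch only produces the tarball that would be uploaded.

```python
import subprocess
import tarfile


def run_with_walltime(cmd, workdir, walltime_s, margin_s, tarball):
    """Run cmd in workdir, stop it margin_s seconds before walltime_s,
    then archive workdir into tarball (hypothetical helper, not a Prominence API).

    Returns (timed_out, returncode).
    """
    budget = max(walltime_s - margin_s, 0)  # time the job may actually run
    proc = subprocess.Popen(cmd, cwd=workdir)
    timed_out = False
    try:
        proc.wait(timeout=budget)
    except subprocess.TimeoutExpired:
        timed_out = True
        proc.terminate()  # polite stop first, so the app can exit cleanly
        try:
            # Assumption: use half the margin for shutdown, keep the rest
            # for building (and later uploading) the tarball.
            proc.wait(timeout=margin_s / 2)
        except subprocess.TimeoutExpired:
            proc.kill()  # the app ignored the polite stop; force it
            proc.wait()
    # Archive whatever output exists, whether or not the job finished.
    with tarfile.open(tarball, "w:gz") as tar:
        tar.add(workdir, arcname=".")
    return timed_out, proc.returncode
```

A call such as `run_with_walltime(["./my_app"], "/job/output", 3600, 120, "/job/output.tar.gz")` (paths and command are placeholders) would stop the app after at most 58 minutes and leave a tarball of the partial output. The tarball is best written outside `workdir` so the archive does not sit inside the directory it is archiving.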