hysds / hysds-framework

HySDS framework releases
Apache License 2.0

retain logs and data of failed jobs in a location accessible to the operator #3

Open pymonger opened 6 years ago

pymonger commented 6 years ago

All job work dirs are exposed via WebDAV on each worker instance.

The work directory of a failed job is left intact, but it is not guaranteed to survive if subsequent jobs need the disk space to run. Shipping failed work directories to a less volatile location would need new development. Alternatively, as we did for OCO-2, the PGE can catch its own exceptions so that verdi never sees a failure, and then ship those work dirs out to external storage for review.
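For illustration, the catch-and-ship approach could look roughly like the sketch below. It is not the OCO-2 code: `run_and_preserve`, the `pge` callable, and `archive_root` are hypothetical names, and a local directory stands in for whatever external storage is actually used.

```python
import shutil
import traceback
from pathlib import Path


def run_and_preserve(pge, work_dir, archive_root):
    """Run pge(work_dir). On failure, archive the work dir and return the
    archive path instead of re-raising; return None on success."""
    work_dir = Path(work_dir)
    try:
        pge(work_dir)
        return None
    except Exception:
        # Keep the traceback next to the job's own logs and data.
        (work_dir / "_failure_traceback.txt").write_text(traceback.format_exc())
        # Assumption: archive_root is a less volatile location (e.g. a mounted
        # share); a real deployment might upload to object storage instead.
        archive_root = Path(archive_root)
        archive_root.mkdir(parents=True, exist_ok=True)
        # make_archive appends ".tar.gz" to the base name it is given.
        archive = shutil.make_archive(
            str(archive_root / work_dir.name),
            "gztar",
            root_dir=work_dir.parent,
            base_dir=work_dir.name,
        )
        return Path(archive)
```

The trade-off is that the exception is swallowed, so the job is not reported as failed; the operator has to look at the archive location to find failures.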

pymonger commented 6 years ago

Implemented in https://github.com/hysds/hysds/commit/39541fcaced94c06ba090ce16854229f17199c02 and tested on the GRFN b-cluster.