kube-HPC / hkube

🐟 High Performance Computing over Kubernetes - Core Repo 🎣
http://hkube.io
MIT License
305 stars 20 forks source link

Retry algorithm execution, if system resource error (e.g. VM/ Docker failure) occurred during the algorithm execution #65

Closed Ronenk5678 closed 6 years ago

Ronenk5678 commented 6 years ago

As a HKube Manager I want that HKube will retry algorithm execution, if system resource error (e.g. VM/ Docker failure) occurred during the algorithm execution. The motivation: System resource error handling.

User Story implementation details: The information written to the sytem log will include the following:

  1. General pipeline details, see REQ_73
  2. Event description: Algorithm retry due to resource error, including the algorithm ID.
eytangro commented 6 years ago

TID-111