Closed: alahiff closed this issue 4 years ago
In https://github.com/prominence-eosc/imc/commit/e784a265cd17d87b5b534a69666ae8d7ff415815 we now check for any infrastructures which have been in the `waiting` state for more than 15 mins.
Currently nothing will ever enter the `waiting` state. We probably need to modify `multicloud_deploy.py` and `deploy.py` so that appropriate failures result in the `waiting` state rather than `failed`.
Infrastructures are now set to the `waiting` state when necessary: https://github.com/prominence-eosc/imc/commit/82066757c3057bce83003795003a42b92c18c529
Currently infrastructures are released from the waiting state after a fixed 15 mins. Could we improve this and make it more dynamic?
Initially this will do and it works, closing.
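For reference, one way the fixed 15-minute release could be made more dynamic is an exponential backoff on the number of failed deployment attempts. This is only a sketch, not what imc does: the function name `waiting_period`, the 4-hour cap and the `jitter` parameter are all assumptions made for illustration.

```python
import random

BASE_WAIT_SECS = 15 * 60        # the fixed 15 minutes currently used
MAX_WAIT_SECS = 4 * 60 * 60     # hypothetical upper bound, not from imc


def waiting_period(attempts, base=BASE_WAIT_SECS, cap=MAX_WAIT_SECS,
                   jitter=0.0, rng=random):
    """Return how many seconds an infrastructure should stay in the
    waiting state after `attempts` failed deployment attempts.

    The wait doubles with each failed attempt and is capped at `cap`.
    An optional random jitter (a fraction of the delay) spreads out
    retries so that many infrastructures are not released together.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = min(cap, base * 2 ** (attempts - 1))
    if jitter:
        delay += rng.uniform(0, jitter * delay)
    return delay
```

With these assumed defaults the first failure still waits 15 mins, the second 30 mins, and so on up to the cap, so the current behaviour is preserved for infrastructures that only fail once.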
If deployment fails, the infrastructure could go into a `waiting` state and then be retried later. This would remove (or at least reduce) the need for the job router to retry so many times.
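The idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `multicloud_deploy.py` or `deploy.py`: the failure reasons, the dictionary layout (`id`, `status`, `updated`) and both function names are hypothetical.

```python
import time

WAITING_TIMEOUT_SECS = 15 * 60  # release from waiting after 15 mins

# Hypothetical failure reasons treated as temporary; the real set of
# "appropriate failures" would be decided in the deployment code.
TRANSIENT_REASONS = {"quota_exceeded", "cloud_unavailable", "timeout"}


def state_after_failure(reason):
    """Map a deployment failure to the next infrastructure state:
    temporary problems go to 'waiting', everything else to 'failed'."""
    return "waiting" if reason in TRANSIENT_REASONS else "failed"


def ready_for_retry(infrastructures, now=None, timeout=WAITING_TIMEOUT_SECS):
    """Return the ids of infrastructures that have been in the waiting
    state for more than `timeout` seconds and can be retried."""
    now = time.time() if now is None else now
    return [
        infra["id"]
        for infra in infrastructures
        if infra["status"] == "waiting" and now - infra["updated"] > timeout
    ]
```

A periodic task could call `ready_for_retry` and resubmit the returned infrastructures, leaving the job router to handle only the cases that end up in `failed`.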