Open pchila opened 2 years ago
Nice catch. We will address this as soon as possible. Related to https://github.com/keptn-contrib/job-executor-service/issues/234
Example output of kubectl describe job:
Name: job-executor-service-job-1067030a-bc7f-4e65-98b1-669e-1
Namespace: keptn-jes
Selector: controller-uid=1da7d309-aab9-4a91-9487-bc710dfea8a9
Labels: controller-uid=1da7d309-aab9-4a91-9487-bc710dfea8a9
job-name=job-executor-service-job-1067030a-bc7f-4e65-98b1-669e-1
Annotations: <none>
Parallelism: 1
Completions: 1
Pods Statuses: 0 Active / 0 Succeeded / 0 Failed
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 56s (x4 over 2m6s) job-controller Error creating: pods "job-executor-service-job-1067030a-bc7f-4e65-98b1-669e-1-" is forbidden: error looking up service account keptn-jes/inexistentServiceAccount: serviceaccount "inexistentServiceAccount" not found
Re-opening this, as the issue still exists. We can probably solve this via refactoring (https://github.com/keptn-contrib/job-executor-service/issues/244)
When Job Executor Service creates a k8s Job that cannot spawn a pod (for example, if the wrong serviceAccount is specified for the task), the sequence fails because of a timeout, but no logs can be retrieved because there are no pods to fetch them from. Furthermore, the created job is not collected by the k8s TTL controller because it never finishes, so it keeps trying to spawn pods long after Job Executor Service has given up on it (possibly indefinitely if there is a configuration error) and has to be removed manually by the user.
The Job Executor Service should detect that the job failed to start, report relevant information extracted from the job status/events to the user, and explicitly delete the job if it did not spawn any pods.
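A rough sketch of the detection step, to make the proposal concrete. This is not the Job Executor Service code (which is written in Go) and it does not call the Kubernetes API. `JobStatus`, `JobEvent` and `failed_to_start` are hypothetical names that only mirror the fields visible in the describe output (the pod counters and the event type/reason/message). It assumes that zero pods in every state plus at least one `FailedCreate` warning is a good enough signal that the job never started.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobStatus:
    # Hypothetical stand-in for the "Pods Statuses" counters of a job.
    active: int
    succeeded: int
    failed: int


@dataclass
class JobEvent:
    # Hypothetical stand-in for one row of the "Events" table.
    type: str     # "Normal" or "Warning"
    reason: str   # e.g. "FailedCreate"
    message: str


def failed_to_start(status: JobStatus, events: List[JobEvent]) -> Optional[str]:
    """Return a user-facing message if the job never spawned a pod, else None."""
    if status.active or status.succeeded or status.failed:
        # At least one pod exists or existed, so regular log collection applies.
        return None
    messages = [
        e.message
        for e in events
        if e.type == "Warning" and e.reason == "FailedCreate"
    ]
    if not messages:
        # No pods yet, but nothing indicates that pod creation is failing.
        return None
    # Report the most recent creation failure to the user.
    return "job could not create any pod: " + messages[-1]
```

If the function returns a message, the caller could attach it to the failed task result and then delete the job explicitly instead of waiting for the TTL controller, which never acts on a job that has not finished.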
How to reproduce:
Use a job config with a wrong service account: