Closed by skanthed 2 years ago
It's likely this job failed because the pod was scheduled on a node other than the one on which the volume was available. Despite this error it looks like your backups are otherwise completing regularly; right now I see:
NAME                                 READY   STATUS      RESTARTS   AGE
backup-to-bucket-27306391--1-zvkcw   0/2     Completed   0          56d
backup-to-bucket-27387360--1-fd68p   0/2     Completed   0          15h
backup-to-bucket-27387720--1-n9c4m   0/2     Completed   0          9h
backup-to-bucket-27388080--1-98rqk   0/2     Completed   0          3h30m
We may be able to avoid this problem by providing some sort of scheduling hint that ensures the backup pod runs in the right place. I'm going to investigate what our options are for that solution.
Just to confirm @larsks that was indeed the issue. We changed the node for the cronjob to point to the same node as the controller pod, and the job ran successfully.
But the pod that was created has nothing in its logs; it pushed nothing to the database, yet its status was Completed. I'm waiting for a few more pods to be created and will update here.
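For reference, the interim workaround described above (pointing the cronjob at the same node as the controller pod) can be expressed by pinning the CronJob's pod template to a specific node. This is only a sketch: the node name, schedule, and container details below are placeholders, not values from the actual cluster.

```yaml
# Hypothetical sketch of pinning the backup pod to the node where the
# volume is attached. All names and values here are placeholders.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-to-bucket
spec:
  schedule: "0 */6 * * *"   # assumed schedule
  jobTemplate:
    spec:
      template:
        spec:
          # nodeName bypasses the scheduler and binds the pod directly
          # to this node; if the node goes away, the pod cannot run.
          nodeName: example-worker-node
          containers:
            - name: backup
              image: example.com/backup:latest   # placeholder image
          restartPolicy: OnFailure
```

Note that `nodeName` is brittle compared with affinity rules: it hard-codes a single node, so the job breaks if the controller pod ever moves.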
@skanthed apologies; by "successfully" I meant that the pod was scheduled and ran to completion, not that the container process within it performed as intended. About that I have no idea.
@HumairAK I understand. No issues, I was just clarifying the details.
We should be able to solve this by configuring pod affinity constraints. That allows us to schedule one pod (e.g., the backup pod) by requiring that it runs on a node that is already running other pods with specific labels.
For this to work, we need to ensure the koku metrics pod has appropriate labels, which means modifying the relevant Deployment resource. How is koku metrics being deployed: via an operator, or some other mechanism?
Deployed via an operator, no other mechanism.
@skanthed where does the operator come from? It doesn't provide a very rich set of labels on the controller pod, but there is one label we can use:
$ oc -n koku-metrics-operator get pod koku-metrics-controller-manager-784bf87577-k4dhx -o jsonpath='{.metadata.labels}{"\n"}'
{"control-plane":"controller-manager","pod-template-hash":"784bf87577"}
So if we match on the control-plane label, we would add an affinity section something like this to the backup pod:
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: control-plane
              operator: In
              values:
                - controller-manager
        topologyKey: kubernetes.io/hostname
This should arrange for the backup-to-bucket pod to run on the same node as the koku-metrics-controller-manager pod.
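If hard co-scheduling ever proves too restrictive (for example, if the controller pod is temporarily down, a `required` rule would leave the backup pod unschedulable), a softer variant is `preferredDuringSchedulingIgnoredDuringExecution`, which lets the scheduler fall back to other nodes. A sketch using the same label, offered as an alternative rather than the plan of record:

```yaml
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100   # highest preference; weights range 1-100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: control-plane
                operator: In
                values:
                  - controller-manager
          topologyKey: kubernetes.io/hostname
```

The trade-off: with `preferred`, the pod may still land on a different node when co-scheduling is impossible, and would then hit the same volume-attach failure again, so `required` is likely the right choice here.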
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
/close
@sesheta: Closing this issue.
Describe the bug
Cron jobs are not starting for the curator project.
The backup-to-bucket pod reports: "Unable to attach or mount volumes: unmounted volumes=[koku-metrics-operator-data]"
Link to the issue - https://console-openshift-console.apps.smaug.na.operate-first.cloud/k8s/ns/koku-metrics-operator/pods/backup-to-bucket-27385920-crmjz/events
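For context on the error above: a volume backed by a PersistentVolumeClaim with access mode ReadWriteOnce can only be attached to one node at a time, which is why a pod scheduled onto a different node fails to mount it. A hypothetical claim illustrating the relevant field (the actual PVC for this volume was not inspected; the size and mode shown are assumptions):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: koku-metrics-operator-data
spec:
  accessModes:
    - ReadWriteOnce   # mountable by pods on a single node at a time
  resources:
    requests:
      storage: 10Gi   # placeholder size
```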