Open rpieczon opened 11 months ago
@rpieczon just to clarify, are you saying that if any pod is unready (even one unassociated with Akri), it causes this slot reconciliation error? From what I remember, slot reconciliation should only check pods with an expected annotation.
I lose track a little, but I think the annotations are on the container, not the pod ... and it might be that an unready pod is considered a potential place where an annotated container could eventually exist. It might be worth looking at the resource requests to limit where this early exit happens.
It might be hard to check for the resource, though. If the pod isn't ready and the container doesn't exist, there isn't much context to check the instances against.
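To make the resource-request idea above concrete, here is a minimal sketch, not Akri's actual code. It assumes that Akri resources are requested under names starting with `akri.sh/`, and it uses simplified, hypothetical `Pod` and `Container` structs in place of the real Kubernetes API types. The function `blocks_slot_cleanup` is likewise a made-up name for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a container in a pod spec.
struct Container {
    /// Whether the container currently reports ready.
    ready: bool,
    /// Requested resources by name, e.g. "akri.sh/akri-udev-video" -> "1".
    resources: HashMap<String, String>,
}

/// Hypothetical, simplified stand-in for a pod scheduled on this node.
struct Pod {
    containers: Vec<Container>,
}

/// Assumption: Akri-advertised resources share this name prefix.
const AKRI_RESOURCE_PREFIX: &str = "akri.sh/";

/// True if any container in the pod requests an Akri resource.
fn requests_akri_resource(pod: &Pod) -> bool {
    pod.containers.iter().any(|c| {
        c.resources
            .keys()
            .any(|name| name.starts_with(AKRI_RESOURCE_PREFIX))
    })
}

/// True only if an unready container belongs to a pod that requests
/// an Akri resource. Unready pods without such a request are ignored.
fn blocks_slot_cleanup(pods: &[Pod]) -> bool {
    pods.iter().any(|pod| {
        requests_akri_resource(pod) && pod.containers.iter().any(|c| !c.ready)
    })
}
```

With a check like this, an unready Prometheus pod that requests only `cpu` would not delay slot cleanup, while an unready pod requesting an `akri.sh/...` resource still would. Whether the requests are reliably visible for every unready pod is exactly the open question raised above.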
> @rpieczon just to clarify, are you saying if any pod (even if unassociated with Akri) is unready, it causes this slot reconciliation error? From what i remember slot reconciliation should only check pods with an expected annotation.
Exactly. In my case I have a failing Prometheus pod that has no requirements related to USB allocation.
Any update on it?
Describe the bug
The Akri agent DaemonSet keeps reporting the following error whenever any pod running on the cluster is not ready.
[2023-11-16T13:44:46Z TRACE agent::util::slot_reconciliation] reconcile - Pods with unready Containers exist on this node, we can't clean the slots yet
In my case, the failing pod doesn't use USB resources.
Output of
kubectl get pods,akrii,akric -o wide
Kubernetes Version: [e.g. Native Kubernetes 1.19, MicroK8s 1.19, Minikube 1.19, K3s]
Expected behavior
I would expect the reconciliation process to continue if the failing pod is outside the USB usage/management context.