openkruise / kruise

Automated management of large-scale applications on Kubernetes (incubating project under CNCF)
https://openkruise.io

[BUG] BroadcastJob activeDeadlineSeconds did not take effect #1409

Closed by ls-2018 7 months ago

ls-2018 commented 1 year ago

What happened:

The pods have been running for more than 300 seconds (the configured activeDeadlineSeconds), but they are still there.

What you expected to happen:

The pods are cleaned up once activeDeadlineSeconds is exceeded.

How to reproduce it (as minimally and precisely as possible):

echo 'kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  "EphemeralContainers": true
nodes:
- role: control-plane
  image: kindest/node:v1.26.0
  extraPortMappings:
  - containerPort: 6443
    hostPort: 6443
    protocol: TCP
- role: worker
  image: kindest/node:v1.26.0
  labels:
    zone: c
' >/tmp/kind.yaml

kind create cluster --config /tmp/kind.yaml
kubectl cluster-info --context kind-kind

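# Register the OpenKruise chart repository before installing (skip if already added)
helm repo add openkruise https://openkruise.github.io/charts/
helm repo update
helm install kruise openkruise/kruise --version 1.4.0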

echo 'apiVersion: apps.kruise.io/v1alpha1
kind: BroadcastJob
metadata:
  name: broadcastjob-ttl
spec:
  template:
    spec:
      containers:
        - name: pi
          image: registry.cn-hangzhou.aliyuncs.com/acejilam/tensorflow:latest-gpu     # 5.7G
      restartPolicy: OnFailure
  completionPolicy:
    type: Always
    activeDeadlineSeconds: 300
    ttlSecondsAfterFinished: 300
  paused: false
  parallelism: 3
' >/tmp/bcj.yaml
kubectl apply -f /tmp/bcj.yaml
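
To confirm the symptom with numbers rather than by eye, a small helper can compare the elapsed time against the deadline. This is only a minimal sketch: the function name seconds_past_deadline is hypothetical, it works on epoch seconds, and it assumes the deadline is measured from the BroadcastJob's status.startTime.

```shell
# Hypothetical helper (not part of Kruise or kubectl).
# Prints how many seconds the job has run past its deadline,
# or 0 if it is still within the deadline.
# Args: start_epoch now_epoch deadline_seconds
seconds_past_deadline() {
  start=$1
  now=$2
  deadline=$3
  over=$(( now - start - deadline ))
  if [ "$over" -gt 0 ]; then
    echo "$over"
  else
    echo 0
  fi
}

# Possible usage against the reproduction above (assumes GNU date for -d):
#   start=$(date -d "$(kubectl get broadcastjob broadcastjob-ttl \
#             -o jsonpath='{.status.startTime}')" +%s)
#   seconds_past_deadline "$start" "$(date +%s)" 300
```

A non-zero result while the pods are still listed by kubectl get pods would match the behavior described in this report.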

Anything else we need to know?:

It does not happen every time. I found that when there are no watch events for the pods, nodes, or the BroadcastJob, the reconcile logic is not triggered.
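
One way to test this hypothesis is to generate an update event on the BroadcastJob by hand and check whether the overdue pods are cleaned up afterwards. The sketch below is a probe, not a fix. The function name nudge_broadcastjob and the annotation key example.com/nudge are hypothetical, and it assumes the controller does not filter out annotation-only updates.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to do a dry run
# that only prints the arguments instead of contacting a cluster.
KUBECTL=${KUBECTL:-kubectl}

# Hypothetical helper: writes a throwaway annotation on the BroadcastJob so
# that the API server emits an update event for it.
# Args: name [stamp]   (stamp defaults to the current epoch seconds)
nudge_broadcastjob() {
  name=$1
  stamp=${2:-$(date +%s)}
  "$KUBECTL" annotate broadcastjob "$name" "example.com/nudge=$stamp" --overwrite
}

# Possible usage once the deadline has passed:
#   nudge_broadcastjob broadcastjob-ttl
#   kubectl get pods
```

If the pods disappear shortly after the nudge, that would suggest the controller reacts only to events and does not re-check the job when the deadline expires.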

Environment:


I changed activeDeadlineSeconds to 30 seconds and ran the same steps again.

stale[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.