tektoncd / operator

Kubernetes operator to manage installation, update, and uninstallation of tektoncd projects (pipeline, …)
Apache License 2.0

coschedule isolate-pipelinerun doesn't work #2318

Open ruialves7 opened 2 months ago

ruialves7 commented 2 months ago

Expected Behavior

I have configured my tekton operator with:

  pipeline:
    disable-affinity-assistant: true
    coschedule: isolate-pipelinerun
    enable-api-fields: "alpha"
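
If the operator reconciles these settings correctly, they should show up in the `feature-flags` ConfigMap in the `tekton-pipelines` namespace. A sketch of the expected data (other keys omitted; this is what I'd expect to see, not a capture from this cluster):

```yaml
# Partial feature-flags ConfigMap after the operator applies the
# TektonConfig pipeline settings above; other keys omitted.
apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags
  namespace: tekton-pipelines
data:
  disable-affinity-assistant: "true"
  coschedule: "isolate-pipelinerun"
  enable-api-fields: "alpha"
```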

I'm using the cluster autoscaler, and my node group on AWS EKS uses a multi-AZ ASG. My TriggerTemplate has this configuration:

(...)
      podTemplate:
        securityContext:
          fsGroup: 65532
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                  - key: pipelines
                    operator: In
                    values:
                      - "pipelines"
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: app.kubernetes.io/component
                      operator: In
                      values:
                        - "affinity-assistant"
                topologyKey: "kubernetes.io/hostname"

        nodeSelector:
          pipelines: tom-pipelines
        tolerations:
          - key: dedicated
            operator: Equal
            value: pipelines
            effect: NoSchedule
(...)
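
For context: with `coschedule: isolate-pipelinerun`, Tekton co-schedules a PipelineRun's pods with an affinity-assistant pod, and the assistant pods themselves repel one another so each PipelineRun lands on a different node. A sketch of that built-in anti-affinity, paraphrased from the Tekton docs (not the literal generated spec):

```yaml
# Roughly the anti-affinity Tekton places on each affinity-assistant
# pod under coschedule: isolate-pipelinerun. Because TaskRun pods must
# be co-scheduled with their assistant, a user-supplied podAntiAffinity
# against this same label (as in the podTemplate above) can conflict
# with that placement.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/component: affinity-assistant
        topologyKey: kubernetes.io/hostname
```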

Actual Behavior

My PipelineRun uses a PVC. If I understand the documentation correctly, with coschedule isolate-pipelinerun each PipelineRun should run on a different physical node, but that doesn't happen and it returns this error:

pod status "PodScheduled":"False"; message: "0/7 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 1 node(s) had untolerated taint {app: permanentpod}, 2 node(s) had untolerated taint {app: 24h}, 3 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}. preemption: 0/7 nodes are available: 1 No preemption victims found for incoming pod, 6 Preemption is not helpful for scheduling."

Steps to Reproduce the Problem


Additional Info

Client Version: version.Info{Major:"1", Minor:"27+", GitVersion:"v1.27.4-eks-8ccc7ba", GitCommit:"892db4a4e439987d7addade5f9595cadfa06db2e", GitTreeState:"clean", BuildDate:"2023-08-15T16:06:56Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"30+", GitVersion:"v1.30.3-eks-2f46c53", GitCommit:"69ba22bf73c1112e7933fc61b220c00b554a7f66", GitTreeState:"clean", BuildDate:"2024-07-25T04:23:44Z", GoVersion:"go1.22.5", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.27) and server (1.30) exceeds the supported minor version skew of +/-1
khrm commented 1 month ago

Can you share the PipelineRun that is being generated by the Trigger? Is the feature-flags ConfigMap generated correctly?

ruialves7 commented 1 month ago

Hi. This is the message from my Kubernetes cluster autoscaler:

I0927 09:31:51.685584 1 orchestrator.go:565] Pod pipelines/dev-pull-request-runfdpxn-curl-in-progress-pod can't be scheduled on eks-pipelines-node-group-b0c8f03f-1e3a-7266-dd3c-d6b07096b6c3, predicate checking error: node(s) didn't match pod affinity rules; predicateName=InterPodAffinity; reasons: node(s) didn't match pod affinity rules; debugInfo=

And this is the error message on the PipelineRun pod:

pod status "PodScheduled":"False"; message: "0/8 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 2 node(s) had untolerated taint {app: 24h}, 2 node(s) had untolerated taint {app: permanentpod}, 3 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}. preemption: 0/8 nodes are available: 1 No preemption victims found for incoming pod, 7 Preemption is not helpful for scheduling."

My TektonConfig:

  apiVersion: operator.tekton.dev/v1alpha1
  kind: TektonConfig
  metadata:
    name: config
  spec:
    profile: basic
    config:
      nodeSelector:
        app: 24h
      tolerations:

This works if I use AWS EFS for my PVC and disable the affinity rules.
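
That matches how the affinity assistant is documented: it exists largely because EBS-backed PVCs are ReadWriteOnce (and zonal), so pods sharing them must be co-scheduled, while EFS supports ReadWriteMany and removes that constraint. A sketch of an EFS-backed workspace (the storageClassName `efs-sc` is an assumed name for an EFS CSI storage class, not taken from this cluster):

```yaml
# Sketch: PipelineRun workspace backed by an EFS (ReadWriteMany) PVC,
# so TaskRun pods sharing it need not land on the same node.
# "efs-sc" is a hypothetical storage class name; adjust to your cluster.
workspaces:
  - name: shared-data
    volumeClaimTemplate:
      spec:
        accessModes:
          - ReadWriteMany
        storageClassName: efs-sc
        resources:
          requests:
            storage: 1Gi
```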