pingcap / tidb-operator

TiDB operator creates and manages TiDB clusters running in Kubernetes.
https://docs.pingcap.com/tidb-in-kubernetes/
Apache License 2.0

LoadBalancerClass error #5543

Open aki263 opened 7 months ago

aki263 commented 7 months ago

Bug Report

What version of Kubernetes are you using?

1.27

What version of TiDB Operator are you using?

1.5.2

What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?

Not relevant

What's the status of the TiDB cluster pods?

All running

What did you do?

I installed TiDB Operator 1.5.2 via Helm, and my TidbCluster config looks something like this:


apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: tidb-cluster
  namespace: tidb-cluster
spec:
  version: v7.4.0
  timezone: EST
  configUpdateStrategy: RollingUpdate
  pvReclaimPolicy: Delete
  schedulerName: default-scheduler
  topologySpreadConstraints:
  - topologyKey: topology.kubernetes.io/zone
  enableDynamicConfiguration: true
  helper:
    image: alpine:3.16.0
  pd:
    baseImage: pingcap/pd
    maxFailoverCount: 0
    replicas: 1
    storageClassName: gp3
    requests:
      storage: "10Gi"
    config: |
      [dashboard]
        internal-proxy = true
      [replication]
        max-replicas = 3
    nodeSelector:
      dedicated: pd
  tikv:
    baseImage: pingcap/tikv
    maxFailoverCount: 0
    replicas: 1
    storageClassName: gp3
    requests:
      storage: "2048Gi"
    config: {}
    nodeSelector:
      dedicated: tikv

  tidb:
    baseImage: pingcap/tidb
    maxFailoverCount: 0
    replicas: 1
    hostNetwork: true
    config: |
      [log]
        level = "debug"
        format = "json"
        enable-timestamp = true
        [log.file]
            filename = "/var/log/tidb/tidb-general.log"
    storageClassName: gp3
    service:
      # loadBalancerClass: "service.k8s.aws/nlb"
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
        service.beta.kubernetes.io/aws-load-balancer-target-node-labels: dedicated=tidb
        external-dns.alpha.kubernetes.io/hostname: xyz.com
      exposeStatus: false
      externalTrafficPolicy: Local
      type: NodePort
    config: |
      [performance]
        tcp-keep-alive = true
    annotations:
      tidb.pingcap.com/sysctl-init: "true"
    podSecurityContext:
      sysctls:
      - name: net.ipv4.tcp_keepalive_time
        value: "300"
      - name: net.ipv4.tcp_keepalive_intvl
        value: "75"
      - name: net.core.somaxconn
        value: "32768"
    separateSlowLog: true
    nodeSelector:
      dedicated: tidb

Everything works as expected, but I cannot make any changes to the TiDB component, such as upgrading the version or changing the replica count, because of the following error in tidb-operator:

I0205 16:26:13.315960       1 service_control.go:91] update Service: [tidb-cluster/tidb-cluster-pd] successfully, kind: , name: tidb-cluster
I0205 16:26:13.379126       1 equality.go:87] Service spec diff for tidb-cluster/tidb-cluster-tidb:   v1.ServiceSpec{
    Ports: []v1.ServicePort{
        {
            ... // 3 identical fields
            Port:       4000,
            TargetPort: {IntVal: 4000},
-           NodePort:   0,
+           NodePort:   31593,
        },

    },
    Selector:  {"app.kubernetes.io/component": "tidb", "app.kubernetes.io/instance": "tidb-cluster", "app.kubernetes.io/managed-by": "tidb-operator", "app.kubernetes.io/name": "tidb-cluster"},
    ClusterIP: "",
    ... // 6 identical fields
    ExternalName:             "",
    ExternalTrafficPolicy:    "Local",
-   HealthCheckNodePort:      0,
+   HealthCheckNodePort:      31590,
    PublishNotReadyAddresses: false,
    SessionAffinityConfig:    nil,
    ... // 4 identical fields
  }
I0205 16:26:13.379153       1 tidb_member_manager.go:518] Sync TiDB service tidb-cluster/tidb-cluster-tidb, spec equal: false, annotations equal: true, label equal: true
E0205 16:26:13.385425       1 tidb_cluster_controller.go:143] TidbCluster: tidb-cluster/tidb-cluster, sync failed Service "tidb-cluster-tidb" is invalid: spec.loadBalancerClass: Invalid value: "null": may not change once set, requeuing

I am using AWS Load Balancer Controller 2.7.0 and ExternalDNS.

What did you expect to see?

What did you see instead?

csuzhangxc commented 7 months ago

Does the existing Service have loadBalancerClass set?

spec.loadBalancerClass: Invalid value: "null": may not change once set, requeuing

aki263 commented 7 months ago

@csuzhangxc I think the AWS Load Balancer Controller is setting this automatically. In my Service, it is set to loadBalancerClass: service.k8s.aws/nlb without my specifying it.

csuzhangxc commented 7 months ago

Currently, tc.spec.tidb.service doesn't support all fields of the Kubernetes Service. A possible workaround is to create an extra Service outside of the TidbCluster CR.
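
A minimal sketch of what such an extra Service might look like, assuming the selector labels shown in the operator log above, the TiDB port 4000, and the loadBalancerClass value reported in this thread. The Service name tidb-cluster-tidb-external is hypothetical, and moving the annotations from the TidbCluster spec onto this Service is an assumption, not something verified against this cluster.

```yaml
# Standalone Service, not managed by TiDB Operator, so the operator's
# Service sync never tries to update its spec.
apiVersion: v1
kind: Service
metadata:
  name: tidb-cluster-tidb-external   # hypothetical name
  namespace: tidb-cluster
  annotations:
    # Annotations copied from tc.spec.tidb.service in the report above.
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
    external-dns.alpha.kubernetes.io/hostname: xyz.com
spec:
  type: LoadBalancer
  # Set explicitly here, since this Service is owned by the user.
  loadBalancerClass: service.k8s.aws/nlb
  externalTrafficPolicy: Local
  selector:
    # Labels taken from the Selector field in the operator log.
    app.kubernetes.io/component: tidb
    app.kubernetes.io/instance: tidb-cluster
    app.kubernetes.io/managed-by: tidb-operator
    app.kubernetes.io/name: tidb-cluster
  ports:
  - name: mysql-client
    port: 4000
    targetPort: 4000
    protocol: TCP
```

With a Service like this handling external traffic, the operator-managed tidb-cluster-tidb Service could presumably drop the load balancer annotations so that nothing sets loadBalancerClass on it.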

If we need to fix this in TiDB Operator, we may need to either add support for all Service fields in TidbCluster or change how the operator checks and updates an existing Service.