pingcap / tidb-operator

TiDB operator creates and manages TiDB clusters running in Kubernetes.
https://docs.pingcap.com/tidb-in-kubernetes/
Apache License 2.0

TiDB-operator is unable to delete TiProxy servers #5835

Open kos-team opened 1 week ago

kos-team commented 1 week ago

Bug Report

What version of Kubernetes are you using? Client Version: v1.31.1 Kustomize Version: v5.4.2

What version of TiDB Operator are you using? v1.6.0

What did you do? We deployed a cluster with TiProxy enabled and then tried to remove TiProxy from the cluster.

How to reproduce

  1. Deploy a TiDB cluster with TiProxy enabled, for example:

    apiVersion: pingcap.com/v1alpha1
    kind: TidbCluster
    metadata:
      name: test-cluster
    spec:
      configUpdateStrategy: RollingUpdate
      enableDynamicConfiguration: true
      helper:
        image: alpine:3.16.0
      pd:
        baseImage: pingcap/pd
        config: "[dashboard]\n  internal-proxy = true\n"
        maxFailoverCount: 0
        mountClusterClientSecret: true
        replicas: 3
        requests:
          storage: 10Gi
      pvReclaimPolicy: Retain
      ticdc:
        baseImage: pingcap/ticdc
        replicas: 3
      tidb:
        baseImage: pingcap/tidb
        config: "[performance]\n  tcp-keep-alive = true\ngraceful-wait-before-shutdown = 30\n"
        maxFailoverCount: 0
        replicas: 3
        service:
          externalTrafficPolicy: Local
          type: NodePort
      tiflash:
        baseImage: pingcap/tiflash
        replicas: 3
        storageClaims:
        - resources:
            requests:
              storage: 10Gi
      tikv:
        baseImage: pingcap/tikv
        config: |
          log-level = "info"
        maxFailoverCount: 0
        mountClusterClientSecret: true
        replicas: 3
        requests:
          storage: 100Gi
        scalePolicy:
          scaleOutParallelism: 5
      timezone: UTC
      tiproxy:
        version: main
        replicas: 5
        sslEnableTiDB: true
      version: v8.1.0
  2. Remove spec.tiproxy and re-apply the manifest (a quick way to check the result follows the manifest below):

    apiVersion: pingcap.com/v1alpha1
    kind: TidbCluster
    metadata:
      name: test-cluster
    spec:
      configUpdateStrategy: RollingUpdate
      enableDynamicConfiguration: true
      helper:
        image: alpine:3.16.0
      pd:
        baseImage: pingcap/pd
        config: "[dashboard]\n  internal-proxy = true\n"
        maxFailoverCount: 0
        mountClusterClientSecret: true
        replicas: 3
        requests:
          storage: 10Gi
      pvReclaimPolicy: Retain
      ticdc:
        baseImage: pingcap/ticdc
        replicas: 3
      tidb:
        baseImage: pingcap/tidb
        config: "[performance]\n  tcp-keep-alive = true\ngraceful-wait-before-shutdown = 30\n"
        maxFailoverCount: 0
        replicas: 3
        service:
          externalTrafficPolicy: Local
          type: NodePort
      tiflash:
        baseImage: pingcap/tiflash
        replicas: 3
        storageClaims:
        - resources:
            requests:
              storage: 10Gi
      tikv:
        baseImage: pingcap/tikv
        config: |
          log-level = "info"
        maxFailoverCount: 0
        mountClusterClientSecret: true
        replicas: 3
        requests:
          storage: 100Gi
        scalePolicy:
          scaleOutParallelism: 5
      timezone: UTC
      version: v8.1.0
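
After re-applying the updated manifest from step 2, one way to check whether the TiProxy Pods are actually gone (the file name and namespace below are placeholders):

    kubectl apply -f test-cluster.yaml
    kubectl get pods -n <namespace> | grep tiproxy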

What did you expect to see? TiProxy should be removed from the cluster and all TiProxy Pods should be deleted.

What did you see instead? The TiProxy Pods are not deleted and remain in the Running state.

Root Cause: The tiproxyMemberManager's Sync() function returns immediately when spec.tiproxy is nil, and deletion logic for an already-deployed TiProxy is not implemented.
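
For context, here is a minimal, self-contained sketch of that behavior. The type and method names echo the report, but this is an illustration of the pattern, not the actual tidb-operator source:

    package main

    import "fmt"

    // Minimal stand-in types for illustration only; the real definitions live
    // in tidb-operator's pkg/apis/pingcap/v1alpha1 package and carry many more fields.
    type TiProxySpec struct {
    	Replicas int32
    }

    type TidbClusterSpec struct {
    	TiProxy *TiProxySpec
    }

    type TidbCluster struct {
    	Spec TidbClusterSpec
    }

    type tiproxyMemberManager struct{}

    // Sync mirrors the behavior described in the root cause: when spec.tiproxy
    // is nil it returns immediately, so a TiProxy StatefulSet created earlier
    // is never cleaned up.
    func (m *tiproxyMemberManager) Sync(tc *TidbCluster) error {
    	if tc.Spec.TiProxy == nil {
    		// For declarative removal to work, cleanup of any previously
    		// deployed TiProxy resources would have to happen here instead
    		// of returning right away.
    		return nil
    	}
    	// ... normal reconcile path: sync the TiProxy service, ConfigMap,
    	// and StatefulSet to match tc.Spec.TiProxy ...
    	return nil
    }

    func main() {
    	m := &tiproxyMemberManager{}
    	// A cluster whose spec.tiproxy was removed: Sync becomes a no-op.
    	err := m.Sync(&TidbCluster{Spec: TidbClusterSpec{TiProxy: nil}})
    	fmt.Println("sync result:", err)
    }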

csuzhangxc commented 1 week ago

Have you tried setting the replicas to 0 instead of removing the whole section?

kos-team commented 1 week ago

@csuzhangxc Thanks for the workaround; we tried setting the replicas to 0 and it works.
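
For reference, the workaround keeps the tiproxy section in the manifest and only sets its replica count to zero, along these lines:

    spec:
      # other fields unchanged
      tiproxy:
        version: main
        replicas: 0
        sslEnableTiDB: true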

It would be nice to have a more declarative interface though.