pingcap / tidb-operator

TiDB operator creates and manages TiDB clusters running in Kubernetes.
https://docs.pingcap.com/tidb-in-kubernetes/
Apache License 2.0

TiDB cluster doesn't start up correctly #5678

Closed DelaunayAntoine closed 4 months ago

DelaunayAntoine commented 4 months ago

Bug Report

What version of Kubernetes are you using? Client 1.22.6 Server 1.26.1

What version of TiDB Operator are you using? TiDB Operator 1.6.0

What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods? Nfs-client (custom storage class)

What's the status of the TiDB cluster pods? kubectl-get-po-o-wide

What did you do? I deployed tidb-operator with Advanced StatefulSet enabled, then deployed a basic TiDB cluster with PD running as microservices (both manifests are attached at the end of this report).

What did you expect to see? I expected my TiDB pods to start right away, or at least to understand why they take so long to start and why my pods can't communicate with each other.

What did you see instead? I've seen my TiDB pods launch two days later or never launch at all. I've seen my pods never communicate with each other.

Hello, I may have done a few things wrong in my deployment, so please don't hesitate to correct me. I'm trying to deploy TiDB on Kubernetes with tidb-operator, but I've run into the two problems described above (TiDB pods that start days late or never start, and pods that cannot communicate with each other) and I can't solve them, so I'm turning to you for help.

In the following example, I'm also trying to deploy PD as a microservice (perhaps my deployment isn't right), but I've also tried without setting PD as a microservice and that didn't work either.

For operator deployment, I use the chart you provided. I also work offline, so I have to download my images and charts. sc-with-o-tidb

Below are the logs of my various components:

Advanced StatefulSet logs: advanced-statefulset-controller.txt

Controller-Manager logs: tidb-operator.txt

Discovery logs: discovery.txt

PD logs (for the 3 pods): pd(1).txt pd(2).txt pd.txt

TSO logs: tso(1).txt tso.txt

Scheduling logs: scheduling.txt

TiKV logs (for the 3 pods): tikv(1).txt tikv(2).txt tikv.txt

TiFlash logs: tiflash(1).txt tiflash(2).txt tiflash.txt

TiProxy logs: tiproxy(1).txt tiproxy(2).txt tiproxy.txt

Here are the manifests for my deployment:

For tidb-operator: values-tidb-operator.txt

For my cluster: basic-deploy-tidb-cluster.txt

csuzhangxc commented 4 months ago

What's the status of your cluster now? I think you have several issues here; can we try to resolve them one by one?

image

For this InvalidImageName, have you resolved it now?

DelaunayAntoine commented 4 months ago

@csuzhangxc Thank you very much for your response, and I'm sorry for the delay.

The issue in that screenshot has been resolved; it was just a problem with the image name and the image repository. The screenshot was not up to date.

Here is what it looks like now: Capture d’écran 2024-07-11 083659

As you can see, the pod is now in CrashLoopBackOff, so the first thing I did was describe the pod to look at its events. Here is what they look like:

    Events:
    Type     Reason     Age                    From     Message
    Warning  Unhealthy  40m (x1210 over 23h)   kubelet  Readiness probe failed: dial tcp 192.168.1.74:4000: connect: connection refused
    Normal   Started    10m (x241 over 23h)    kubelet  Started container tidb
    Warning  BackOff    27s (x5581 over 23h)   kubelet  Back-off restarting failed container tidb in pod basic-tidb-0_tidb-operator(bdaef247-eb9d-4abc-8128-cc7d02e29a68)

describe-tidb-crash.txt

This comes back to the internal communication between the different TiDB components. I don't know why there seems to be a connection issue: I run other applications in this Kubernetes cluster and they communicate with each other without problems. I think it might come from my configuration, which may be wrong.

csuzhangxc commented 4 months ago

Is there any useful information in the TiDB Pods' log?

DelaunayAntoine commented 4 months ago

Sorry for the delay.

Here are the TiDB logs:

start tidb-server ...
/tidb-server --store=tikv --advertise-address=basic-tidb-0.basic-tidb-peer.tidb-operator.svc --host=0.0.0.0 --path=basic-pd:2379 --config=/etc/tidb/tidb.toml --log-slow-query=/var/log/tidb/slowlog
[2024/07/15 06:00:12.426 +00:00] [INFO] [cgroup_cpu_linux.go:96] ["TiDB runs in a container, mount info: 3734 3606 0:416 / / rw,relatime master:976 - overlay overlay rw,lowerdir=/var/lib/rancher/rke2/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/22170/fs:/var/lib/rancher/rke2/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/22169/fs:/var/lib/rancher/rke2/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/22168/fs:/var/lib/rancher/rke2/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/22167/fs:/var/lib/rancher/rke2/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/22166/fs,upperdir=/var/lib/rancher/rke2/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/29592/fs,workdir=/var/lib/rancher/rke2/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/29592/work"]
[2024/07/15 06:00:12.441 +00:00] [INFO] [printer.go:47] ["Welcome to TiDB."] ["Release Version"=v8.1.0] [Edition=Community] ["Git Commit Hash"=945d07c5d5c7a1ae212f6013adfb187f2de24b23] ["Git Branch"=HEAD] ["UTC Build Time"="2024-05-21 03:51:57"] [GoVersion=go1.21.10] ["Race Enabled"=false] ["Check Table Before Drop"=false]
[2024/07/15 06:00:12.441 +00:00] [INFO] [cgmon.go:130] ["set the maxprocs"] [quota=8]
[2024/07/15 06:00:12.442 +00:00] [INFO] [printer.go:52] ["loaded config"]
[config="{\"host\":\"0.0.0.0\",\"advertise-address\":\"basic-tidb-0.basic-tidb-peer.tidb-operator.svc\",\"port\":4000,\"cors\":\"\",\"store\":\"tikv\",\"path\":\"basic-pd:2379\",\"socket\":\"/tmp/tidb-4000.sock\",\"lease\":\"45s\",\"split-table\":true,\"token-limit\":1000,\"temp-dir\":\"/tmp/tidb\",\"tmp-storage-path\":\"/tmp/0_tidb/MC4wLjAuMDo0MDAwLzAuMC4wLjA6MTAwODA=/tmp-storage\",\"tmp-storage-quota\":-1,\"server-version\":\"\",\"version-comment\":\"\",\"tidb-edition\":\"\",\"tidb-release-version\":\"\",\"keyspace-name\":\"\",\"log\":{\"level\":\"error\",\"format\":\"text\",\"disable-timestamp\":null,\"enable-timestamp\":null,\"disable-error-stack\":null,\"enable-error-stack\":null,\"file\":{\"filename\":\"\",\"max-size\":300,\"max-days\":0,\"max-backups\":3,\"compression\":\"\"},\"slow-query-file\":\"/var/log/tidb/slowlog\",\"expensive-threshold\":10000,\"general-log-file\":\"\",\"query-log-max-len\":4096,\"enable-slow-log\":true,\"slow-threshold\":300,\"record-plan-in-slow-log\":1,\"timeout\":0},\"instance\":{\"tidb_general_log\":false,\"tidb_pprof_sql_cpu\":false,\"ddl_slow_threshold\":300,\"tidb_expensive_query_time_threshold\":60,\"tidb_expensive_txn_time_threshold\":600,\"tidb_stmt_summary_enable_persistent\":false,\"tidb_stmt_summary_filename\":\"tidb-statements.log\",\"tidb_stmt_summary_file_max_days\":3,\"tidb_stmt_summary_file_max_size\":64,\"tidb_stmt_summary_file_max_backups\":0,\"tidb_enable_slow_log\":true,\"tidb_slow_log_threshold\":300,\"tidb_record_plan_in_slow_log\":1,\"tidb_check_mb4_value_in_utf8\":true,\"tidb_force_priority\":\"NO_PRIORITY\",\"tidb_memory_usage_alarm_ratio\":0.8,\"tidb_enable_collect_execution_info\":true,\"plugin_dir\":\"/data/deploy/plugin\",\"plugin_load\":\"\",\"max_connections\":0,\"tidb_enable_ddl\":true,\"tidb_rc_read_check_ts\":false,\"tidb_service_scope\":\"\"},\"security\":{\"skip-grant-table\":true,\"ssl-ca\":\"\",\"ssl-cert\":\"\",\"ssl-key\":\"\",\"cluster-ssl-ca\":\"\",\"cluster-ssl-cert\":\"\",\"cluster-ssl-key
\":\"\",\"cluster-verify-cn\":null,\"session-token-signing-cert\":\"\",\"session-token-signing-key\":\"\",\"spilled-file-encryption-method\":\"plaintext\",\"enable-sem\":false,\"auto-tls\":false,\"tls-version\":\"\",\"rsa-key-size\":4096,\"secure-bootstrap\":false,\"auth-token-jwks\":\"\",\"auth-token-refresh-interval\":\"1h0m0s\",\"disconnect-on-expired-password\":true},\"status\":{\"status-host\":\"0.0.0.0\",\"metrics-addr\":\"\",\"status-port\":10080,\"metrics-interval\":15,\"report-status\":true,\"record-db-qps\":false,\"record-db-label\":false,\"grpc-keepalive-time\":10,\"grpc-keepalive-timeout\":3,\"grpc-concurrent-streams\":1024,\"grpc-initial-window-size\":2097152,\"grpc-max-send-msg-size\":2147483647},\"performance\":{\"max-procs\":0,\"max-memory\":0,\"server-memory-quota\":0,\"stats-lease\":\"3s\",\"stmt-count-limit\":5000,\"pseudo-estimate-ratio\":0.8,\"bind-info-lease\":\"3s\",\"txn-entry-size-limit\":6291456,\"txn-total-size-limit\":104857600,\"tcp-keep-alive\":true,\"tcp-no-delay\":true,\"cross-join\":true,\"distinct-agg-push-down\":false,\"projection-push-down\":false,\"max-txn-ttl\":3600000,\"index-usage-sync-lease\":\"\",\"plan-replayer-gc-lease\":\"10m\",\"gogc\":100,\"enforce-mpp\":false,\"stats-load-concurrency\":5,\"stats-load-queue-size\":1000,\"analyze-partition-concurrency-quota\":16,\"plan-replayer-dump-worker-concurrency\":1,\"enable-stats-cache-mem-quota\":true,\"committer-concurrency\":128,\"run-auto-analyze\":true,\"force-priority\":\"NO_PRIORITY\",\"memory-usage-alarm-ratio\":0.8,\"enable-load-fmsketch\":false,\"lite-init-stats\":true,\"force-init-stats\":true,\"concurrently-init-stats\":false},\"prepared-plan-cache\":{\"enabled\":true,\"capacity\":100,\"memory-guard-ratio\":0.1},\"opentracing\":{\"enable\":false,\"rpc-metrics\":false,\"sampler\":{\"type\":\"const\",\"param\":1,\"sampling-server-url\":\"\",\"max-operations\":0,\"sampling-refresh-interval\":0},\"reporter\":{\"queue-size\":0,\"buffer-flush-interval\":0,\"log-spans\":false
,\"local-agent-host-port\":\"\"}},\"proxy-protocol\":{\"networks\":\"\",\"header-timeout\":5,\"fallbackable\":false},\"pd-client\":{\"pd-server-timeout\":3},\"tikv-client\":{\"grpc-connection-count\":4,\"grpc-keepalive-time\":10,\"grpc-keepalive-timeout\":3,\"grpc-compression-type\":\"none\",\"grpc-shared-buffer-pool\":false,\"grpc-initial-window-size\":134217728,\"grpc-initial-conn-window-size\":134217728,\"commit-timeout\":\"41s\",\"async-commit\":{\"keys-limit\":256,\"total-key-size-limit\":4096,\"safe-window\":2000000000,\"allowed-clock-drift\":500000000},\"max-batch-size\":128,\"overload-threshold\":200,\"max-batch-wait-time\":0,\"batch-wait-size\":8,\"enable-chunk-rpc\":true,\"region-cache-ttl\":600,\"store-limit\":0,\"store-liveness-timeout\":\"1s\",\"copr-cache\":{\"capacity-mb\":1000},\"copr-req-timeout\":60000000000,\"ttl-refreshed-txn-size\":33554432,\"resolve-lock-lite-threshold\":16,\"max-concurrency-request-limit\":9223372036854775807,\"enable-replica-selector-v2\":true},\"binlog\":{\"enable\":false,\"ignore-error\":false,\"write-timeout\":\"15s\",\"binlog-socket\":\"\",\"strategy\":\"range\"},\"compatible-kill-query\":false,\"pessimistic-txn\":{\"max-retry-count\":256,\"deadlock-history-capacity\":10,\"deadlock-history-collect-retryable\":false,\"pessimistic-auto-commit\":false,\"constraint-check-in-place-pessimistic\":true},\"max-index-length\":3072,\"index-limit\":64,\"table-column-count-limit\":1017,\"graceful-wait-before-shutdown\":0,\"alter-primary-key\":false,\"treat-old-version-utf8-as-utf8mb4\":true,\"enable-table-lock\":false,\"delay-clean-table-lock\":0,\"split-region-max-num\":1000,\"top-sql\":{\"receiver-address\":\"\"},\"repair-mode\":false,\"repair-table-list\":[],\"isolation-read\":{\"engines\":[\"tikv\",\"tiflash\",\"tidb\"]},\"new_collations_enabled_on_first_bootstrap\":true,\"experimental\":{\"allow-expression-index\":false},\"skip-register-to-dashboard\":false,\"enable-telemetry\":false,\"labels\":{},\"enable-global-index\":false,\"
deprecate-integer-display-length\":false,\"enable-enum-length-limit\":true,\"stores-refresh-interval\":60,\"enable-tcp4-only\":false,\"enable-forwarding\":false,\"max-ballast-object-size\":0,\"ballast-object-size\":0,\"transaction-summary\":{\"transaction-summary-capacity\":500,\"transaction-id-digest-min-duration\":2147483647},\"enable-global-kill\":true,\"enable-32bits-connection-id\":true,\"initialize-sql-file\":\"\",\"enable-batch-dml\":false,\"mem-quota-query\":1073741824,\"oom-action\":\"cancel\",\"oom-use-tmp-storage\":true,\"check-mb4-value-in-utf8\":true,\"enable-collect-execution-info\":true,\"plugin\":{\"dir\":\"/data/deploy/plugin\",\"load\":\"\"},\"max-server-connections\":0,\"run-ddl\":true,\"disaggregated-tiflash\":false,\"autoscaler-type\":\"aws\",\"autoscaler-addr\":\"tiflash-autoscale-lb.tiflash-autoscale.svc.cluster.local:8081\",\"is-tiflashcompute-fixed-pool\":false,\"autoscaler-cluster-id\":\"\",\"use-autoscaler\":false,\"tidb-max-reuse-chunk\":64,\"tidb-max-reuse-column\":256,\"tidb-enable-exit-check\":false,\"in-mem-slow-query-topn-num\":30,\"in-mem-slow-query-recent-num\":500}"]
[2024/07/15 06:01:02.829 +00:00] [FATAL] [terror.go:309] ["unexpected error"] [error="[tikv:9005]Region is unavailable"] [stack="github.com/pingcap/tidb/pkg/parser/terror.MustNil\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:309\nmain.createStoreAndDomain\n\t/workspace/source/tidb/cmd/tidb-server/main.go:421\nmain.main\n\t/workspace/source/tidb/cmd/tidb-server/main.go:326\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"] [stack="github.com/pingcap/tidb/pkg/parser/terror.MustNil\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:309\nmain.createStoreAndDomain\n\t/workspace/source/tidb/cmd/tidb-server/main.go:421\nmain.main\n\t/workspace/source/tidb/cmd/tidb-server/main.go:326\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"]

As far as I can make out, TiDB fails because it can't find a TiKV Region. That seems expected to me, since TiKV can't start up either: it can't connect to the PD endpoint.

If we fix the internal addressing between PD and TiKV so that they can reach each other, the endpoint problem between PD and TiKV should be resolved.

But I can't find a way to make TiKV connect to the PD endpoint.
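One way to narrow down this kind of problem is to test name resolution and TCP reachability separately. The following is only a minimal sketch and is not part of tidb-operator: `check_endpoint` is a hypothetical helper name, and it would have to run from a pod in the same namespace as the cluster. It uses the operating system's resolver, so a success here does not guarantee that a component using its own embedded resolver behaves the same way.

```python
import socket


def check_endpoint(host, port, timeout=3.0):
    """Resolve `host`, then try a TCP connection, reporting each step separately."""
    result = {"host": host, "port": port, "resolved": [], "connected": False, "error": None}

    # Step 1: name resolution through the system resolver.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        result["resolved"] = sorted({info[4][0] for info in infos})
    except socket.gaierror as exc:
        result["error"] = f"DNS resolution failed: {exc}"
        return result

    # Step 2: plain TCP connection to the resolved address.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            result["connected"] = True
    except OSError as exc:
        result["error"] = f"TCP connect failed: {exc}"
    return result
```

For example, `check_endpoint("basic-pd", 2379)` targets the PD address passed to tidb-server with `--path` in the log above. A result with resolved addresses but `connected` set to `False` points at the network path or the listener rather than at DNS.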

csuzhangxc commented 4 months ago

Can you show the log of TiKV again?

DelaunayAntoine commented 4 months ago

Of course :

tikv(3).txt

csuzhangxc commented 4 months ago

Try the method in https://github.com/pingcap/tidb-operator/issues/5372#issuecomment-1794020036

DelaunayAntoine commented 4 months ago

try the method in #5372 (comment)

Thanks, I'm trying this, but now the PD pods show a fatal error:

domain resolve basic-pd-0.basic-pd-peer.tidb-operator.svc success 192.168.1.71
hostIps: 192.168.1.71
resolvedIps: 192.168.1.71
Success: Resolved IP matches one of podIPs
starting pd-server ...
/pd-server services api --data-dir=/var/lib/pd --name=basic-pd-0 --peer-urls=http://0.0.0.0:2380 --advertise-peer-urls=http://basic-pd-0.basic-pd-peer.tidb-operator.svc:2380 --client-urls=http://0.0.0.0:2379 --advertise-client-urls=http://basic-pd-0.basic-pd-peer.tidb-operator.svc:2379 --config=/etc/pd/pd.toml --join=http://basic-pd-1.basic-pd-peer.tidb-operator.svc:2380,http://basic-pd-0.basic-pd-peer.tidb-operator.svc:2380
[2024/07/15 09:05:38.592 +00:00] [INFO] [meminfo.go:213] ["use physical memory hook"] [cgroupMemorySize=9223372036854775807] [physicalMemorySize=16765227008]
[2024/07/15 09:05:38.592 +00:00] [INFO] [versioninfo.go:98] ["Welcome to Placement Driver (API SERVICE)"]
[2024/07/15 09:05:38.592 +00:00] [INFO] [versioninfo.go:99] ["API SERVICE"] [release-version=v8.1.0]
[2024/07/15 09:05:38.592 +00:00] [INFO] [versioninfo.go:100] ["API SERVICE"] [edition=Community]
[2024/07/15 09:05:38.592 +00:00] [INFO] [versioninfo.go:101] ["API SERVICE"] [git-hash=fca469ca33eb5d8b5e0891b507c87709a00b0e81]
[2024/07/15 09:05:38.592 +00:00] [INFO] [versioninfo.go:102] ["API SERVICE"] [git-branch=HEAD]
[2024/07/15 09:05:38.592 +00:00] [INFO] [versioninfo.go:103] ["API SERVICE"] [utc-build-time="2024-05-09 02:15:45"]
[2024/07/15 09:05:38.592 +00:00] [INFO] [metricutil.go:86] ["disable Prometheus push client"]
[2024/07/15 09:05:38.595 +00:00] [INFO] [server.go:255] ["API Service config"]
[config="{\"client-urls\":\"http://0.0.0.0:2379\",\"peer-urls\":\"http://0.0.0.0:2380\",\"advertise-client-urls\":\"http://basic-pd-0.basic-pd-peer.tidb-operator.svc:2379\",\"advertise-peer-urls\":\"http://basic-pd-0.basic-pd-peer.tidb-operator.svc:2380\",\"name\":\"basic-pd-0\",\"data-dir\":\"/var/lib/pd\",\"force-new-cluster\":false,\"enable-grpc-gateway\":true,\"initial-cluster\":\"basic-pd-1=http://basic-pd-1.basic-pd-peer.tidb-operator.svc:2380,basic-pd-0=http://basic-pd-0.basic-pd-peer.tidb-operator.svc:2380\",\"initial-cluster-state\":\"existing\",\"initial-cluster-token\":\"pd-cluster\",\"join\":\"http://basic-pd-1.basic-pd-peer.tidb-operator.svc:2380,http://basic-pd-0.basic-pd-peer.tidb-operator.svc:2380\",\"lease\":3,\"log\":{\"level\":\"info\",\"format\":\"text\",\"disable-timestamp\":false,\"file\":{\"filename\":\"\",\"max-size\":0,\"max-days\":0,\"max-backups\":0},\"development\":false,\"disable-caller\":false,\"disable-stacktrace\":false,\"disable-error-verbose\":true,\"sampling\":null,\"error-output-path\":\"\"},\"max-concurrent-tso-proxy-streamings\":5000,\"tso-proxy-recv-from-client-timeout\":\"1h0m0s\",\"tso-save-interval\":\"3s\",\"tso-update-physical-interval\":\"50ms\",\"enable-local-tso\":false,\"metric\":{\"job\":\"basic-pd-0\",\"address\":\"\",\"interval\":\"15s\"},\"schedule\":{\"max-snapshot-count\":64,\"max-pending-peer-count\":64,\"max-merge-region-size\":20,\"max-merge-region-keys\":0,\"split-merge-interval\":\"1h0m0s\",\"switch-witness-interval\":\"1h0m0s\",\"enable-one-way-merge\":\"false\",\"enable-cross-table-merge\":\"true\",\"patrol-region-interval\":\"10ms\",\"max-store-down-time\":\"30m0s\",\"max-store-preparing-time\":\"48h0m0s\",\"leader-schedule-limit\":4,\"leader-schedule-policy\":\"count\",\"region-schedule-limit\":2048,\"witness-schedule-limit\":4,\"replica-schedule-limit\":64,\"merge-schedule-limit\":8,\"hot-region-schedule-limit\":4,\"hot-region-cache-hits-threshold\":3,\"store-limit\":{},\"tolerant-size-ratio\":0,\"low-s
pace-ratio\":0.8,\"high-space-ratio\":0.7,\"region-score-formula-version\":\"v2\",\"scheduler-max-waiting-operator\":5,\"enable-remove-down-replica\":\"true\",\"enable-replace-offline-replica\":\"true\",\"enable-make-up-replica\":\"true\",\"enable-remove-extra-replica\":\"true\",\"enable-location-replacement\":\"true\",\"enable-debug-metrics\":\"false\",\"enable-joint-consensus\":\"true\",\"enable-tikv-split-region\":\"true\",\"enable-heartbeat-breakdown-metrics\":\"true\",\"schedulers-v2\":[{\"type\":\"balance-region\",\"args\":null,\"disable\":false,\"args-payload\":\"\"},{\"type\":\"balance-leader\",\"args\":null,\"disable\":false,\"args-payload\":\"\"},{\"type\":\"hot-region\",\"args\":null,\"disable\":false,\"args-payload\":\"\"},{\"type\":\"evict-slow-store\",\"args\":null,\"disable\":false,\"args-payload\":\"\"}],\"schedulers-payload\":null,\"hot-regions-write-interval\":\"10m0s\",\"hot-regions-reserved-days\":7,\"max-movable-hot-peer-size\":512,\"enable-diagnostic\":\"true\",\"enable-witness\":\"false\",\"slow-store-evicting-affected-store-ratio-threshold\":0.3,\"store-limit-version\":\"v1\"},\"replication\":{\"max-replicas\":3,\"location-labels\":\"\",\"strictly-match-label\":\"false\",\"enable-placement-rules\":\"true\",\"enable-placement-rules-cache\":\"false\",\"isolation-level\":\"\"},\"pd-server\":{\"use-region-storage\":\"true\",\"max-gap-reset-ts\":\"24h0m0s\",\"key-type\":\"table\",\"runtime-services\":\"\",\"metric-storage\":\"\",\"dashboard-address\":\"auto\",\"flow-round-by-digit\":3,\"min-resolved-ts-persistence-interval\":\"1s\",\"server-memory-limit\":0,\"server-memory-limit-gc-trigger\":0.7,\"enable-gogc-tuner\":\"false\",\"gc-tuner-threshold\":0.6,\"block-safe-point-v1\":\"false\"},\"cluster-version\":\"0.0.0\",\"labels\":{},\"quota-backend-bytes\":\"8GiB\",\"auto-compaction-mode\":\"periodic\",\"auto-compaction-retention-v2\":\"1h\",\"TickInterval\":\"500ms\",\"ElectionInterval\":\"3s\",\"PreVote\":true,\"max-request-bytes\":157286400,\"sec
urity\":{\"cacert-path\":\"\",\"cert-path\":\"\",\"key-path\":\"\",\"cert-allowed-cn\":null,\"SSLCABytes\":null,\"SSLCertBytes\":null,\"SSLKEYBytes\":null,\"redact-info-log\":false,\"encryption\":{\"data-encryption-method\":\"plaintext\",\"data-key-rotation-period\":\"168h0m0s\",\"master-key\":{\"type\":\"plaintext\",\"key-id\":\"\",\"region\":\"\",\"endpoint\":\"\",\"path\":\"\"}}},\"label-property\":null,\"WarningMsgs\":null,\"DisableStrictReconfigCheck\":false,\"HeartbeatStreamBindInterval\":\"1m0s\",\"LeaderPriorityCheckInterval\":\"1m0s\",\"dashboard\":{\"tidb-cacert-path\":\"\",\"tidb-cert-path\":\"\",\"tidb-key-path\":\"\",\"public-path-prefix\":\"\",\"internal-proxy\":false,\"enable-telemetry\":false,\"enable-experimental\":false},\"replication-mode\":{\"replication-mode\":\"majority\",\"dr-auto-sync\":{\"label-key\":\"\",\"primary\":\"\",\"dr\":\"\",\"primary-replicas\":0,\"dr-replicas\":0,\"wait-store-timeout\":\"1m0s\",\"wait-recover-timeout\":\"0s\",\"pause-region-split\":\"false\"}},\"keyspace\":{\"pre-alloc\":null,\"wait-region-split\":true,\"wait-region-split-timeout\":\"30s\",\"check-region-split-interval\":\"50ms\"},\"micro-service\":{\"enable-scheduling-fallback\":\"true\"},\"controller\":{\"degraded-mode-wait-duration\":\"0s\",\"ltb-max-wait-duration\":\"30s\",\"request-unit\":{\"read-base-cost\":0.125,\"read-per-batch-base-cost\":0.5,\"read-cost-per-byte\":0.0000152587890625,\"write-base-cost\":1,\"write-per-batch-base-cost\":1,\"write-cost-per-byte\":0.0009765625,\"read-cpu-ms-cost\":0.3333333333333333},\"enable-controller-trace-log\":\"false\"}}"] [2024/07/15 09:05:38.604 +00:00] [INFO] [apiutil.go:413] ["register REST path"] [path=/pd/api/v1] [2024/07/15 09:05:38.604 +00:00] [INFO] [apiutil.go:413] ["register REST path"] [path=/pd/api/v2/] [2024/07/15 09:05:38.604 +00:00] [INFO] [apiutil.go:413] ["register REST path"] [path=/autoscaling] [2024/07/15 09:05:38.605 +00:00] [INFO] [distro.go:51] ["using distribution strings"] [strings={}] 
[2024/07/15 09:05:38.608 +00:00] [INFO] [apiutil.go:413] ["register REST path"] [path=/dashboard/api/]
[2024/07/15 09:05:38.608 +00:00] [INFO] [apiutil.go:413] ["register REST path"] [path=/dashboard/]
[2024/07/15 09:05:38.608 +00:00] [INFO] [registry.go:92] ["restful API service registered successfully"] [prefix=basic-pd-0] [service-name=MetaStorage]
[2024/07/15 09:05:38.609 +00:00] [INFO] [apiutil.go:413] ["register REST path"] [path=/resource-manager/api/v1/]
[2024/07/15 09:05:38.609 +00:00] [INFO] [registry.go:92] ["restful API service registered successfully"] [prefix=basic-pd-0] [service-name=ResourceManager]
[2024/07/15 09:05:38.610 +00:00] [WARN] [config.go:622] ["Running http and grpc server on single port. This is not recommended for production."]
[2024/07/15 09:05:38.610 +00:00] [INFO] [etcd.go:120] ["configuring peer listeners"] [listen-peer-urls="[http://0.0.0.0:2380]"]
[2024/07/15 09:05:38.610 +00:00] [INFO] [systimemon.go:30] ["start system time monitor"]
[2024/07/15 09:05:38.611 +00:00] [ERROR] [etcd.go:543] ["creating peer listener failed"] [error="listen tcp 0.0.0.0:2380: bind: address already in use"]
[2024/07/15 09:05:38.611 +00:00] [INFO] [etcd.go:375] ["closing etcd server"] [name=basic-pd-0] [data-dir=/var/lib/pd] [advertise-peer-urls="[http://basic-pd-0.basic-pd-peer.tidb-operator.svc:2380]"] [advertise-client-urls="[http://basic-pd-0.basic-pd-peer.tidb-operator.svc:2379]"]
[2024/07/15 09:05:38.611 +00:00] [INFO] [etcd.go:379] ["closed etcd server"] [name=basic-pd-0] [data-dir=/var/lib/pd] [advertise-peer-urls="[http://basic-pd-0.basic-pd-peer.tidb-operator.svc:2380]"] [advertise-client-urls="[http://basic-pd-0.basic-pd-peer.tidb-operator.svc:2379]"]
[2024/07/15 09:05:38.611 +00:00] [FATAL] [main.go:282] ["run server failed"] [error="[PD:etcd:ErrStartEtcd]listen tcp 0.0.0.0:2380: bind: address already in use: listen tcp 0.0.0.0:2380: bind: address already in use"] [stack="main.start\n\t/workspace/source/pd/cmd/pd-server/main.go:282\nmain.createAPIServerWrapper\n\t/workspace/source/pd/cmd/pd-server/main.go:183\ngithub.com/spf13/cobra.(*Command).execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:987\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039\nmain.main\n\t/workspace/source/pd/cmd/pd-server/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"]
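In a long paste like the PD log above, the entries that matter are the ERROR and FATAL ones. As a rough sketch (the helper names `split_entries` and `severe_entries` are made up for illustration), the text can be split on the bracketed timestamp and level that open each entry and then filtered by level. Anything before the first timestamp, such as the start-up banner, is ignored.

```python
import re

# A new entry begins with "[YYYY/MM/DD hh:mm:ss.mmm +zz:zz] [LEVEL]".
ENTRY_START = re.compile(
    r"\[\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{2}:\d{2}\] \[(?P<level>[A-Z]+)\]"
)


def split_entries(blob):
    """Split concatenated log text into (level, entry) pairs."""
    matches = list(ENTRY_START.finditer(blob))
    entries = []
    for i, match in enumerate(matches):
        # An entry runs until the next entry starts, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(blob)
        entries.append((match.group("level"), blob[match.start():end].strip()))
    return entries


def severe_entries(blob, levels=("ERROR", "FATAL")):
    """Keep only the entries whose level is in `levels`."""
    return [entry for level, entry in split_entries(blob) if level in levels]


# Three entries taken from the PD log above (the last one shortened),
# joined on a single line the way they appear in a raw paste.
SAMPLE = (
    '[2024/07/15 09:05:38.610 +00:00] [INFO] [systimemon.go:30] ["start system time monitor"] '
    '[2024/07/15 09:05:38.611 +00:00] [ERROR] [etcd.go:543] ["creating peer listener failed"] '
    '[error="listen tcp 0.0.0.0:2380: bind: address already in use"] '
    '[2024/07/15 09:05:38.611 +00:00] [INFO] [etcd.go:375] ["closing etcd server"] [name=basic-pd-0]'
)

for entry in severe_entries(SAMPLE):
    print(entry)
```

On the sample, only the `creating peer listener failed` entry is printed, which is the line the next comment picks up on.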

csuzhangxc commented 4 months ago

listen tcp 0.0.0.0:2380: bind: address already in use

Are you using hostNetwork, and do you have more than one PD pod on a single node?

DelaunayAntoine commented 4 months ago

listen tcp 0.0.0.0:2380: bind: address already in use

Are you using hostNetwork, and do you have more than one PD pod on a single node?

I use hostNetwork, but the cluster has 3 master nodes and 6 worker nodes. The deployment is automated, and I don't change anything in particular about how pods are assigned to nodes.

I didn't get this error before, which is why it's surprising to see it now.

csuzhangxc commented 4 months ago

bind: address already in use

It means another process is using port 2380 (it may be another PD process or something else).

How many TidbClusters are there in your K8s cluster?

Could you delete this TidbCluster and deploy it again?
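To illustrate what "address already in use" means at the socket level, here is a minimal sketch that tries to bind a TCP port and reports whether the bind is refused. `port_is_free` is a hypothetical helper name. It only says something useful when run in the same network namespace as the failing process, which with hostNetwork is the node itself.

```python
import errno
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can bind host:port, False on EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False  # another socket already holds this address
            raise  # any other failure (e.g. permissions) is not our question
        return True
```

Calling `port_is_free(2380)` on a node would show whether the PD peer port from the log is already taken there. The check does not identify which process holds the port.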

DelaunayAntoine commented 4 months ago

Hi guys, sorry for the delay; I was trying different things.

Could you delete this TidbCluster and deploy it again?

I tried this, but it didn't work: my pods still failed with the error "bind: address already in use".

So I modified my deployment manifest to add GRPC_DNS_RESOLVER: native, as suggested in that comment, and deployed only 3 PD pods, 3 TiKV pods, and 3 TiDB pods. As a result, the cluster started without a hitch.

TiDB started right away, PD found addresses to bind to, and TiKV had no problem starting.

With this in mind, I decided to add TiFlash and TiCDC (3 pods each).

TiCDC started correctly and seemed to find PD quickly. Here are the logs: ticdc(1).txt ticdc.txt

But TiFlash failed to find an endpoint for PD. Here are the logs: tiflash(3).txt

So I'm going to continue my research to get it working properly while waiting for your feedback. Thank you very much for your help; I'm making a lot of progress.

csuzhangxc commented 4 months ago

The TiFlash error is similar to the TiKV one, so I think you can try adding GRPC_DNS_RESOLVER: native for TiFlash as well.
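As a rough sketch of this suggestion, the snippet below sets an environment variable on selected components of a TidbCluster-style manifest held as a Python dict. It assumes that each component section accepts an `env` list of name/value pairs in the usual Kubernetes container shape; check that against the CRD of your operator version. `add_env` is a hypothetical helper, and the manifest is a made-up minimal example, not the one attached to this issue.

```python
import copy
import json


def add_env(manifest, components, name, value):
    """Return a copy of the manifest with env var `name` set on each listed component."""
    patched = copy.deepcopy(manifest)
    spec = patched.setdefault("spec", {})
    for component in components:
        section = spec.get(component)
        if section is None:
            continue  # component is not part of this cluster; nothing to patch
        env = section.setdefault("env", [])
        for item in env:
            if item.get("name") == name:
                item["value"] = value  # overwrite an existing entry
                break
        else:
            env.append({"name": name, "value": value})
    return patched


# Hypothetical minimal manifest, for illustration only.
manifest = {
    "kind": "TidbCluster",
    "metadata": {"name": "basic"},
    "spec": {"tikv": {"replicas": 3}, "tiflash": {"replicas": 3}},
}

patched = add_env(manifest, ["tikv", "tiflash"], "GRPC_DNS_RESOLVER", "native")
print(json.dumps(patched, indent=2))
```

The function works on a copy and is idempotent, so it can be applied again when more components are added later.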

DelaunayAntoine commented 4 months ago

The TiFlash error is similar to the TiKV one, so I think you can try adding GRPC_DNS_RESOLVER: native for TiFlash as well.

Yes, they were actually the same; adding GRPC_DNS_RESOLVER: native resolved the problem.

I think all my problems are gone now thanks to your help. Thank you very much. If I have any more questions, I will ask in this channel.

csuzhangxc commented 4 months ago

How about closing this issue and opening a new one if you have other questions?

DelaunayAntoine commented 4 months ago

How about closing this issue and opening a new one if you have other questions?

Yeah, sure, no problem with that.