Open VoxT opened 2 years ago
@weekface could you take a look on this
Hi,do you still have questions? @VoxT
@VoxT Could you please share your TC yaml?
I have this same error in a kind cluster as well. I tried v5.3.0/6.0.0/6.1.0 using the advanced cluster config yaml as well as the basic config yaml from the examples folder. No luck same issue:
advanced-tidb-tikv-0 starting tikv-server ...
advanced-tidb-tikv-0 /tikv-server --pd=http://advanced-tidb-pd:2379 --advertise-addr=advanced-tidb-tikv-0.advanced-tidb-tikv-peer.db.svc:20160 --addr=0.0.0.0:20160 --status-addr=0.0.0.0:20180 --advertise-status-addr=advanced-tidb-tikv-0.advanced-tidb-tikv-peer.db.svc:20180 --data-dir=/var/lib/tikv --capacity=0 --config=/etc/tikv/tikv.toml
advanced-tidb-tikv-0
advanced-tidb-tikv-0 deprecated configuration, log-level has been moved to log.level
advanced-tidb-tikv-0 override log.level with log-level, Debug
advanced-tidb-tikv-0 [2022/11/13 02:32:59.992 +00:00] [INFO] [lib.rs:79] ["Welcome to TiKV"]
advanced-tidb-tikv-0 [2022/11/13 02:32:59.992 +00:00] [INFO] [lib.rs:84] ["Release Version: 6.1.0"]
advanced-tidb-tikv-0 [2022/11/13 02:32:59.992 +00:00] [INFO] [lib.rs:84] ["Edition: Community"]
advanced-tidb-tikv-0 [2022/11/13 02:32:59.992 +00:00] [INFO] [lib.rs:84] ["Git Commit Hash: 080d086832ae5ce2495352dccaf8df5d40f30687"]
advanced-tidb-tikv-0 [2022/11/13 02:32:59.992 +00:00] [INFO] [lib.rs:84] ["Git Commit Branch: heads/refs/tags/v6.1.0"]
advanced-tidb-tikv-0 [2022/11/13 02:32:59.992 +00:00] [INFO] [lib.rs:84] ["UTC Build Time: Unknown (env var does not exist when building)"]
advanced-tidb-tikv-0 [2022/11/13 02:32:59.992 +00:00] [INFO] [lib.rs:84] ["Rust Version: rustc 1.60.0-nightly (1e12aef3f 2022-02-13)"]
advanced-tidb-tikv-0 [2022/11/13 02:32:59.992 +00:00] [INFO] [lib.rs:84] ["Enable Features: jemalloc mem-profiling portable sse test-engine-kv-rocksdb test-engine-raft-raft-engine cloud-aws cloud-gcp cloud-azure"]
advanced-tidb-tikv-0 [2022/11/13 02:32:59.992 +00:00] [INFO] [lib.rs:84] ["Profile: dist_release"]
advanced-tidb-tikv-0 [2022/11/13 02:32:59.992 +00:00] [INFO] [mod.rs:74] ["cgroup quota: memory=Some(9223372036854771712), cpu=None, cores={22, 5, 20, 17, 0, 16, 11, 18, 1, 7, 10, 2, 3, 8, 13, 12, 21, 23, 15, 4, 6, 14, 19, 9}"]
advanced-tidb-tikv-0 [2022/11/13 02:32:59.998 +00:00] [INFO] [mod.rs:81] ["memory limit in bytes: 69030949888, cpu cores quota: 24"]
advanced-tidb-tikv-0 [2022/11/13 02:32:59.998 +00:00] [WARN] [server.rs:1533] ["check: kernel"] [err="kernel parameters net.core.somaxconn got 4096, expect 32768"]
advanced-tidb-tikv-0 [2022/11/13 02:32:59.998 +00:00] [WARN] [server.rs:1533] ["check: kernel"] [err="kernel parameters net.ipv4.tcp_syncookies got 1, expect 0"]
advanced-tidb-tikv-0 [2022/11/13 02:32:59.998 +00:00] [WARN] [server.rs:1533] ["check: kernel"] [err="kernel parameters vm.swappiness got 60, expect 0"]
advanced-tidb-tikv-0 [2022/11/13 02:33:00.000 +00:00] [DEBUG] [<unknown>] ["Using polling engine: epollex"]
advanced-tidb-tikv-0 [2022/11/13 02:33:00.000 +00:00] [DEBUG] [<unknown>] ["registering LB policy factory for \"grpclb\""]
advanced-tidb-tikv-0 [2022/11/13 02:33:00.000 +00:00] [DEBUG] [<unknown>] ["registering LB policy factory for \"priority_experimental\""]
advanced-tidb-tikv-0 [2022/11/13 02:33:00.000 +00:00] [DEBUG] [<unknown>] ["registering LB policy factory for \"weighted_target_experimental\""]
advanced-tidb-tikv-0 [2022/11/13 02:33:00.000 +00:00] [DEBUG] [<unknown>] ["registering LB policy factory for \"pick_first\""]
advanced-tidb-tikv-0 [2022/11/13 02:33:00.000 +00:00] [DEBUG] [<unknown>] ["registering LB policy factory for \"round_robin\""]
advanced-tidb-tikv-0 [2022/11/13 02:33:00.000 +00:00] [DEBUG] [<unknown>] ["registering LB policy factory for \"ring_hash_experimental\""]
advanced-tidb-tikv-0 [2022/11/13 02:33:00.000 +00:00] [DEBUG] [<unknown>] ["Using ares dns resolver"]
advanced-tidb-tikv-0 [2022/11/13 02:33:00.000 +00:00] [DEBUG] [<unknown>] ["registering certificate provider factory for \"file_watcher\""]
advanced-tidb-tikv-0 [2022/11/13 02:33:00.000 +00:00] [DEBUG] [<unknown>] ["registering LB policy factory for \"cds_experimental\""]
advanced-tidb-tikv-0 [2022/11/13 02:33:00.000 +00:00] [DEBUG] [<unknown>] ["registering LB policy factory for \"xds_cluster_impl_experimental\""]
advanced-tidb-tikv-0 [2022/11/13 02:33:00.000 +00:00] [DEBUG] [<unknown>] ["registering LB policy factory for \"xds_cluster_resolver_experimental\""]
advanced-tidb-tikv-0 [2022/11/13 02:33:00.000 +00:00] [DEBUG] [<unknown>] ["registering LB policy factory for \"xds_cluster_manager_experimental\""]
advanced-tidb-tikv-0 [2022/11/13 02:33:00.007 +00:00] [INFO] [util.rs:575] ["connecting to PD endpoint"] [endpoints=http://advanced-tidb-pd:2379]
advanced-tidb-tikv-0 [2022/11/13 02:33:02.009 +00:00] [INFO] [util.rs:537] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=http://advanced-tidb-pd:2379]
advanced-tidb-tikv-0 [2022/11/13 02:33:02.009 +00:00] [WARN] [client.rs:163] ["validate PD endpoints failed"] [err="Other(\"[components/pd_client/src/util.rs:570]: PD cluster failed to respond\")"]
advanced-tidb-tikv-0 [2022/11/13 02:33:02.311 +00:00] [INFO] [util.rs:575] ["connecting to PD endpoint"] [endpoints=http://advanced-tidb-pd:2379]
advanced-tidb-tikv-0 [2022/11/13 02:33:04.313 +00:00] [INFO] [util.rs:537] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=http://advanced-tidb-pd:2379]
advanced-tidb-tikv-0 [2022/11/13 02:33:04.613 +00:00] [INFO] [util.rs:575] ["connecting to PD endpoint"] [endpoints=http://advanced-tidb-pd:2379]
advanced-tidb-tikv-0 [2022/11/13 02:33:06.615 +00:00] [INFO] [util.rs:537] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=http://advanced-tidb-pd:2379]
advanced-tidb-tikv-0 [2022/11/13 02:33:06.915 +00:00] [INFO] [util.rs:575] ["connecting to PD endpoint"] [endpoints=http://advanced-tidb-pd:2379]
advanced-tidb-tikv-0 [2022/11/13 02:33:08.917 +00:00] [INFO] [util.rs:537] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=http://advanced-tidb-pd:2379]
advanced-tidb-tikv-0 [2022/11/13 02:33:09.217 +00:00] [INFO] [util.rs:575] ["connecting to PD endpoint"] [endpoints=http://advanced-tidb-pd:2379]
Not sure if this is the right place to comment. But I think this issue not only for tidb-operator
but also for others.
I also face this issue running v5.4.2
/v6.1.0
on my machine.
For deeper context:
Environment: CentOS 7
TiDB/TiKV/PD version: v5.4.2/v6.1.0
Architecture: PD pod (run by kind with image v5.4.2
) and TiKV pods (run by kind with image v5.4.2
)
[2022/11/14 16:03:06.108 +00:00] [INFO] [util.rs:575] ["connecting to PD endpoint"] [endpoints=pd0:2379]
[2022/11/14 16:03:08.109 +00:00] [INFO] [util.rs:537] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=pd0:2379]
[2022/11/14 16:03:08.410 +00:00] [INFO] [util.rs:575] ["connecting to PD endpoint"] [endpoints=pd0:2379]
[2022/11/14 16:03:10.411 +00:00] [INFO] [util.rs:537] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=pd0:2379]
[2022/11/14 16:03:10.712 +00:00] [INFO] [util.rs:575] ["connecting to PD endpoint"] [endpoints=pd0:2379]
[2022/11/14 16:03:12.713 +00:00] [INFO] [util.rs:537] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=pd0:2379]
[2022/11/14 16:03:13.014 +00:00] [INFO] [util.rs:575] ["connecting to PD endpoint"] [endpoints=pd0:2379]
[2022/11/14 16:03:15.015 +00:00] [INFO] [util.rs:537] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=pd0:2379]
[2022/11/14 16:03:15.316 +00:00] [INFO] [util.rs:575] ["connecting to PD endpoint"] [endpoints=pd0:2379]
[2022/11/14 16:03:17.317 +00:00] [INFO] [util.rs:537] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=pd0:2379]
PD config:
apiVersion: apps/v1
kind: Deployment
metadata:
name: pd0
spec:
selector:
matchLabels:
name: pd0
template:
metadata:
labels:
name: pd0
spec:
containers:
- name: pd0
image: pingcap/pd:v5.4.2
args:
- --name=pd0
- --client-urls=http://0.0.0.0:2379
- --peer-urls=http://0.0.0.0:2380
- --advertise-client-urls=http://pd0:2379
- --advertise-peer-urls=http://pd0:2380
- --initial-cluster=pd0=http://pd0:2380
- --data-dir=/data/pd0
- --log-file=/logs/pd0.log
ports:
- containerPort: 2379
TiKV config:
apiVersion: apps/v1
kind: Deployment
metadata:
name: tikv0
spec:
selector:
matchLabels:
name: tikv0
template:
metadata:
labels:
name: tikv0
spec:
containers:
- name: tikv0
image: pingcap/tikv:v5.4.2
args:
- --addr=0.0.0.0:20160
- --advertise-addr=tikv0:20160
- --data-dir=/data/tikv0
- --pd=pd0:2379
- --log-file=/logs/tikv0.log
ports:
- containerPort: 20160
The strangest thing here is, on the same machine, when I change from using v5.4.2
to v3.0.12
. The connection go through and my application worked well.
@mikechengwei @DanielZhangQD PTAL.
Bug Report
What version of Kubernetes are you using?
What version of TiDB Operator are you using?
What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?
What's the status of the TiDB cluster pods?
What did you do?
I try to deploy by this instruction for tidb v5.4.1: https://docs.pingcap.com/tidb-in-kubernetes/stable/deploy-tidb-operator
but TIKV can not started, the error on TiKV pod was:
I have precheck all the prerequisites, destroy and retry to deploy but still got the same error. Then I deploy tidb v3.0.12 and it started successfully, i try to upgrade it to v5.4.1, but guess what? The same error occurs.
What did you expect to see? The cluster should be started successfully when following the instruction.
What did you see instead? Error on TiKV when trying to connect to PD server.