Closed xiaojingchen closed 5 years ago
pump start args:
/pump -L=info -advertise-addr=stability-cluster1-pump-0.stability-cluster1-pump:8250 -config=/etc/pump/pump.toml -log-file=
the pump is normal the log of pump:
2019/04/30 11:24:42 server.go:506: [info] writeBinlogCount: 0, alivePullerCount: 0, maxCommitTS: 408059505503895553
2019/04/30 11:24:52 storage.go:355: [info] DBStats: {WriteDelayCount:0 WriteDelayDuration:0s WritePaused:false AliveSnapshots:0 AliveIterators:0 IOWrite:897117 IORead:1358741 BlockCacheSize:21751 OpenedTablesCount:1 LevelSizes:[467062] LevelTablesCounts:[1] LevelRead:[0] LevelWrite:[0] LevelDurations:[0s]}
2019/04/30 11:24:52 server.go:506: [info] writeBinlogCount: 0, alivePullerCount: 0, maxCommitTS: 408059508649623553
2019/04/30 11:25:02 storage.go:355: [info] DBStats: {WriteDelayCount:0 WriteDelayDuration:0s WritePaused:false AliveSnapshots:0 AliveIterators:0 IOWrite:897694 IORead:1358741 BlockCacheSize:21751 OpenedTablesCount:1 LevelSizes:[467062] LevelTablesCounts:[1] LevelRead:[0] LevelWrite:[0] LevelDurations:[0s]}
2019/04/30 11:25:02 server.go:506: [info] writeBinlogCount: 0, alivePullerCount: 0, maxCommitTS: 408059511008919553
2019/04/30 11:25:12 storage.go:355: [info] DBStats: {WriteDelayCount:0 WriteDelayDuration:0s WritePaused:false AliveSnapshots:0 AliveIterators:0 IOWrite:898186 IORead:1358741 BlockCacheSize:21751 OpenedTablesCount:1 LevelSizes:[467062] LevelTablesCounts:[1] LevelRead:[0] LevelWrite:[0] LevelDurations:[0s]}
2019/04/30 11:25:12 server.go:506: [info] writeBinlogCount: 0, alivePullerCount: 0, maxCommitTS: 408059513368215553
2019/04/30 11:25:22 storage.go:355: [info] DBStats: {WriteDelayCount:0 WriteDelayDuration:0s WritePaused:false AliveSnapshots:0 AliveIterators:0 IOWrite:898757 IORead:1358741 BlockCacheSize:21751 OpenedTablesCount:1 LevelSizes:[467062] LevelTablesCounts:[1] LevelRead:[0] LevelWrite:[0] LevelDurations:[0s]}
2019/04/30 11:25:22 server.go:506: [info] writeBinlogCount: 0, alivePullerCount: 0, maxCommitTS: 408059516513943553
2019/04/30 11:25:32 storage.go:355: [info] DBStats: {WriteDelayCount:0 WriteDelayDuration:0s WritePaused:false AliveSnapshots:0 AliveIterators:0 IOWrite:899334 IORead:1358741 BlockCacheSize:21751 OpenedTablesCount:1 LevelSizes:[467062] LevelTablesCounts:[1] LevelRead:[0] LevelWrite:[0] LevelDurations:[0s]}
2019/04/30 11:25:32 server.go:506: [info] writeBinlogCount: 0, alivePullerCount: 0, maxCommitTS: 408059518886346753
2019/04/30 11:25:42 storage.go:355: [info] DBStats: {WriteDelayCount:0 WriteDelayDuration:0s WritePaused:false AliveSnapshots:0 AliveIterators:0 IOWrite:899826 IORead:1358741 BlockCacheSize:21751 OpenedTablesCount:1 LevelSizes:[467062] LevelTablesCounts:[1] LevelRead:[0] LevelWrite:[0] LevelDurations:[0s]}
2019/04/30 11:25:42 server.go:506: [info] writeBinlogCount: 0, alivePullerCount: 0, maxCommitTS: 408059521245642753
2019/04/30 11:25:52 storage.go:355: [info] DBStats: {WriteDelayCount:0 WriteDelayDuration:0s WritePaused:false AliveSnapshots:0 AliveIterators:0 IOWrite:900397 IORead:1358741 BlockCacheSize:21751 OpenedTablesCount:1 LevelSizes:[467062] LevelTablesCounts:[1] LevelRead:[0] LevelWrite:[0] LevelDurations:[0s]}
2019/04/30 11:25:52 server.go:506: [info] writeBinlogCount: 0, alivePullerCount: 0, maxCommitTS: 408059524391370753
2019/04/30 11:26:02 storage.go:355: [info] DBStats: {WriteDelayCount:0 WriteDelayDuration:0s WritePaused:false AliveSnapshots:0 AliveIterators:0 IOWrite:900974 IORead:1358741 BlockCacheSize:21751 OpenedTablesCount:1 LevelSizes:[467062] LevelTablesCounts:[1] LevelRead:[0] LevelWrite:[0] LevelDurations:[0s]}
2019/04/30 11:26:02 server.go:506: [info] writeBinlogCount: 0, alivePullerCount: 0, maxCommitTS: 408059526750666753
2019/04/30 11:26:12 storage.go:355: [info] DBStats: {WriteDelayCount:0 WriteDelayDuration:0s WritePaused:false AliveSnapshots:0 AliveIterators:0 IOWrite:901466 IORead:1358741 BlockCacheSize:21751 OpenedTablesCount:1 LevelSizes:[467062] LevelTablesCounts:[1] LevelRead:[0] LevelWrite:[0] LevelDurations:[0s]}
2019/04/30 11:26:12 server.go:506: [info] writeBinlogCount: 0, alivePullerCount: 0, maxCommitTS: 408059529109962753
2019/04/30 11:26:22 storage.go:355: [info] DBStats: {WriteDelayCount:0 WriteDelayDuration:0s WritePaused:false AliveSnapshots:0 AliveIterators:0 IOWrite:902037 IORead:1358741 BlockCacheSize:21751 OpenedTablesCount:1 LevelSizes:[467062] LevelTablesCounts:[1] LevelRead:[0] LevelWrite:[0] LevelDurations:[0s]}
2019/04/30 11:26:22 server.go:506: [info] writeBinlogCount: 0, alivePullerCount: 0, maxCommitTS: 408059532255690753
@july2993 ptal
why the tidb use the incomplete domain stability-cluster1-pump-0
?
the correct domain is stability-cluster1-pump-0.stability-cluster1-pump
what's the tidb and pump version, i test on the latest version and it can work.
@xiaojingchen please supply the tidb and pump verison.
/ # /pump -V
2019/05/05 08:05:25 version.go:18: [info] Git Commit Hash: 50f5b19d55371b28840f56b778db48420cff761a
2019/05/05 08:05:25 version.go:19: [info] Build TS: 2019-01-26 11:47:07
2019/05/05 08:05:25 version.go:20: [info] Go Version: go1.11.2
2019/05/05 08:05:25 version.go:21: [info] Go OS/Arch: linuxamd64
/ # /tidb-server -V
Release Version: v2.1.4
Git Commit Hash: a1d10c17039ec289ef490481b25c7091dba7ac31
Git Branch: release-2.1
UTC Build Time: 2019-02-15 06:24:33
GoVersion: go version go1.11.2 linux/amd64
Race Enabled: false
TiKV Min Version: 2.1.0-alpha.1-ff3dd160846b7d1aed9079c389fc188f7f5ea13e
Check Table Before Drop: false
@july2993 Is this fixed in TiDB v2.1.8?
2019/04/30 09:22:10.119 client.go:288: [warning] [pumps client] write binlog to pump stability-cluster1-pump-0:8250 (type: Commit, start ts: 408057515149361211, commit ts: 408057515214897172, length: 45) error rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup stability-cluster1-pump-0.stability-cluster1-pump on 10.233.0.10:53: no such host"
i check the code, in the log write binlog to pump stability-cluster1-pump-0:8250
,stability-cluster1-pump-0:8250
is the NodeID of pump, and is get from the host name default.
but it do connect to stability-cluster1-pump-0.stability-cluster1-pump
, and failed because error: desc = "transport: Error while dialing dial tcp: lookup stability-cluster1-pump-0.stability-cluster1-pump on 10.233.0.10:53: no such host"
i check the log of tidb, the log is happen before 2019/04/30 09:28:08.891, but the logs of pump by kubectl logs stability-cluster1-pump-0 -n stability-cluster1 > out
looks the last start time is
stability-cluster1-pump-2019/04/30 09:34:51 version.go:18: ^[[0;37m[info] Git Commit Hash: 50f5b19d55371b28840f56b778db48420cff761a^[[0m
2019/04/30 09:34:51 version.go:19: ^[[0;37m[info] Build TS: 2019-01-26 11:47:07^[[0m
2019/04/30 09:34:51 version.go:20: ^[[0;37m[info] Go Version: go1.11.2^[[0m
2019/04/30 09:34:51 version.go:21: ^[[0;37m[info] Go OS/Arch: linuxamd64^[[0m
maybe at that time pump is not started, so it failed to write.
@july2993 why the tidb is still failed when the pump had started? I think the tidb should connect to pump when the pump started success.
Pump is mandatory to start before tidb, and for data security, the tidb must be unavailable when the pump is down, at which point it must be restarted manually. At present, there is no good solution to this problem. It is suggested to put it on hold for the time being.
Bug Report
What version of Kubernetes are you using?
What version of TiDB Operator are you using? least version
What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?
What's the status of the TiDB cluster pods?
What did you do? do failover test and pumb pod be restarted What did you see instead? i found the tidb server became unavailable the tidb's log :