Closed: thuyenptr closed this issue 4 years ago.
Thanks for your report. To root cause this issue, please find the DDL owner first, and then check whether there is any error log.
To identify the DDL owner, use the admin show ddl; statement to check the DDL info: https://pingcap.com/docs/stable/faq/tidb/#how-to-view-the-progress-of-the-current-ddl-job
OWNER: 1a1c4174-0fcd-4ba0-add9-12d08c4077dc
SELF_ID: 1a1c4174-0fcd-4ba0-add9-12d08c4077dc
If OWNER is equal to SELF_ID, then that instance is the owner. If there is only one tidb-server instance, the first step is not necessary.
Please grep for "error" in the DDL owner instance's log.
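As a concrete sketch of that last step (the log path is an assumption here; adjust it to your deployment layout), checking the owner instance's log could look like:

```shell
# Search the DDL owner instance's log for error-level entries.
# /data/deploy/log/tidb.log is a hypothetical path; use your own.
LOG=/data/deploy/log/tidb.log
grep -iE '\[ERROR\]|\[FATAL\]' "$LOG" | tail -n 50
```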
Thanks for reporting!
Please grep for "error" in the DDL owner instance's log.
Can you provide the log of the DDL owner instance, if it is not too large?
@billhcmus, Thanks for reporting.
If you don't mind, you can send the log to my email: chenshuang@pingcap.com
Thanks all for the fast replies; I will send you the error log as soon as possible.
Hi, when I execute admin show ddl on my 3 TiDB server instances, it shows:
TiDB 1:
> admin show ddl;
+------------+--------------------------------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------+
| SCHEMA_VER | OWNER_ID | OWNER_ADDRESS | RUNNING_JOBS | SELF_ID | QUERY |
+------------+--------------------------------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------+
| 2119 | b202210f-c26a-49f1-90a1-d99f488d62eb | :4000 | ID:3467, Type:drop table, State:running, SchemaState:delete only, SchemaID:3281, TableID:3459, RowCount:0, ArgLen:0, start time: 2020-02-03 16:50:24.51 +0700 +07, Err:<nil>, ErrCount:0, SnapshotVersion:0 | 77c6fbd5-060a-4f91-9782-7d68dd6be6cf | DROP TABLE IF EXISTS tidb_loader_voucher.checkpoint; |
+------------+--------------------------------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------+
TiDB 2:
> admin show ddl;
+------------+--------------------------------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------+
| SCHEMA_VER | OWNER_ID | OWNER_ADDRESS | RUNNING_JOBS | SELF_ID | QUERY |
+------------+--------------------------------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------+
| 2119 | b202210f-c26a-49f1-90a1-d99f488d62eb | :4000 | ID:3467, Type:drop table, State:running, SchemaState:delete only, SchemaID:3281, TableID:3459, RowCount:0, ArgLen:0, start time: 2020-02-03 16:50:24.51 +0700 +07, Err:<nil>, ErrCount:0, SnapshotVersion:0 | 9a2519ff-1a54-4012-b487-a0bc8d73ef79 | DROP TABLE IF EXISTS tidb_loader_voucher.checkpoint; |
+------------+--------------------------------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------+
TiDB 3:
> admin show ddl;
+------------+--------------------------------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------+
| SCHEMA_VER | OWNER_ID | OWNER_ADDRESS | RUNNING_JOBS | SELF_ID | QUERY |
+------------+--------------------------------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------+
| 2119 | b202210f-c26a-49f1-90a1-d99f488d62eb | :4000 | ID:3467, Type:drop table, State:running, SchemaState:delete only, SchemaID:3281, TableID:3459, RowCount:0, ArgLen:0, start time: 2020-02-03 16:50:24.51 +0700 +07, Err:<nil>, ErrCount:0, SnapshotVersion:0 | b9de0dc0-3797-4513-90e3-4e5274e3d4e3 | DROP TABLE IF EXISTS tidb_loader_voucher.checkpoint; |
+------------+--------------------------------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------+
There is no instance whose OWNER equals SELF_ID in the above results. I think this is the cause of the problem: the worker in each instance has no role to execute the pending DDL. One more thing: a few days before I executed the DDL, the 3 TiDB instances crashed because of OOM, and I restarted them. Every DML works normally, but DDLs are stuck.
@billhcmus Could you show me the result of "curl http://{TiDBIP}:{TiDBPORT}/info/all"?
Here it is. Something seems wrong with b202210f-c26a-49f1-90a1-d99f488d62eb:
=================================================================================================
"status_port": 10080,
"lease": "45s"
},
"a38e4abd-173f-4109-87a1-1c0f4ccb61f6": {
"version": "5.7.25-TiDB-v3.0.1",
"git_hash": "9e4e8da3c58c65123db5f26409759fe1847529f8",
"ddl_id": "a38e4abd-173f-4109-87a1-1c0f4ccb61f6",
"ip": "",
"listening_port": 4000,
"status_port": 10080,
"lease": "45s"
},
"b202210f-c26a-49f1-90a1-d99f488d62eb": {
"version": "5.7.25-TiDB-v3.0.1",
"git_hash": "9e4e8da3c58c65123db5f26409759fe1847529f8",
"ddl_id": "b202210f-c26a-49f1-90a1-d99f488d62eb",
"ip": "",
"listening_port": 4000,
"status_port": 10080,
"lease": "45s"
},
"b26ee3e0-8a64-4730-96a2-64d441a0603c": {
"version": "5.7.25-TiDB-v3.0.1",
"git_hash": "9e4e8da3c58c65123db5f26409759fe1847529f8",
"ddl_id": "b26ee3e0-8a64-4730-96a2-64d441a0603c",
"ip": "",
"listening_port": 4000,
"status_port": 10080,
"lease": "45s"
},
"b9de0dc0-3797-4513-90e3-4e5274e3d4e3": {
"version": "5.7.25-TiDB-v3.0.1",
"git_hash": "9e4e8da3c58c65123db5f26409759fe1847529f8",
"ddl_id": "b9de0dc0-3797-4513-90e3-4e5274e3d4e3",
"ip": "",
"listening_port": 4000,
"status_port": 10080,
"lease": "45s"
},
"c9a24ebf-d620-44fd-bf58-1e8e86dd77c6": {
"version": "5.7.25-TiDB-v3.0.1",
"git_hash": "9e4e8da3c58c65123db5f26409759fe1847529f8",
"ddl_id": "c9a24ebf-d620-44fd-bf58-1e8e86dd77c6",
"ip": "",
"listening_port": 4000,
"status_port": 10080,
"lease": "45s"
},
"f73fb8e2-928e-4bfa-a8b3-1844eabb10d0": {
"version": "5.7.25-TiDB-v3.0.1",
"git_hash": "9e4e8da3c58c65123db5f26409759fe1847529f8",
"ddl_id": "f73fb8e2-928e-4bfa-a8b3-1844eabb10d0",
"ip": "",
"listening_port": 4000,
"status_port": 10080,
"lease": "45s"
}
}
}
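To spot which entries are stale, one way (a rough sketch; grep-based JSON parsing is fragile, and the port and IDs below are taken from this thread) is to diff the ddl_ids registered in /info/all against the SELF_IDs reported by admin show ddl:

```shell
# Fetch the registered server infos and extract their ddl_ids.
curl -s "http://${TIDB_IP:-127.0.0.1}:10080/info/all" > info_all.json
grep -o '"ddl_id": "[0-9a-f-]*"' info_all.json | cut -d'"' -f4 | sort > registered.txt

# SELF_IDs seen via `admin show ddl` on the live instances (from this thread).
printf '%s\n' \
  77c6fbd5-060a-4f91-9782-7d68dd6be6cf \
  9a2519ff-1a54-4012-b487-a0bc8d73ef79 \
  b9de0dc0-3797-4513-90e3-4e5274e3d4e3 | sort > live.txt

# IDs registered in etcd but not backed by a live instance are stale.
comm -23 registered.txt live.txt
```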
@billhcmus Did you install this cluster with tidb-ansible? You can find tidb-ctl in "tidb-ansible/resources/bin".
You can execute "./tidb-ctl etcd ddlinfo --pdhost {PD_IP} --pdport {PD_PORT}". Then you will see "b202210f-c26a-49f1-90a1-d99f488d62eb" in the value of one of the KVs. In that KV, you can find a key like "/tidb/ddl/fg/owner/xxxx".
Then you can execute the following commands:
"./tidb-ctl etcd delkey /tidb/ddl/fg/owner/xxxx --pdhost {PD_IP} --pdport {PD_PORT}"
"./tidb-ctl etcd delkey /tidb/ddl/all_schema_versions/b202210f-c26a-49f1-90a1-d99f488d62eb --pdhost {PD_IP} --pdport {PD_PORT}"
"./tidb-ctl etcd delkey /tidb/server/info/b202210f-c26a-49f1-90a1-d99f488d62eb --pdhost {PD_IP} --pdport {PD_PORT}"
Finally, you can execute "./tidb-ctl etcd ddlinfo --pdhost {PD_IP} --pdport {PD_PORT}" again to check whether it worked.
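The three delkey commands can also be scripted as a dry run first (a sketch: STALE_ID and the owner key suffix come from your own ddlinfo output, and the leading echo only prints the commands so you can review them before actually running them):

```shell
# Stale DDL ID and the owner key observed via `tidb-ctl etcd ddlinfo`.
STALE_ID=b202210f-c26a-49f1-90a1-d99f488d62eb
OWNER_KEY=/tidb/ddl/fg/owner/xxxx   # replace xxxx with the real suffix

for KEY in \
  "$OWNER_KEY" \
  "/tidb/ddl/all_schema_versions/$STALE_ID" \
  "/tidb/server/info/$STALE_ID"; do
  # Drop the leading `echo` to actually delete the keys.
  echo ./tidb-ctl etcd delkey "$KEY" --pdhost "$PD_IP" --pdport "$PD_PORT"
done
```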
@billhcmus Could you tell me how many TiDB instances are in your cluster? Only 3? We need to delete the other DDL ID info.
I run only 3 TiDB instances.
"We need to delete other DDL ID info" — what do you mean? Delete the ddl_id corresponding to each job after job_id=3467?
@billhcmus No. I mean we need to delete the DDL info with "./tidb-ctl etcd delkey ...". We have only 3 instances in the cluster, but we get 6 server infos in the result of "curl http://{TiDBIP}:{TiDBPORT}/info/all". So we need to delete the other 3 server infos. Could you accept restarting the PD nodes? If you can, we needn't delete the DDL info with "./tidb-ctl etcd delkey ...".
How do I know which server infos I need to delete? Choose 3 of the 6 servers?
@billhcmus The DDL IDs 77c6fbd5-060a-4f91-9782-7d68dd6be6cf, 9a2519ff-1a54-4012-b487-a0bc8d73ef79, and b9de0dc0-3797-4513-90e3-4e5274e3d4e3 are valid; we can see them through the SELF_ID field of admin show ddl; on the 3 servers.
But the result of curl http://{TiDBIP}:{TiDBPORT}/info/all does not contain any of 77c6fbd5-060a-4f91-9782-7d68dd6be6cf, 9a2519ff-1a54-4012-b487-a0bc8d73ef79, or b9de0dc0-3797-4513-90e3-4e5274e3d4e3.
Did you restart these TiDB instances? Could you execute admin show ddl; and curl http://{TiDBIP}:{TiDBPORT}/info/all again, to make sure there were no other abnormal operations between these two operations?
After checking again, I found that I have 6 TiDB instances.
Cluster 1: 3 instances, tidb-binlog enabled.
Cluster 2: 3 instances, tidb-binlog enabled, but firewall rules do not allow the pump client to reach the pump server.
When Cluster 1 crashed because of OOM, the owner switched to an instance in Cluster 2. Then I executed the DDL; it could not be synced to the pump server, so it stagnated in the queue.
[2020/02/03 16:50:25.900 +07:00] [INFO] [ddl_worker.go:486] ["[ddl] run DDL job"] [worker="worker 1, tp general"] [job="ID:3467, Type:drop table, State:running, SchemaState:write only, SchemaID:3281, TableID:3459, RowCount:0, ArgLen:0, start time: 2020-02-03 16:50:24.51 +0700 +07, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
[2020/02/03 16:50:41.047 +07:00] [WARN] [client.go:288] ["[pumps client] write binlog to pump failed"] [NodeID=PMA_ZaloPay_DB-527.8:8250] ["binlog type"=Prewrite] ["start ts"=414377161774858243] ["commit ts"=0] [length=91] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
Thank you for your support @zimulala .
Do you mean you have 6 tidb-server instances in one TiDB cluster?
Yeah. And finally, after opening the firewall rules, the DDL executed normally. I will close this issue. Thank you.
Great! Let me check whether there is an alert rule for such a scenario. That could help root cause this kind of issue quickly.
Hi @billhcmus, we are developing a new feature, SQL diagnostics, and there is a diagnosis rule for DDL that may help locate this kind of problem quickly.
@shenli @lonng Wow, that's good to hear. I hope it is released soon.
Bug Report
I ran DDL statements like drop table and create table, for example:
drop table tidb_loader_voucher.checkpoint;
I expected the table to be dropped or created successfully.
I use go-sql-driver in my service; when I execute
drop table tidb_loader_voucher.checkpoint;
it prints the error log below. It seems some DDL jobs are in the queue and never get executed.
admin show ddl jobs;
Some jobs are stuck. If I add more DDL statements, they will never be executed because of the stuck job.
Then I tried to kill the job with id = 3467, which showed the error below:
Then I executed this command:
This indicates that the checkpoint table was deleted, but why didn't the job finish? I think something went wrong at step 7.
What version of TiDB are you using (tidb-server -V or run select tidb_version(); on TiDB)?
v3.0.1
Related report