Open crazycs520 opened 3 months ago
@lcwangchao PTAL
run admin show ddl jobs
to checkout which ddl is cancelled or failed
run
admin show ddl jobs
to checkout which ddl is cancelled or failed
It doesn't matter what DDL cancels or fails, the important thing is that there is a schema version gap that causes the schema cache miss, then cause stale-read query latency 10x spike.
Another possible cause of this problem is when tryLoadSchemaDiffs
loads multiple versions of schema in one time, such as following log:
[2024/05/24 09:00:09.638 +00:00] [INFO] [domain.go:287] ["diff load InfoSchema success"] [currentSchemaVersion=14899] [neededSchemaVersion=14901] [gap=2] ["start time"=3.152629ms] [gotSchemaVersion=14901] [phyTblIDs="[17245,17184]"] [actionTypes="[3,7]"] [diffTypes="[\"create table\",\"add index\"]"]
This is because create table
DDL job and add index
DDL job can can be executed concurrently.
I'd like to point out that in the new design of fast create table, the schema diff for create table could be missing. And we will not have all the schema history eventually.
I'd like to point out that in the new design of fast create table, the schema diff for create table could be missing. And we will not have all the schema history eventually.
you mean tidb_enable_fast_create_table?, it will set diff too, but don't wait it apply
Another possible cause of this problem is when
tryLoadSchemaDiffs
loads multiple versions of schema in one time, such as following log:[2024/05/24 09:00:09.638 +00:00] [INFO] [domain.go:287] ["diff load InfoSchema success"] [currentSchemaVersion=14899] [neededSchemaVersion=14901] [gap=2] ["start time"=3.152629ms] [gotSchemaVersion=14901] [phyTblIDs="[17245,17184]"] [actionTypes="[3,7]"] [diffTypes="[\"create table\",\"add index\"]"]
This is because
create table
DDL job andadd index
DDL job can can be executed concurrently.
I have few questions about this issue
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
High schema cache miss.
Related metrics:
Analyze
The schema version has 585442, 585443, and 585445, but 585444 does not exist, and there's a hole that causes schema cache misses, so TiDB needs to load snapshot schema from TiKV. Related codes are following:
https://github.com/pingcap/tidb/blob/868456d0f2b89e032997c3c07cad633725583e67/infoschema/cache.go#L118
Well, Why does the schema version 585444 not exist? From the current code implementation, since
GenSchemaVersion
uses a separate transaction to generate schema version, butSetSchemaDiff
use another transaction, so ifGenSchemaVersion
transaction succeed, butSetSchemaDiff
transaction fails, then will have schema version gap in this issue.Why use two separate transactions? this change is introduced by PR: support concurrent ddl
https://github.com/pingcap/tidb/blob/5d990c60e4a2928148cde450cd37b38acded2724/pkg/ddl/ddl.go#L448-L453
https://github.com/pingcap/tidb/blob/851e22d0b2b221388a076115ed76720c27b286cf/pkg/ddl/schema_version.go#L358
https://github.com/pingcap/tidb/blob/2abf83ddd50755dc7c3af6d2f2cf669fab29051d/domain/domain.go#L423-L424
2. What did you expect to see? (Required)
No information schema cache miss.
3. What did you see instead (Required)
information schema cache miss then cause many operation which is load snapshot schema for stale-read query.
4. What is your TiDB version? (Required)
nightly: e1a6b1d63399ce06b00132bc55052fb2c4e8a7fd