pingcap / docs

TiDB database documentation. TiDB is an open-source, cloud-native, distributed, MySQL-compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at: https://www.pingcap.com/tidb-serverless/
https://docs.pingcap.com

Expand TiDB glossary to include frequently used abbreviations and explanations #19212

Closed: qiancai closed this issue 1 week ago

qiancai commented 3 weeks ago

Currently, the TiDB Glossary doc lacks explanations for some frequently used abbreviations such as PD, DM, and BR.

To enhance clarity and accessibility for users, we propose expanding the TiDB glossary to include commonly used abbreviations alongside existing product-specific terms. This update will help users better understand technical documents and reduce potential confusion, especially for new users.

Plan:

  1. Select key abbreviations that need definitions based on the list of frequently used acronyms.
  2. Provide clear explanations for the selected abbreviations: Each abbreviation will be accompanied by a concise explanation to ensure users understand its meaning and relevance to TiDB.
qiancai commented 3 weeks ago

I’ve created a script that counts abbreviations across the docs. The following table lists the 100 most frequently used ones. I suggest we start by selecting some key abbreviations from this table.

Note:

  • Only a few of the abbreviations in the table have already been covered in the current Glossary doc: TSO, MPP, VPC, MVCC.
  • Certain abbreviations, like SQL, CPU, and ID, are widely understood and don’t require explanations.
Acronym Number of occurrences
SQL 3730
PD 2506
DDL 1974
DM 1908
CPU 925
ID 880
API 756
BR 724
AWS 655
GC 626
JSON 532
DML 519
TLS 464
GLOBAL 462
IP 449
TSO 431
CSV 353
KV 352
MPP 351
SESSION 349
QPS 340
VPC 301
OOM 277
URI 276
TPC 263
TTL 241
PITR 229
GA 222
SHOW 221
CLI 220
CA 216
SST 209
HTTP 204
GCS 187
HTAP 170
CREATE 163
JDBC 156
URL 150
GB 147
NOT 135
OLTP 130
RPC 128
TPS 127
RU 122
TABLE 120
DR 120
MB 116
NULL 111
SSO 108
IAM 106
AI 105
DXF 105
DMS 104
DMR 100
CTE 98
KMS 98
ARN 98
AZ 98
DROP 96
ORM 95
MVCC 90
TS 87
UI 87
SSL 82
IO 80
LDAP 79
GMHDBJD 79
OLAP 76
SSH 75
OPS 75
GTID 74
INDEX 74
FAQ 69
ALTER 69
CF 69
CIDR 65
RDS 65
CDC 62
SET 59
SSD 57
COLUMNS 56
GROUP 55
DB 54
WAL 53
DNS 50
MSP 50
UPDATE 49
MQ 49
TCP 48
NUMA 47
LTS 46
UTF 43
UUID 41
PROXY 41
DECIMAL 41
RESOURCE 40
OIDC 40
PLACEMENT 40
ANALYZE 39
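The script itself is not included in this thread. Purely as an illustration, a count like the one above could be approximated with a shell pipeline such as the following. The function name `count_abbrevs` is hypothetical, and the sketch assumes that the docs are Markdown files and that an abbreviation is any whole word made of two or more capital letters. Because it also scans code blocks, it counts SQL keywords such as SHOW and CREATE, which appear in the table above as well.

```shell
#!/bin/sh
# Hypothetical sketch: the original script is not part of this thread.
# count_abbrevs DIR [N] prints the N most frequent all-uppercase tokens
# (two or more capital letters) found in Markdown files under DIR.
count_abbrevs() {
  dir="$1"
  n="${2:-100}"
  # -r recurses, -h omits file names, -o prints each match on its own line,
  # and -w keeps whole words only, so "TiDB" does not yield "DB".
  grep -rhow --include='*.md' -E '[A-Z]{2,}' "$dir" \
    | sort | uniq -c | sort -rn | head -n "$n"
}
```

For example, `count_abbrevs . 100` run from the repository root would print the top 100 tokens, each preceded by its count.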
qiancai commented 3 weeks ago

Here, I’ve selected 20 common abbreviations from the previous table as candidates for the Glossary document. @lilin90 and @dveeden PTAL, thanks.

dveeden commented 3 weeks ago

I assume the list only covers all-uppercase terms? Maybe we should also add mixed-case ones such as PoC, QoS, and IdP?

qiancai commented 3 weeks ago

@dveeden Good catch. Thank you! Here they are. Let's pick some candidates from the following table too.

Acronym Number of occurrences
TiDB 20520
TiKV 4565
MySQL 2573
TiCDC 2168
TiUP 1197
RocksDB 420
PingCAP 228
gRPC 184
IDs 180
vCPU 169
MariaDB 140
FAQs 140
HAProxy 139
ProxySQL 103
MDSvgIcon 103
macOS 92
CentOS 89
OpenAPI 87
RawKV 55
SQLAlchemy 40
DTFile 39
PyMySQL 38
VCore 38
RCUs 33
DBeaver 32
HunDunDM 31
OAuth 29
URLs 28
SQLTools 25
WebUI 24
TypeORM 24
DMLs 23
InnoDB 21
SSDs 19
KVs 18
NVMe 18
RowID 17
OpenAI 17
DBAs 17
CPUs 16
BenchmarkSQL 16
ResolvedTS 16
benCHmark 14
NewSQL 14
ksqlDB 14
HikariCP 14
CheckpointTS 14
mTLS 12
URIs 12
OpenSSL 11
CFs 11
MBps 11
MinIO 11
AmazonRDS 10
DBaaS 10
BackupTS 10
AutoID 10
TiKVs 9
gPRC 9
vCPUs 9
GPTs 9
StartTLS 8
HundunDM 8
DTFiles 7
DTTool 7
GTIDs 7
SSTs 7
RESTful 7
DMFile 6
PostgreSQL 6
ORMs 6
VPCs 6
MyCLI 6
SELinux 6
AskTUG 5
CMSketch 5
KvDB 5
RaftDB 5
PRs 5
PDs 5
OpenJDK 5
MQTh 5
TiEM 5
JSONPath 5
PIDs 5
KEYs 4
CAs 4
NICs 4
PCIe 4
NoSQL 4
kvDB 4
PromQL 4
HAproxy 4
RockDB 2
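Mixed-case terms like those in this table need a different pattern than all-uppercase ones. A hypothetical `count_mixed_case` function, sketched below, keeps whole words that contain at least two capital letters and then drops the all-uppercase ones. This is an assumption about how such a list could be produced, not the script actually used in this thread.

```shell
#!/bin/sh
# Hypothetical sketch: count mixed-case tokens such as PoC, QoS, IdP, or TiKV
# in Markdown files under DIR and print the N most frequent ones.
count_mixed_case() {
  dir="$1"
  n="${2:-100}"
  # First grep: whole words that contain at least two capital letters.
  # Second grep: drop all-uppercase words (-x matches the whole line, -v inverts).
  grep -rhow --include='*.md' -E '[A-Za-z]*[A-Z][A-Za-z]*[A-Z][A-Za-z]*' "$dir" \
    | grep -vxE '[A-Z]+' \
    | sort | uniq -c | sort -rn | head -n "$n"
}
```

Like the table above, this pattern also matches plural forms such as KVs and CPUs, and product names such as MySQL and MariaDB.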
dveeden commented 3 weeks ago

@qiancai GTID was on your list. However, it is already defined in a glossary, just not the general one. Should we merge these? Maybe this is a leftover from when the DM docs were separate?

https://docs.pingcap.com/tidb/stable/dm-glossary#gtid

The same applies to SST, which is defined in yet another glossary.

https://docs.pingcap.com/tidb/stable/tidb-lightning-glossary#sst-file

$ find . -name "*glos*md"
./dm/dm-glossary.md
./ticdc/ticdc-glossary.md
./tidb-cloud/tidb-cloud-glossary.md
./tidb-lightning/tidb-lightning-glossary.md
./glossary.md
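To check whether a candidate abbreviation is already defined in one of these files, a small helper such as the hypothetical `find_term` below could be used. It assumes that each glossary entry is a Markdown heading that starts with the term (for example, `### GTID` or `### SST file`); the exact heading format of these files is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): list the glossary files that
# already define TERM. Assumption: each entry is a Markdown heading whose text
# starts with the term, such as "### GTID" or "### SST file".
# TERM is used as a regex fragment, so it should contain plain letters only.
find_term() {
  term="$1"; shift
  # Without file arguments grep would read stdin, so refuse instead.
  [ "$#" -gt 0 ] || return 2
  # -l prints matching file names only and -i ignores case.
  # The trailing group stops "SST" from also matching a heading like "### SSTable".
  grep -liE "^#{2,} +${term}([^A-Za-z]|\$)" "$@"
}
```

For example, `find_term GTID $(find . -name "*glos*md")` would list every glossary file that has a GTID heading.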
dveeden commented 3 weeks ago

Many of the mixed-case ones are <TERM>s forms (e.g. KVs), that is, plurals of terms we already define or are going to define.
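One way to act on this observation is a post-processing step that merges each `<TERM>s` count into `<TERM>` when the singular also appears in the counts. The `fold_plurals` function below is a hypothetical sketch, not something used in this thread. It reads lines in the `COUNT TOKEN` format that `uniq -c` prints and only folds a lowercase trailing `s`, so a term like macOS is left alone.

```shell
#!/bin/sh
# Hypothetical post-processing step (not from the thread): fold a plural token
# such as "KVs" into its singular "KV", but only when the singular was also seen.
# Reads "COUNT TOKEN" lines (the format printed by uniq -c) on stdin.
fold_plurals() {
  awk '
    { count[$2] += $1 }
    END {
      # First pass: record each token ending in lowercase s whose singular exists.
      for (t in count) {
        s = substr(t, 1, length(t) - 1)
        if (t ~ /s$/ && (s in count)) plural[t] = s
      }
      # Second pass: move each plural count onto the shortest singular form.
      for (t in plural) {
        u = plural[t]
        while (u in plural) u = plural[u]
        count[u] += count[t]
        delete count[t]
      }
      for (t in count) print count[t], t
    }' | sort -rn
}
```

With this step, pairs such as CPUs/CPU, PDs/PD, and URIs/URI in the tables above would each collapse into a single entry.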

I would suggest adding these:

Any others?

qiancai commented 2 weeks ago

Hi @dveeden

My apologies for the delay in responding. I think we can keep the DM glossary separate for now: most of its terms are specific to DM, and merging them directly into the TiDB glossary might confuse some users.

However, to improve the user experience, we could selectively incorporate a few general terms from the DM glossary into the TiDB glossary where they add value. What do you think?

Another option is to keep each glossary in its own tool documentation and group them under a single Glossary entry in TOC.md. This way, users can find all relevant terms in one place, and none of the existing glossary entries need to be removed:

    Glossary
    - General Glossary
    - TiDB Lightning Glossary
    - TiDB Data Migration Glossary
    - TiCDC Glossary
dveeden commented 2 weeks ago

@qiancai I think we can keep the current documents, but add a link to the general glossary in the first paragraph of each tool-specific glossary, and list the tool-specific glossaries in the first paragraph of the general one. Would that be OK?
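The linking convention proposed here could also be checked mechanically. The `check_glossary_links` function below is a hypothetical sketch: it assumes links use absolute doc paths such as `/glossary.md` and `/dm/dm-glossary.md`, and it only checks that a link exists somewhere in each file, not that it sits in the first paragraph.

```shell
#!/bin/sh
# Hypothetical check (not from the thread) for the cross-linking convention.
# Usage: check_glossary_links GENERAL TOOL_GLOSSARY...
# Assumption: links use absolute doc paths such as "/glossary.md" and
# "/dm/dm-glossary.md". Only the presence of a link is checked, not its position.
check_glossary_links() {
  general="$1"; shift
  missing=0
  for f in "$@"; do
    # Tool glossary -> general glossary. Requiring "/" right before "glossary.md"
    # keeps "/dm/dm-glossary.md" from counting as a link to "/glossary.md".
    if ! grep -q '/glossary\.md' "$f"; then
      echo "$f: no link to $general"
      missing=1
    fi
    # General glossary -> tool glossary, matched by "/<file name>".
    if ! grep -qF "/$(basename "$f")" "$general"; then
      echo "$general: no link to $f"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_glossary_links ./glossary.md ./dm/dm-glossary.md ./ticdc/ticdc-glossary.md ./tidb-cloud/tidb-cloud-glossary.md ./tidb-lightning/tidb-lightning-glossary.md` prints one line per missing link and returns a non-zero status if any link is missing.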

qiancai commented 2 weeks ago

@dveeden That sounds feasible too. It’s also the most lightweight option.

dveeden commented 2 weeks ago

@qiancai I've done this in 18dcc30b038dde6e36f024161470f5a8a934c5d5