Closed qiancai closed 1 week ago
I’ve created a script to list the 100 most frequently used abbreviations in the following table. I suggest we start by selecting some key abbreviations from this table.
Note:
- Only a few of the abbreviations in the table have already been covered in the current Glossary doc: TSO, MPP, VPC, MVCC.
- Certain abbreviations, like SQL, CPU, and ID, are widely understood and don’t require explanations.
Acronym | Number of occurrences |
---|---|
SQL | 3730 |
PD | 2506 |
DDL | 1974 |
DM | 1908 |
CPU | 925 |
ID | 880 |
API | 756 |
BR | 724 |
AWS | 655 |
GC | 626 |
JSON | 532 |
DML | 519 |
TLS | 464 |
GLOBAL | 462 |
IP | 449 |
TSO | 431 |
CSV | 353 |
KV | 352 |
MPP | 351 |
SESSION | 349 |
QPS | 340 |
VPC | 301 |
OOM | 277 |
URI | 276 |
TPC | 263 |
TTL | 241 |
PITR | 229 |
GA | 222 |
SHOW | 221 |
CLI | 220 |
CA | 216 |
SST | 209 |
HTTP | 204 |
GCS | 187 |
HTAP | 170 |
CREATE | 163 |
JDBC | 156 |
URL | 150 |
GB | 147 |
NOT | 135 |
OLTP | 130 |
RPC | 128 |
TPS | 127 |
RU | 122 |
TABLE | 120 |
DR | 120 |
MB | 116 |
NULL | 111 |
SSO | 108 |
IAM | 106 |
AI | 105 |
DXF | 105 |
DMS | 104 |
DMR | 100 |
CTE | 98 |
KMS | 98 |
ARN | 98 |
AZ | 98 |
DROP | 96 |
ORM | 95 |
MVCC | 90 |
TS | 87 |
UI | 87 |
SSL | 82 |
IO | 80 |
LDAP | 79 |
GMHDBJD | 79 |
OLAP | 76 |
SSH | 75 |
OPS | 75 |
GTID | 74 |
INDEX | 74 |
FAQ | 69 |
ALTER | 69 |
CF | 69 |
CIDR | 65 |
RDS | 65 |
CDC | 62 |
SET | 59 |
SSD | 57 |
COLUMNS | 56 |
GROUP | 55 |
DB | 54 |
WAL | 53 |
DNS | 50 |
MSP | 50 |
UPDATE | 49 |
MQ | 49 |
TCP | 48 |
NUMA | 47 |
LTS | 46 |
UTF | 43 |
UUID | 41 |
PROXY | 41 |
DECIMAL | 41 |
RESOURCE | 40 |
OIDC | 40 |
PLACEMENT | 40 |
ANALYZE | 39 |
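The script itself is not included in this thread. As a rough sketch of how such a count could be produced, the following assumes that an "abbreviation" is any run of two or more uppercase ASCII letters and that the docs are Markdown files under one directory. The names `UPPER_TOKEN`, `count_uppercase_tokens`, and `count_in_tree` are hypothetical, not taken from the original script.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: an abbreviation is two or more uppercase ASCII letters
# standing alone as a word (so "KV" inside "TiKV" is not counted).
UPPER_TOKEN = re.compile(r"\b[A-Z]{2,}\b")


def count_uppercase_tokens(text: str) -> Counter:
    """Count all-uppercase tokens in a single string."""
    return Counter(UPPER_TOKEN.findall(text))


def count_in_tree(root: str) -> Counter:
    """Sum the counts over every .md file below the given directory."""
    totals = Counter()
    for path in Path(root).rglob("*.md"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        totals.update(count_uppercase_tokens(text))
    return totals
```

Calling `count_in_tree(".").most_common(100)` from the repository root would give a top-100 list in the same shape as the table. A pattern like this would also count SQL keywords such as `GLOBAL` and `SHOW`, which may be why they appear in the table.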
Here, I’ve selected 20 common abbreviations from the previous table as candidates for the Glossary document. @lilin90 and @dveeden PTAL, thanks.
I assume the list is based on things that are all uppercase? Maybe we should add things like PoC, QoS and IdP as well?
@dveeden Good catch. Thank you! Here they are. Let's pick some candidates from the following table too.
Acronym | Number of occurrences |
---|---|
TiDB | 20520 |
TiKV | 4565 |
MySQL | 2573 |
TiCDC | 2168 |
TiUP | 1197 |
RocksDB | 420 |
PingCAP | 228 |
gRPC | 184 |
IDs | 180 |
vCPU | 169 |
MariaDB | 140 |
FAQs | 140 |
HAProxy | 139 |
ProxySQL | 103 |
MDSvgIcon | 103 |
macOS | 92 |
CentOS | 89 |
OpenAPI | 87 |
RawKV | 55 |
SQLAlchemy | 40 |
DTFile | 39 |
PyMySQL | 38 |
VCore | 38 |
RCUs | 33 |
DBeaver | 32 |
HunDunDM | 31 |
OAuth | 29 |
URLs | 28 |
SQLTools | 25 |
WebUI | 24 |
TypeORM | 24 |
DMLs | 23 |
InnoDB | 21 |
SSDs | 19 |
KVs | 18 |
NVMe | 18 |
RowID | 17 |
OpenAI | 17 |
DBAs | 17 |
CPUs | 16 |
BenchmarkSQL | 16 |
ResolvedTS | 16 |
benCHmark | 14 |
NewSQL | 14 |
ksqlDB | 14 |
HikariCP | 14 |
CheckpointTS | 14 |
mTLS | 12 |
URIs | 12 |
OpenSSL | 11 |
CFs | 11 |
MBps | 11 |
MinIO | 11 |
AmazonRDS | 10 |
DBaaS | 10 |
BackupTS | 10 |
AutoID | 10 |
TiKVs | 9 |
gPRC | 9 |
vCPUs | 9 |
GPTs | 9 |
StartTLS | 8 |
HundunDM | 8 |
DTFiles | 7 |
DTTool | 7 |
GTIDs | 7 |
SSTs | 7 |
RESTful | 7 |
DMFile | 6 |
PostgreSQL | 6 |
ORMs | 6 |
VPCs | 6 |
MyCLI | 6 |
SELinux | 6 |
AskTUG | 5 |
CMSketch | 5 |
KvDB | 5 |
RaftDB | 5 |
PRs | 5 |
PDs | 5 |
OpenJDK | 5 |
MQTh | 5 |
TiEM | 5 |
JSONPath | 5 |
PIDs | 5 |
KEYs | 4 |
CAs | 4 |
NICs | 4 |
PCIe | 4 |
NoSQL | 4 |
kvDB | 4 |
PromQL | 4 |
HAproxy | 4 |
RockDB | 2 |
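How the mixed-case list was generated is not shown here either. One possible filter, sketched under the assumption that a mixed-case term is a letters-only word with at least one lowercase letter and at least one uppercase letter after its first character, is below. `WORD`, `is_mixed_case`, and `count_mixed_case_tokens` are hypothetical names.

```python
import re
from collections import Counter

# Letters-only words; tokens containing digits or underscores are skipped.
WORD = re.compile(r"\b[A-Za-z]+\b")


def is_mixed_case(token: str) -> bool:
    """True for tokens like TiDB, gRPC, PoC, or IDs; False for SQL or Hello."""
    has_lower = any(c.islower() for c in token)
    # An uppercase letter after the first position rules out ordinary
    # capitalized words such as "Hello".
    has_inner_upper = any(c.isupper() for c in token[1:])
    return has_lower and has_inner_upper


def count_mixed_case_tokens(text: str) -> Counter:
    """Count mixed-case tokens in a single string."""
    return Counter(t for t in WORD.findall(text) if is_mixed_case(t))
```

A filter of this kind also surfaces spelling variants, which may explain entries such as `gPRC`, `RockDB`, and `HAproxy` in the table above.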
@qiancai GTID was on your list. However, it is already defined in a glossary, just not the general one. Should we merge these? Maybe this is a leftover from when the DM docs were separate?
https://docs.pingcap.com/tidb/stable/dm-glossary#gtid
And same for SST, which is in yet another glossary.
https://docs.pingcap.com/tidb/stable/tidb-lightning-glossary#sst-file
$ find . -name "*glos*md"
./dm/dm-glossary.md
./ticdc/ticdc-glossary.md
./tidb-cloud/tidb-cloud-glossary.md
./tidb-lightning/tidb-lightning-glossary.md
./glossary.md
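To check which of these glossary files already define a candidate term, something like the following could be used. It is only a sketch: it assumes each glossary entry is a Markdown heading (which the `#gtid` and `#sst-file` anchors suggest, but which is not confirmed here), and the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: glossary entries are ATX headings such as "### GTID".
HEADING = re.compile(r"^#{1,6}[ \t]+(.*?)[ \t]*$", re.MULTILINE)


def headings(markdown: str) -> list:
    """Return the text of every heading in a Markdown string."""
    return [m.group(1) for m in HEADING.finditer(markdown)]


def defines(markdown: str, term: str) -> bool:
    """True if a heading equals the term or starts with it (e.g. "SST file")."""
    term = term.lower()
    return any(
        h.lower() == term or h.lower().startswith(term + " ")
        for h in headings(markdown)
    )


def glossaries_defining(term: str, root: str = ".") -> list:
    """List glossary files (same name pattern as the find command) defining term."""
    return sorted(
        str(p)
        for p in Path(root).rglob("*glos*md")
        if defines(p.read_text(encoding="utf-8", errors="ignore"), term)
    )
```

Running `glossaries_defining("GTID")` from the repository root would then list every glossary file with a matching heading.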
Many of the mixed-case ones are `<TERM>s` (e.g. `KVs`), which are a plural form of something we already define or are going to define.
I would suggest adding these:
Any others?
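The point about plural forms can be handled mechanically before picking candidates. The sketch below folds a plural into its base term, assuming a plural is marked by a trailing lowercase "s" directly after an uppercase letter (so `KVs` becomes `KV`, while `MBps` and `SQLTools` are left alone). `singular` and `fold_plurals` are hypothetical helper names.

```python
from collections import Counter


def singular(token: str) -> str:
    """Strip a trailing 's' only when it follows an uppercase letter."""
    if len(token) > 2 and token.endswith("s") and token[-2].isupper():
        return token[:-1]
    return token


def fold_plurals(counts: Counter) -> Counter:
    """Merge the count of each plural form into its base term."""
    folded = Counter()
    for token, n in counts.items():
        folded[singular(token)] += n
    return folded
```

With the figures from the two tables, `fold_plurals(Counter({"KV": 352, "KVs": 18}))` reports a single `KV` entry with 370 occurrences.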
Hi @dveeden
My apologies for the delay in responding. It seems we can maintain the DM glossary separately for now, as most terms there pertain specifically to the DM context and merging them directly with the TiDB glossary could potentially cause confusion for some users.
However, to enhance user experience, we could selectively incorporate some general terms from the DM glossary into the TiDB glossary where they add value. What do you think about it?
Another option is to organize these glossary documents together in a way that maintains their entries in their respective tool documentation while adding a few entries to the TOC.md. This would allow users to access all relevant terms without removing the original glossary entries.
Glossary
- General Glossary
- TiDB Lightning Glossary
- TiDB Data Migration Glossary
- TiCDC Glossary
@qiancai I think we can keep the current documents, but have a link to the general one in the first paragraph of each of the other docs and then list the other ones in the first paragraph of the general one. Would that be ok?
@dveeden That sounds feasible too. It’s also the most lightweight option.
@qiancai I've done this in 18dcc30b038dde6e36f024161470f5a8a934c5d5
Currently, the TiDB Glossary doc lacks explanations for some frequently used abbreviations such as PD, DM, and BR.
To enhance clarity and accessibility for users, we propose expanding the TiDB glossary to include commonly used abbreviations alongside existing product-specific terms. This update will help users better understand technical documents and reduce potential confusion, especially for new users.
Plan: