wanghenshui / wanghenshui.github.io

my blog, please do not fork
https://wanghenshui.github.io
Other
4 stars 1 forks source link

TSIDs strike the perfect balance between integers and UUIDs for most databases #108

Closed wanghenshui closed 7 months ago

wanghenshui commented 9 months ago

https://www.foxhound.systems/blog/time-sorted-unique-identifiers/

自增bigint 优点 有序,可预测(CPU prefetch),数量大,表征时间,类似create_at,可读性好,方便看/定位 缺点,坦克问题:被摸到数据信息(简单hash混淆一下),多机时序问题(id不同)

UUID UUIDv4 8-4-4-4-12 优点 随机 自带混淆,跨系统 缺点 随机,index无法预测从而变成memory bound,体积大,浪费, 可读性差(简单hash简化一下?)

好消息,UUIDv7 https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-04.html#v7 UUID version 7 features a time-ordered value field derived from the widely implemented and well known Unix Epoch timestamp source, the number of milliseconds seconds since midnight 1 Jan 1970 UTC, leap seconds excluded. As well as improved entropy characteristics over versions 1 or 6. Implementations SHOULD utilize UUID version 7 over UUID version 1 and 6 if possible.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           unix_ts_ms                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          unix_ts_ms           |  ver  |       rand_a          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|                        rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

TSID 结合snowflake ID和ULID

ULID能排序,但不递增

Snowflake递增,但不严格排序

有着snaoflakeid一样的问题,70年

一个实现,https://github.sheincorp.cn/f4b6a3/tsid-creator

支持内置base32编码支持 https://www.crockford.com/base32.html 可读性好

横向对比

Feature Auto-incr. Integers UUIDs TSIDs
Key Type Variable size integer 128-bit integer 64-bit integer
Uniqueness Unique within a database Universally unique Unique across nodes
Predictability Predictable sequence Unpredictable Unpredictable
Space Efficiency High (small size) Low(large size) Moderate(larger than integers but smaller than UUIDs)
Data locality High(sequential increment) Low(random order) High(time-sorted with random component)
Performance High(efficient indexing, inserts, reads) Poor(inefficient inserts, scattered indexes, read penalty) High(similar to integers)
Readability High(simple numbers) Low(32 character strings) Moderate(13 character strings)
Chronological Sorting Yes, implicit(based on sequence) No inherent order Yes, time-sorted(based on time component)
Multi-node Generation Not feasible Easily feasible Feasible with node IDs
Security (Inference Risk) High(German Tank Problem) Low(no inference) Low(no inference)
Ease of Implementation High(natively supported) Moderate(varies by database) Low(least support, requires function implementation, managing node IDs)
Scalability Varies(limited by integer type) High(no practical limit) High(at least ~70 years, limited by timestamp size)
Migration Flexibility Moderate(can change to larger integer type) Low(hard to change key type) High(drop-in compatible with integers)
wanghenshui commented 9 months ago

https://www.infoq.cn/article/wechat-serial-number-generator-architecture

https://tech.meituan.com/2017/04/21/mt-leaf.html