wanghenshui opened 6 months ago
There are also two designs for memcomparable encoding.
[group1][marker1]…[groupN][markerN]
Each group is an 8-byte slice of the input, zero-padded when fewer than 8 bytes remain.
marker = 0xFF - number of padding zeros
Examples:
[] -> [0, 0, 0, 0, 0, 0, 0, 0, 247]
[1, 2, 3] -> [1, 2, 3, 0, 0, 0, 0, 0, 250]
[1, 2, 3, 0] -> [1, 2, 3, 0, 0, 0, 0, 0, 251]
[1, 2, 3, 4, 5, 6, 7, 8] -> [1, 2, 3, 4, 5, 6, 7, 8, 255, 0, 0, 0, 0, 0, 0, 0, 0, 247]
https://haxisnake.github.io/2020/11/06/TIDB%E6%BA%90%E7%A0%81%E5%AD%A6%E4%B9%A0%E7%AC%94%E8%AE%B0-%E5%9F%BA%E6%9C%AC%E7%B1%BB%E5%9E%8B%E7%BC%96%E8%A7%A3%E7%A0%81%E6%96%B9%E6%A1%88/
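A minimal C++ sketch of the group/marker scheme above, useful for checking the examples by hand. The constant and function names (`kGroupSize`, `EncodeBytes`, `DecodeBytes`) are hypothetical; this illustrates the scheme as described here and is not TiDB's actual Go code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Illustrative constants (hypothetical names; values follow the text above).
constexpr std::size_t kGroupSize = 8;   // payload bytes per group
constexpr std::uint8_t kMarker = 0xFF;  // marker of a group with no padding
constexpr char kPad = '\0';             // padding byte

// Encode `data` so that comparing encodings byte by byte (memcmp order)
// gives the same result as comparing the raw inputs.
std::string EncodeBytes(const std::string& data) {
  std::string out;
  // Loop with `<=` so an input whose length is a multiple of 8 still gets a
  // trailing all-padding group; the last group always has marker < 0xFF.
  for (std::size_t i = 0; i <= data.size(); i += kGroupSize) {
    const std::size_t remain = data.size() - i;
    const std::size_t take = remain < kGroupSize ? remain : kGroupSize;
    const std::size_t pad = kGroupSize - take;
    out.append(data, i, take);                        // payload bytes
    out.append(pad, kPad);                            // zero padding
    out.push_back(static_cast<char>(kMarker - pad));  // marker = 0xFF - pad
  }
  assert(out.size() % (kGroupSize + 1) == 0);  // whole groups only
  return out;
}

// Decode one value from the front of `buf`. Returns {value, bytes consumed},
// or std::nullopt if the input is truncated or malformed.
std::optional<std::pair<std::string, std::size_t>> DecodeBytes(
    const std::string& buf) {
  std::string out;
  std::size_t pos = 0;
  while (buf.size() - pos >= kGroupSize + 1) {
    const auto marker = static_cast<std::uint8_t>(buf[pos + kGroupSize]);
    const std::size_t pad = kMarker - marker;
    if (pad > kGroupSize) return std::nullopt;  // marker below 0xF7
    const std::size_t real = kGroupSize - pad;
    out.append(buf, pos, real);
    for (std::size_t j = real; j < kGroupSize; ++j) {
      if (buf[pos + j] != kPad) return std::nullopt;  // bad padding
    }
    pos += kGroupSize + 1;
    if (pad != 0) return std::make_pair(out, pos);  // final group reached
  }
  return std::nullopt;  // ran out of bytes before the final group
}
```

With this, `EncodeBytes(std::string("\x01\x02\x03", 3))` yields the 9 bytes `[1, 2, 3, 0, 0, 0, 0, 0, 250]` from the examples. The marker is what keeps the order correct: `[1, 2, 3]` and `[1, 2, 3, 0]` have identical padded groups, but the shorter value has more padding and therefore a smaller marker (250 vs 251), so it sorts first. Because the last group's marker is always below 0xFF, the encoding is self-delimiting and other key columns can follow it.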
2 myrocks
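The varchar type is much more complicated to handle, because its encoding is designed to save space. Take the comment in the source code as an example: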
const int VARCHAR_CMP_LESS_THAN_SPACES = 1;
const int VARCHAR_CMP_EQUAL_TO_SPACES = 2;
const int VARCHAR_CMP_GREATER_THAN_SPACES = 3;
Example: if fpi->m_segment_size=5, and the collation is latin1_bin:
'abcd\0' => [ 'abcd'
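The string is stored in segments of m_segment_size bytes. In each segment, the first m_segment_size-1 bytes hold content and the last byte records how the rest of the string compares with spaces; VARCHAR_CMP_EQUAL_TO_SPACES also marks the end of the string.
The example uses m_segment_size = 5; in the actual implementation the value is 9.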
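To make the segment layout concrete, here is a simplified C++ sketch for a binary, space-padded collation like the latin1_bin example. Assumptions: trailing spaces are stripped before encoding (they do not affect PAD SPACE comparisons), the LESS/GREATER marker is decided by the first non-space byte of the remaining string, and unpack_info and other collations are ignored. `EncodeVarcharSketch` is a hypothetical name; this is not the MyRocks implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Marker values as quoted from the MyRocks source comment above.
const int VARCHAR_CMP_LESS_THAN_SPACES = 1;
const int VARCHAR_CMP_EQUAL_TO_SPACES = 2;
const int VARCHAR_CMP_GREATER_THAN_SPACES = 3;

// Simplified sketch, NOT the MyRocks code: assumes a binary PAD SPACE
// collation (latin1_bin-like), ignores unpack_info, and cannot restore
// trailing spaces on decode. EncodeVarcharSketch is a hypothetical name.
std::string EncodeVarcharSketch(std::string s, std::size_t segment_size = 9) {
  assert(segment_size >= 2);
  const std::size_t chunk = segment_size - 1;  // content bytes per segment

  // Under PAD SPACE, trailing spaces do not affect comparisons: drop them.
  const std::size_t last = s.find_last_not_of(' ');
  s.erase(last == std::string::npos ? 0 : last + 1);

  std::string out;
  std::size_t pos = 0;
  while (true) {
    const std::size_t left = s.size() - pos;
    const std::size_t take = left < chunk ? left : chunk;
    out.append(s, pos, take);
    pos += take;
    if (pos == s.size()) {
      // Final segment: pad with spaces; EQUAL also marks end of string.
      out.append(chunk - take, ' ');
      out.push_back(static_cast<char>(VARCHAR_CMP_EQUAL_TO_SPACES));
      return out;
    }
    // More bytes follow. Because trailing spaces were stripped, the rest
    // contains a non-space byte; its value decides LESS vs GREATER.
    const auto first =
        static_cast<unsigned char>(s[s.find_first_not_of(' ', pos)]);
    out.push_back(static_cast<char>(first < ' '
                                        ? VARCHAR_CMP_LESS_THAN_SPACES
                                        : VARCHAR_CMP_GREATER_THAN_SPACES));
  }
}
```

With `segment_size = 5`, `'abcd'` and `'abcd   '` both encode to `[ 'abcd' 2 ]`, `'abcd\0'` becomes `[ 'abcd' 1 ][ '\0   ' 2 ]`, and `'abcdZZZZ'` becomes `[ 'abcd' 3 ][ 'ZZZZ' 2 ]`, so a plain memcmp puts `'abcd\0'` before `'abcd'` and `'abcd'` before `'abcdZZZZ'`. Since the sketch drops trailing spaces, it cannot tell `'abcd'` from `'abcd   '` on decode; the real encoding needs extra information (such as unpack_info) to restore them.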
unpack_info is more complicated here: it differs depending on the string's collation, and it has to store the conversion mapping between collations. For details, see the function rdb_init_collation_mapping.
https://developer.aliyun.com/article/62648
memcomparable vs. kvrocks encoding vs. pika encoding: need to build a demo.
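One possible starting point for such a demo, sketched in C++ under a loud assumption: the length-prefixed format below (4-byte big-endian length, then the key) is a generic stand-in, not the exact kvrocks or pika layout, which also adds namespaces, versions and sub-keys. It only compares how user keys are ordered under each encoding; `Memcomparable` reuses the 8-byte group scheme described earlier. All function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// ASSUMPTION: generic length-prefixed key (4-byte big-endian length + key),
// standing in for the length-prefix style; not the real kvrocks/pika format.
std::string LengthPrefixed(const std::string& key) {
  const auto n = static_cast<std::uint32_t>(key.size());
  std::string out;
  for (int shift = 24; shift >= 0; shift -= 8) {
    out.push_back(static_cast<char>((n >> shift) & 0xFF));
  }
  return out + key;
}

// Memcomparable group encoding: 8-byte groups, marker = 0xFF - padding count.
std::string Memcomparable(const std::string& data) {
  std::string out;
  for (std::size_t i = 0; i <= data.size(); i += 8) {
    const std::size_t take = std::min<std::size_t>(8, data.size() - i);
    out.append(data, i, take);
    out.append(8 - take, '\0');
    out.push_back(static_cast<char>(0xFF - (8 - take)));
  }
  return out;
}

// Order user keys the way a sorted KV store would: by the encoded bytes.
std::vector<std::string> SortByEncoding(
    std::vector<std::string> keys, std::string (*encode)(const std::string&)) {
  assert(encode != nullptr);
  std::sort(keys.begin(), keys.end(),
            [encode](const std::string& a, const std::string& b) {
              return encode(a) < encode(b);
            });
  return keys;
}
```

Sorting `{"b", "aa", "abc"}` by `LengthPrefixed` yields `b, aa, abc` (shorter keys first), while `Memcomparable` yields `aa, abc, b`. So under this assumed length-prefixed layout, all entries sharing one user key stay contiguous for per-key scans, but a range scan across user keys does not come back in lexicographic order.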
Metadata and the actual data are stored apart from each other, so data locality may be poor.
BlobDB makes this locality problem even worse.
Testing perspective