Open XuHuaiyu opened 4 years ago
We should consider not using disk-base map because it has large cost during ramdom disk IO. Consider a method like grace hash join?
/assign
/assign
@XuHuaiyu Do you think LSM-Tree is a good idea for building disk-based map?
@XuHuaiyu Do you think LSM-Tree is a good idea for building disk-based map?
I doubt no. the performance of any disk-based map is horrible because of random IO. We'd better consider another method.
@XuHuaiyu Do you think LSM-Tree is a good idea for building disk-based map?
I doubt no. the performance of any disk-based map is horrible because of random IO. We'd better consider another method.
Level DB is a key-value database based on LSM-tree. LSM-tree can turn random I/O into sequential I/O.
@jianzhiyao @wshwsh12 has an idea, can you describe it here?
I have talked about this with @wshwsh12 ,but I think LSM-tree is a another good solution for this, because even rocksdb base on it.
@jianzhiyao If we use an LSM-tree to store the hash table, we'll have to take a random I/O operation each time when we want to fetch a key-value pair from the hashtable. I doubt that the performance will be bad.
@jianzhiyao If we use an LSM-tree to store the hash table, we'll have to take a random I/O operation each time when we want to fetch a key-value pair from the hashtable. I doubt that the performance will be bad.
first, lsm-tree is used widely in many product enviroment. and for that issue we can only store the key in lsm-tree to improve performance. After all, we can make a benchmark testing for random keys withlsm-tree.
using LSM tree to simulate temporary hash table is horrible for read, each read will come to random io or may scan the whole table.
using LSM tree to simulate temporary hash table is horrible for read, each read will come to random io or may scan the whole table.
Got it.
Feature Request
Is your feature request related to a problem? Please describe:
For SQL like "select count(distinct a) from t", TiDB uses an in-memory
IntSet
/DecimalSet
/StringSet
/ ... to check whether a value is distinct. The underlying implementation ofXSet
is a go map actually, we may need a disk-based map to avoid OOM. After supporting the disk-based map, theXSet
should be spilled over to disk when mem-quota-query in tidb.toml is exceeded.Describe the feature you'd like:
Describe alternatives you've considered:
Teachability, Documentation, Adoption, Migration Strategy: