jankotek / mapdb

MapDB provides concurrent Maps, Sets and Queues backed by disk storage or off-heap-memory. It is a fast and easy to use embedded Java database engine.
https://mapdb.org
Apache License 2.0

Trying to understand HTreeMap #1030

Closed: siddhsql closed this issue 9 months ago

siddhsql commented 10 months ago

The documentation of HTreeMap says:

Unlike other HashMaps it does not use fixed size Hash Table, and does not rehash all data when Hash Table grows

Then, later on, the same page says:

HTreeMap uses Index Tree instead of growing Object[] for its Hash Table. Index Tree is sparse array like structure, which uses tree hierarchy of arrays. It is sparse, so unused entries do not occupy any space. It does not do rehashing (copy all entries to bigger array), but also it can not grow beyond its initial capacity (translation: size of HTreeMap is fixed).

Q.1: This is contradictory. Which is correct?
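
To make the "tree hierarchy of arrays" idea concrete, here is a toy sketch of a fixed-depth tree of lazily allocated arrays. It is purely illustrative (not MapDB's code; the class name and the 16-way node size are made up), but it shows how such a structure never rehashes existing entries while still having a hard ceiling of node size ^ level count slots:

// Toy sketch, not MapDB's implementation. A fixed-depth tree of small arrays:
// child arrays are allocated lazily, so unused slots cost nothing and existing
// entries are never copied (no rehashing), but the addressable range is fixed
// at NODE_SIZE ^ LEVELS (here 16^4 = 65536).
public class SparseIndexTree {
    private static final int DIR_SHIFT = 4;               // illustrative: 16-way nodes
    private static final int NODE_SIZE = 1 << DIR_SHIFT;  // 16
    private static final int LEVELS = 4;                  // fixed tree depth

    private final Object[] root = new Object[NODE_SIZE];

    public void put(int index, Object value) {
        Object[] node = root;
        for (int level = LEVELS - 1; level > 0; level--) {
            int slot = (index >>> (level * DIR_SHIFT)) & (NODE_SIZE - 1);
            if (node[slot] == null) {
                node[slot] = new Object[NODE_SIZE];        // lazy allocation, nothing is copied
            }
            node = (Object[]) node[slot];
        }
        node[index & (NODE_SIZE - 1)] = value;             // leaf slot
    }

    public Object get(int index) {
        Object[] node = root;
        for (int level = LEVELS - 1; level > 0; level--) {
            int slot = (index >>> (level * DIR_SHIFT)) & (NODE_SIZE - 1);
            if (node[slot] == null) {
                return null;                               // unused subtree was never allocated
            }
            node = (Object[]) node[slot];
        }
        return node[index & (NODE_SIZE - 1)];
    }
}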

The same page then says:

Maximal Hash Table Size is calculated as: segment count * node size ^ level count. The default maximal Hash Table Size is 8 * 16^4 = 0.5 million entries

But when I looked at the code, the node size seems to be 2^7 = 128, not 16:

int HTREEMAP_CONC_SHIFT = 3;  // 2^3 = 8 segments
int HTREEMAP_DIR_SHIFT = 7;   // 2^7 = 128 slots per directory node
int HTREEMAP_LEVELS = 4;      // tree depth
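
To make the numbers concrete, here is a quick stand-alone calculation from those constants (my own arithmetic, not taken from the docs). With 2^3 = 8 segments, 128-way directory nodes, and 4 levels, the ceiling would be 8 * 128^4 = 2^31 entries, rather than the documented 8 * 16^4 = 0.5 million:

// Back-of-the-envelope check of the constants quoted above (my reading of them,
// not an official figure from the MapDB documentation).
public class HTreeMapCapacity {
    public static void main(String[] args) {
        int concShift = 3;  // HTREEMAP_CONC_SHIFT -> 2^3 = 8 segments
        int dirShift  = 7;  // HTREEMAP_DIR_SHIFT  -> 2^7 = 128 slots per directory node
        int levels    = 4;  // HTREEMAP_LEVELS     -> tree depth

        long segments   = 1L << concShift;            // 8
        long perSegment = 1L << (dirShift * levels);  // 128^4 = 2^28 = 268,435,456
        long total      = segments * perSegment;      // 2^31 = 2,147,483,648

        System.out.println("segments    = " + segments);
        System.out.println("per segment = " + perSegment);
        System.out.println("total       = " + total);

        // The documented formula with a node size of 16 gives a much smaller number:
        long documented = 8L * (1L << (4 * 4));       // 8 * 16^4 = 524,288 (~0.5 million)
        System.out.println("doc figure  = " + documented);
    }
}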

Q.2: What gives? Is the doc wrong?

Q.3: Further, I can see that HTreeMap creates 8 indexTrees, but each indexTree is an instance of org.eclipse.collections.api.map.primitive.MutableLongLongMap; it starts out at zero size and is not constrained by any initial capacity. Is the documentation totally out of sync with reality?

Q.4: The doc says:

HTreeMap uses Index Tree instead of growing Object[] for its Hash Table. Index Tree is sparse array like structure, which uses tree hierarchy of arrays.

Reality (I don't see any tree hierarchy of arrays; I just see a MutableLongLongMap for each segment/partition):

val indexTrees: Array<MutableLongLongMap>,

The array is just used for segments: its size is 8, i.e., 8 index trees for 8 segments.
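
For illustration, here is my guess at how a 32-bit hash would be carved up under those constants: the top 3 bits select one of the 8 segments (and hence one of the 8 index trees), and 7 bits are consumed per tree level. This bit layout is an assumption for the sake of the example, not something I verified against the MapDB source:

// Assumed bit layout, for illustration only (not verified against MapDB's source):
// 3 segment bits + 4 levels * 7 bits = 31 bits, leaving one low-order bit unused.
public class HashSplit {
    static final int CONC_SHIFT = 3; // 2^3 = 8 segments
    static final int DIR_SHIFT  = 7; // 2^7 = 128 slots per directory node
    static final int LEVELS     = 4; // tree depth

    public static void main(String[] args) {
        int hash = "example-key".hashCode();

        int segment = hash >>> (32 - CONC_SHIFT); // top 3 bits pick the segment / index tree
        System.out.println("segment = " + segment);

        for (int level = 0; level < LEVELS; level++) {
            int shift = 32 - CONC_SHIFT - (level + 1) * DIR_SHIFT;
            int slot = (hash >>> shift) & ((1 << DIR_SHIFT) - 1); // next 7 bits per level
            System.out.println("level " + level + " slot = " + slot);
        }
    }
}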

jankotek commented 9 months ago

You are over-complicating it. Just use db.createHashMap... and do not set any params you do not understand. The default values are quite OK.
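
For reference, creating an HTreeMap with default settings looks roughly like this with the MapDB 3.x builder API (db.hashMap(...); older releases exposed db.createHashMap(...)). The map name and serializers below are just example choices:

import org.mapdb.DB;
import org.mapdb.DBMaker;
import org.mapdb.HTreeMap;
import org.mapdb.Serializer;

public class Defaults {
    public static void main(String[] args) {
        // In-memory store; DBMaker.fileDB(...) would give a disk-backed one instead.
        DB db = DBMaker.memoryDB().make();

        // HTreeMap with default sizing parameters; only the name and serializers are set.
        HTreeMap<String, Long> map = db
                .hashMap("counts", Serializer.STRING, Serializer.LONG)
                .createOrOpen();

        map.put("hello", 1L);
        System.out.println(map.get("hello"));

        db.close();
    }
}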