threefoldtech / 0-db

Fast write ahead persistent redis protocol key-value store
Apache License 2.0
39 stars 10 forks source link

libzdb: rewrite index memory handling #127

Open maxux opened 2 years ago

maxux commented 2 years ago

Previous index memory management was basically one large array of 128 MB and each block of 8 bytes was a pointer to a linked-list of entries.

Each entry was the object entry with an extra link for the namespace because all namespace were sharing the same master-array.

To find the correct list index, the crc32 of the key were used and the first 24 bits was the array-id, this method was really fast but needed to maintains a unique full large shared buffer. When removing a namespace, deleting process could be very long to iterate over all keys (for all namespaces) even if namespace contains few keys.

Real memory usage was quite effcient actually, thanks Linux, only effective used pages was allocated, so the large 128 MB array was dynamically allocated by the kernel but on really large database, limit can quickly come and sharing the array was anyway not a good idea IMO.

Here is a new implementation, in test. This new implementation use a dedicated index map per namespace, the namespace pointer per entry is thus not needed anymore, it's more logic and less error prone, no sharing across namespace anymore.

When deleting a namespace, the complete isolated map can be free'd without interfering with other keys/namespace.

There is still a crc32 computing on the key and that value will point to an array entry aswell, but this array become multi-level and levels are allocated dynamically. Each level is composed of an array of 16 pointers. This version use 5 level (20 bits out of 32). This is configurable.

Test were made on 2 millions keys namespace, memory usage can be reduced up to 6 times than before and is more predictible (eg: 2 times the same namespace will use 2 times the memory). More test to come, performance doesn't seems to be affected a lot, but it's slower (lookup and allocation are slower, but difference is not really noticable).