speedb-io / speedb

A RocksDB compliant high performance scalable embedded key-value store
https://www.speedb.io/
Apache License 2.0
921 stars 72 forks source link

Add "getSmallest()" method (and maybe "isEmpty") #709

Open mjsax opened 1 year ago

mjsax commented 1 year ago

In Kafka Streams, we have a few cases for which we use RocksDB as a "priority queue" -- we layout the keys in a specific way that gives us the order we want. To "peek" into the head of the "queue", we need to create an iterator, what is very expensive.

Because iterators are very expensive, we actually do this "peek" less frequently as we would like to, what leads to a few complications and degraded user experience.

If we could do a point-lookup like getSmallest and RocksDB could give us a quick answer (maybe need to return null if the "queue" is empty), it might help us to cleanup our code (and get rid if internal workaround config to "throttle" how often we create an iterator right now), and could help to improve user experience.

Without knowing the code base, it seems not to be impossible to implement a efficient LSM search with no input key, that looks for the smallest key in the LSM, and returns the corresponding key and value (because we don't pass in the key, we would need the key to be returned as well -- it might also be ok if we only get the key, as we could do a consecutive get(key) to retrieve the corresponding value).

To avoid the getSmallest call, it would also be good to have an efficient isEmpty method (of course, if isEmpty would be as expensive as getSmallest it might not help to add isEmpty -- not sure if isEmpty could be implemented more efficiently, ie, without any LSM search)

Of course, if there is an overall better way to use Speedb as a priority queue, happy to follow recommendations.

hilikspdb commented 1 year ago

We can improve GetSmallest. the iterator creation is expensive and there is no need for a heap. If needed we can parallelised it (same as MultiGet) so that disk requests will be async. Please note that in this case I do recommend using a separate CF for each queue (assuming you have limited number of queues) .

Guyme commented 1 year ago

@erez-speedb @ayulas - Please discuss performance testing scenarios

udi-speedb commented 1 year ago

@erez-speedb, @ayulas - As agreed, you should update the issue with performance testing scenarios and associated information.

ayulas commented 1 year ago

@udi-speedb The comment above your comment by @Guyme already point that

udi-speedb commented 1 year ago

@udi-speedb The comment above your comment by @Guyme already point that

Almost, but not quite. Anyway, please update the issue as agreed. Thanks

hilikspdb commented 1 year ago

Here is the performance tests we need to compare: seek with writes with next=0 compared to "get smallest" . Ayelet please add GetSmallest (with writes) tests to db_bench.

udi-speedb commented 1 year ago

@hilikspdb, @ayulas - Are you implementing both getSmallest() and IsEmpty()?

hilikspdb commented 1 year ago

is-empty is mapped into getsmallest which return an element out of range (or return empty data)

hilikspdb commented 1 year ago

We can also add the options "prefix_same_as_starts" which will return "not-found" if there is no object that matches the prefix (@ayulas)

Guyme commented 1 year ago

@ayulas - make sure you add this capability into db_bench as a part of this ticket so @erez-speedb can test it.

ayulas commented 1 year ago

i divided this to 2 phases. phase 1 will be a simple code which will reduce the heap needed and steps while creating iterators to get smallest key+value phase 2 will improve that by not needed the full iterator creation (in phase 1 it will be created internally - the user will not be exposed to it) as similar to the hmap implementation we will use the range of each level as written in md . this is another project but will be very useful

udi-speedb commented 1 year ago

@mjsax Hi, a few preliminary questions regarding your exact requirements (we may provide an API that supports additional capabilities, but would like to verify yours):

hilikspdb commented 1 year ago

here is the case as described . the user would like to use priority queue, meaning get smallest , process, delete. The queue may be identified by a prefix. The current way (new iterator , seek, delete iterator )may take a long time . iterator create and the problem with delete. we would like to see if we can optimise get smallest to do a skip to the next key with data more efficient. this is related to the iterator after massive delete that Or is working on and may also be a base to improvement there (if the result will be good enough)

udi-speedb commented 1 year ago

here is the case as described . the user would like to use priority queue, meaning get smallest , process, delete. The queue may be identified by a prefix. The current way (new iterator , seek, delete iterator )may take a long time . iterator create and the problem with delete. we would like to see if we can optimise get smallest to do a skip to the next key with data more efficient. this is related to the iterator after massive delete that Or is working on and may also be a base to improvement there (if the result will be good enough)

@hilikspdb - Thanks for the clarification. I would still love to get the exact requirements from @mjsax