valkey-io / valkey

A flexible distributed key-value datastore that is optimized for caching and other realtime workloads.
https://valkey.io

Valkey on Flash plans? (leverage nvme disk storage) #553

Closed gnat closed 6 months ago

gnat commented 6 months ago

Would be amazing to get a Valkey on Flash going now that Redis enterprise is no longer a factor.

SSDB exists solely to add this feature to a Redis-like alternative.

It's a major reason KeyDB exists too (and to leverage more cores).

Would be a natural progression for Valkey.

Any plans in the works? Thanks.

zuiderkwast commented 6 months ago

It's not trivial to implement as it affects the core of how data is stored. Now, the keys are basically in one huge hash table.

I'd like to see a high-level description of what such an implementation would look like before I decide whether it can be accepted or not.

There was a feature for this in the very early days (Redis 2 or so) which was called Virtual Memory. It was removed.

madolson commented 6 months ago

> There was a feature for this in the very early days (Redis 2 or so) which was called Virtual Memory. It was removed.

Yeah, naively mapping disk to memory doesn't work very well, since you see huge latency spikes when you have a virtual memory miss and need to fetch the memory page from disk. You could theoretically hide that if we were multi-threaded, since other threads would continue to get scheduled, but our single-threaded architecture gets hurt too much by it.

A virtual-memory-like approach could work, though, if we built it ourselves in userland. We could make a pretty minor change to the main hash table to indicate that a key is "in-memory" vs "on-disk". Before executing a command, we can check whether the key is in-memory, and if it is, we execute the command normally. If it's on-disk, we can do one of two things:

  1. Implement logic to go execute the command for an on-disk operation. I think this is similar to what Zhao mentioned with rocksdb in other threads.
  2. We fetch the data into memory, and once it's there we execute the command as normal.

We would need a way to spill items to disk as well.
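
To make the second option concrete, here is a rough Python sketch of a per-key "in-memory" vs "on-disk" flag with fetch-before-execute and a spill path. This is only an illustration, not Valkey code: `Entry`, `TieredKeyspace`, `spill`, and the dict standing in for disk are all hypothetical names, and a real implementation would have to fetch asynchronously so the main thread isn't blocked.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    on_disk: bool                   # the per-key flag discussed above
    value: Optional[bytes] = None   # None while the value lives on disk


class TieredKeyspace:
    """Toy keyspace; a dict stands in for the disk store (assumption)."""

    def __init__(self) -> None:
        self.table: Dict[str, Entry] = {}   # main hash table, key names stay in memory
        self.disk: Dict[str, bytes] = {}    # hypothetical stand-in for NVMe storage
        self.disk_reads = 0

    def set(self, key: str, value: bytes) -> None:
        self.disk.pop(key, None)            # drop any stale on-disk copy
        self.table[key] = Entry(on_disk=False, value=value)

    def spill(self, key: str) -> None:
        """Move a value to disk, keeping only the key and flag in memory."""
        entry = self.table[key]
        if not entry.on_disk:
            self.disk[key] = entry.value
            entry.value = None
            entry.on_disk = True

    def _ensure_in_memory(self, key: str) -> Optional[Entry]:
        entry = self.table.get(key)
        if entry is not None and entry.on_disk:
            # Option 2: fetch the data into memory first.
            entry.value = self.disk.pop(key)
            entry.on_disk = False
            self.disk_reads += 1
        return entry

    def get(self, key: str) -> Optional[bytes]:
        # Check the flag before executing, then run the command as normal.
        entry = self._ensure_in_memory(key)
        return None if entry is None else entry.value
```

Usage would look something like `ks.set("a", b"1"); ks.spill("a"); ks.get("a")`, where the last call pays one disk read and leaves the key back in memory.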

zuiderkwast commented 6 months ago

Right, I forgot we already discussed this in Valkey, here: https://github.com/valkey-io/valkey/issues/83#issuecomment-2029099358.

With the "on-disk" flag per key, the key's name still consumes memory. I have another idea: We use a probabilistic filter for on-disk keys. If the key is not found in memory (main hash table) and the feature is enabled, then we check the probabilistic filter. If we have a match, we go and fetch the key from disk. This can allow a larger number of small keys on disk than we would even want to store metadata for in memory.
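
A minimal sketch of that lookup path, assuming a plain Bloom filter as the probabilistic filter (the thread doesn't say which kind). `BloomFilter`, `lookup`, and the dicts standing in for memory and disk are hypothetical names for illustration, not anything in Valkey.

```python
import hashlib
from typing import Dict, Iterator, Optional


class BloomFilter:
    """Plain Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # Derive num_hashes bit positions by prefixing the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def lookup(key: bytes,
           memory: Dict[bytes, bytes],
           on_disk_filter: BloomFilter,
           disk: Dict[bytes, bytes]) -> Optional[bytes]:
    """Check memory first, then the filter, and only then touch disk."""
    if key in memory:
        return memory[key]
    if on_disk_filter.might_contain(key):
        # May still be a miss: a false positive costs one wasted disk read.
        return disk.get(key)
    return None  # definitely not on disk, no I/O needed
```

One thing the sketch glosses over: a plain Bloom filter can't remove entries, so deleting on-disk keys would probably need a counting or cuckoo-style filter, or periodic rebuilds.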

We can use new maxmemory policies for this. Instead of evicting, we move a key to disk.

If we implement some module API for these actions (evict hook, load missing key hook), then the glue to rocksdb or another storage backend can be made pluggable.
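
To illustrate how the two hooks could fit together, here is a toy Python model of an evict hook plus a load-missing-key hook behind a pluggable backend. None of this is an existing Valkey module API: `StorageBackend`, `on_evict`, `on_missing`, `DictBackend`, and `Cache` are made-up names, LRU by key count stands in for a real maxmemory policy, and a dict stands in for rocksdb.

```python
from collections import OrderedDict
from typing import Dict, Optional, Protocol


class StorageBackend(Protocol):
    """Hypothetical hook interface a storage module would implement."""

    def on_evict(self, key: str, value: bytes) -> None: ...
    def on_missing(self, key: str) -> Optional[bytes]: ...


class DictBackend:
    """Stand-in backend; a real one might wrap rocksdb or similar."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}

    def on_evict(self, key: str, value: bytes) -> None:
        self.data[key] = value              # move to disk instead of dropping

    def on_missing(self, key: str) -> Optional[bytes]:
        return self.data.pop(key, None)     # hand the value back to memory


class Cache:
    def __init__(self, max_keys: int, backend: StorageBackend) -> None:
        self.max_keys = max_keys
        self.backend = backend
        self.mem: "OrderedDict[str, bytes]" = OrderedDict()  # oldest first

    def set(self, key: str, value: bytes) -> None:
        self.mem[key] = value
        self.mem.move_to_end(key)
        self._enforce_limit()

    def get(self, key: str) -> Optional[bytes]:
        if key in self.mem:
            self.mem.move_to_end(key)
            return self.mem[key]
        value = self.backend.on_missing(key)   # load-missing-key hook
        if value is not None:
            self.set(key, value)               # may push another key to disk
        return value

    def _enforce_limit(self) -> None:
        while len(self.mem) > self.max_keys:
            old_key, old_value = self.mem.popitem(last=False)
            self.backend.on_evict(old_key, old_value)  # evict hook
```

With `Cache(2, DictBackend())`, setting three keys moves the least recently used one to the backend, and a later `get` on it pulls it back in through `on_missing`.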

lukepalmer commented 6 months ago

I'm interested in being involved, and actually previously sketched out an approach with an on-disk flag per key.

madolson commented 6 months ago

I'm going to close this and move my comment over to #83.

Feel free to post a draft design there.

zuiderkwast commented 6 months ago

Then I'll move my comment too. :)