valkey-io / valkey

A flexible distributed key-value datastore that is optimized for caching and other realtime workloads.
https://valkey.io
Other
17.13k stars 641 forks source link

Memory protected (debug) mode #672

Open zvi-code opened 4 months ago

zvi-code commented 4 months ago

The problem/use-case that the feature addresses

As a memory db, we aim to reduce as much as possible the risk of memory corruptions/illegal-memory-access and have fail-fast approach to early detect potential issues, hopefully during testing. In some cases the code defect that causes the corruption exists long before there is an actual visible impact (crash\corrupted data), but because the actual impact requires usage of the corrupted memory before it is set again, we don't see the impact most of the time. When we have a crash, the backtrace will usually point to the code that did the valid access and not the offender, this makes corruptions harder to debug. Looking forward on valkey, the likeliness, of a timing/order dependent corruption, increases as we scale threads usage cross valkey. The proposal is to add ability for valkey to run in a mode that, at some configurable cost of memory&cpu, will help find these issues proactively[by crashing when corruption happens and not on the following valid access and regarding to if such access occur].

Description of the feature

The main suggestion is introduce a configurable running mode that will interleave mportected pages all over the memory allocations from the os (the mmaped extents in jemalloc).

There are several options I am thinking of[considering that mprotect can be applied on pages only):

  1. [interleave protected pages]A configuration would specify the protected memory precent X and than: a. Every additional 2PageSize/X (assuming we want prefix and suffix protection) zallocated memory on the next allocation (call to zmalloc, regardless of requested size) add 2*PageSize (+alignment overhead) to the allocation size and mprotect the first and last PageSize bytes [mprotect both reads and writes] b. Whenever memory is increasing allocate PageSize buffers and set mprotect
  2. [lock memory when application guarantees only read/non access]For memory allocations associated to a key in the main dict, like the key sds, that are not been accessed[prior to lookup call], if memory is larger than PageSize (after alignment), apply mprotect read for as long as it is not accessed for write and so on
  3. Extend read only memory mode to additional places. We can further identify large memory pieces that are modified only at small points in the code and not frequently to add protection, like bucket array of a dict or lp in streams and so on [we can unlock on demand in these cases, but this might be not feasible from perf impact perspective]

Alternatives(incremental) you've considered

Incremental to this are these techniques:

madolson commented 4 months ago

I haven't explored enough ASAN, especially with jemalloc, but I think that at least some of the options that involve application "knowledge" will be relevant even in the presence of asan

Yeah I guess this is my main question. It's hard for me to grasp what new protections your proposal provides on top of what the various sanitizers provide.

zvi-code commented 4 months ago

I haven't explored enough ASAN, especially with jemalloc, but I think that at least some of the options that involve application "knowledge" will be relevant even in the presence of asan

Yeah I guess this is my main question. It's hard for me to grasp what new protections your proposal provides on top of what the various sanitizers provide.

Yea, will take the time to check what is exactly available. I guess that if the only additional valprop is application level memory protection (app knows it is not supposed to write), the effort/complexity is not justified