Revise config defaults for Valkey 8

zuiderkwast commented 3 months ago

@madolson wrote in redis/redis#12191:

The immediate ones that come to mind:

replica-lazy-flush no -> yes

lazyfree-lazy-user-flush no -> yes

There are many problems with running flush synchronously. Even in single CPU environments, the thread managers should balance between the freeing and serving incoming requests. (handle in #913)

We should probably also revisit all of the datastructure limits (set-max-listpack-entries for example), which were probably tuned in a world with CPUs from a decade ago.

The set-max-listpack-entries was discussed here, where it was first added: https://github.com/redis/redis/pull/11290#discussion_r993643256

Ideally we should tune this in some smart way, but I assume nobody will do this, I suggest the following (pseudo-random :rofl:) new defaults:

hash-max-listpack-entries 512 (Keep this default? An entry has key and value so the listpack's size is 1024 elements)
list-max-listpack-size -2 (No suggestion. Any ideas?)
set-max-intset-entries 512 -> 1024
set-max-listpack-entries 128 -> 1024
zset-max-listpack-entries 128 -> 512 (Each entry is element and score, so listpack size is 1024)
hll-sparse-max-bytes 3000 (Keep this?)
stream-node-max-bytes 4096 (Keep this?)

@eduardobr wrote in redis/redis#12747:

repl-backlog-size 1mb is too small in most cases, but at the same time I think the real fix would be to make it dynamic (rough example: set min and max instead, and let redis choose a value in this range over time)

@xuliang0317 wrote in redis/redis#12747:

Now，repl-backlog-size is still 1mb,For most scenarios, this value is too small.Network transmission and bandwidth performance have improved rapidly in more than ten years，Memory is getting cheaper and cheaper. Although the impact is noted in the documentation, most Redis users will directly use the default configuration.In many scenarios, the traffic per second of a single node exceeds 1mb.Especially for businesses facing individual users. I recommend at least updating the configuration to 10mb to improve the rationality of the current repl-backlog-size.

So I suggest repl-backlog-size 1m -> 10m. Handle in #911

soloestoy commented 3 months ago

some suggestions:

maxmemory-clients 0 -> 100%, limit client memory usage to avoid data eviction caused by network read/write buffering, or even machine system OOM.
latency-monitor-threshold 0 -> 100, enable the latency monitoring by default (100ms), to avoid being unable to track after a problem occurs.

zuiderkwast commented 3 months ago

Interesting. @soloestoy can you explain what maxmemory-clients 100% means? Clients and data share the maxmemory limit and are evicted at the same time? If Valkey is used as an LRU cache, maybe we want to evict data first?

Latency monitor, if it has almost no CPU overhead, I'm OK with enabling it by default. Please explain to me that I don't have to worry about this comment in the config file:

# By default latency monitoring is disabled since it is mostly not needed
# if you don't have latency issues, and collecting data has a performance
# impact, that while very small, can be measured under big load.

soloestoy commented 3 months ago

maxmemory-clients 100% means when all clients' memory reach 100% maxmemory, valkey will disconnect some clients, this config doesn't influence data eviction directly.

I have always wished that Redis/Valkey could track different types of memory usage, such as the amount of memory used for storing user data versus the amount of memory utilized for system operation (like clients). This way, users could set independent limits for data and system operation, such as maxmemory-data and maxmemory-system. If the memory for data exceeds maxmemory-data, eviction would be triggered, if the memory for system operation goes beyond maxmemory-system, like too many connections or excessive buffers for example, disconnection would be initiated. This would allow for clearer planning and management, rather than having memory usage by clients and similar operations lead to the eviction of data, as is currently the case.

About latency monitor, enable it is not a big deal.

/* Add the sample only if the elapsed time is >= to the configured threshold. */
#define latencyAddSampleIfNeeded(event, var)                                                                           \
    if (server.latency_monitor_threshold && (var) >= server.latency_monitor_threshold) latencyAddSample((event), (var));

It has almost no impact, because it's difficult to actually reach this threshold, so it's merely a conditional statement. Even if the threshold is truly exceeded, the time it consumes to log this is only at the microsecond level, which is far less than 100ms.

ranshid commented 3 months ago

maxmemory-clients 100% means when all clients' memory reach 100% maxmemory, valkey will disconnect some clients, this config doesn't influence data eviction directly.

I have always wished that Redis/Valkey could track different types of memory usage, such as the amount of memory used for storing user data versus the amount of memory utilized for system operation (like clients). This way, users could set independent limits for data and system operation, such as maxmemory-data and maxmemory-system. If the memory for data exceeds maxmemory-data, eviction would be triggered, if the memory for system operation goes beyond maxmemory-system, like too many connections or excessive buffers for example, disconnection would be initiated. This would allow for clearer planning and management, rather than having memory usage by clients and similar operations lead to the eviction of data, as is currently the case.

About latency monitor, enable it is not a big deal.
/* Add the sample only if the elapsed time is >= to the configured threshold. */
#define latencyAddSampleIfNeeded(event, var)                                                                           \
    if (server.latency_monitor_threshold && (var) >= server.latency_monitor_threshold) latencyAddSample((event), (var));
It has almost no impact, because it's difficult to actually reach this threshold, so it's merely a conditional statement. Even if the threshold is truly exceeded, the time it consumes to log this is only at the microsecond level, which is far less than 100ms.

@soloestoy I also agree that we need to keep some tight memory accounting for the user data. I wonder though if evictions (both clients and data) should only be triggered by that thresholds alone? for example how do we account for memory waste like fragmentation, it is very hard to evaluate the fragmentation ratio of each of the system/data memory and take it into account.

zvi-code commented 3 months ago

maxmemory-clients 100% means when all clients' memory reach 100% maxmemory, valkey will disconnect some clients, this config doesn't influence data eviction directly. I have always wished that Redis/Valkey could track different types of memory usage, such as the amount of memory used for storing user data versus the amount of memory utilized for system operation (like clients). This way, users could set independent limits for data and system operation, such as maxmemory-data and maxmemory-system. If the memory for data exceeds maxmemory-data, eviction would be triggered, if the memory for system operation goes beyond maxmemory-system, like too many connections or excessive buffers for example, disconnection would be initiated. This would allow for clearer planning and management, rather than having memory usage by clients and similar operations lead to the eviction of data, as is currently the case. About latency monitor, enable it is not a big deal.
/* Add the sample only if the elapsed time is >= to the configured threshold. */
#define latencyAddSampleIfNeeded(event, var)                                                                           \
    if (server.latency_monitor_threshold && (var) >= server.latency_monitor_threshold) latencyAddSample((event), (var));
It has almost no impact, because it's difficult to actually reach this threshold, so it's merely a conditional statement. Even if the threshold is truly exceeded, the time it consumes to log this is only at the microsecond level, which is far less than 100ms.
@soloestoy I also agree that we need to keep some tight memory accounting for the user data. I wonder though if evictions (both clients and data) should only be triggered by that thresholds alone? for example how do we account for memory waste like fragmentation, it is very hard to evaluate the fragmentation ration of each of the system/data memory and take it into account?

I agree about memory accounting needs especially for clients, that can consume resource independently of each other and as such you may want to have independent control.

I think that another aspect of memory usage is the transient memory allocations, that are allocated and freed during command execution (processing memory := memory for processing needs), like lua allocation/modules allocation/temp objs and so on, this memory usage can cause memory pressure/swap without being apparent in any data point except the peak memory that only works for all-time memory maximum and not recent.

[On this note, I hoped to raise one day the idea of memory pool isolation. where we use different memory pool for data for clients for processing memory. I have poc'd this idea using jemalloc's private arena. @ranshid, I think this will solve the fragmentation cost association to usage type. The motivation for memory pool is of course of much wider scope as the memory life cycle is very different, so isolating the usages will improve the overall efficiency. Maybe I'll raise a separate issue on this]

ranshid commented 3 months ago

. @ranshid, I think this will solve the fragmentation cost association to usage type.

I agree, depending on the complexity of it :)

enjoy-binbin commented 3 months ago

lets do it in 8.0, i think we should configure the ideal configuration items for users. btw, i want to change latency-monitor-threshold to default 10ms, just like the slowlog's default value.

zuiderkwast commented 3 months ago

@enjoy-binbin what's the current default latency-monitor-threshold? Is there a max size for the latency history? I'm thinking about the users who are not using this feature, that it doesn't eat a lot of memory or CPUs.

enjoy-binbin commented 3 months ago

the current default latency-monitor-threshold is 0 (disable). I see you guys discussed this in here https://github.com/valkey-io/valkey/issues/653#issuecomment-2175590857

the memory one, yean, maybe, or a latency-history-max-len similar to slowlog-max-len? the latency i think is useful like slowlog, i think they can be put together as an analogy.

enjoy-binbin commented 3 months ago

ohh, we already have the history limit:

/* The latency time series for a given event. */
struct latencyTimeSeries {
    int idx; /* Index of the next sample to store. */
    uint32_t max; /* Max latency observed for this event. */
    struct latencySample samples[LATENCY_TS_LEN]; /* Latest history. */
};

#define LATENCY_TS_LEN 160 /* History length for every monitored event. */

zuiderkwast commented 2 months ago

repl-diskless-load disabled -> on-empty-db. Any drawbacks?

enjoy-binbin commented 2 months ago

repl-diskless-load disabled -> on-empty-db. Any drawbacks?

Seems to have no drawbacks.

Speaking of repl-diskless-load, do we consider adding an empty-db option? it perform a flushall at first so that we can take on-empty-db

madolson commented 1 month ago

Speaking of repl-diskless-load, do we consider adding an empty-db option? it perform a flushall at first so that we can take on-empty-db

It's a little dangerous because it increases the risk of data loss on the replica, we just need to document the risk to users.

madolson commented 1 month ago

We'll make issues for each of these changes individually.

enjoy-binbin commented 1 month ago

About latency monitor, enable it is not a big deal.

/* Add the sample only if the elapsed time is >= to the configured threshold. */
#define latencyAddSampleIfNeeded(event, var)                                                                           \
    if (server.latency_monitor_threshold && (var) >= server.latency_monitor_threshold) latencyAddSample((event), (var));

It has almost no impact, because it's difficult to actually reach this threshold, so it's merely a conditional statement. Even if the threshold is truly exceeded, the time it consumes to log this is only at the microsecond level, which is far less than 100ms.

I beside this, i guess the impact is the mstime call, so do we want to change it?

/* Start monitoring an event. We just set the current time. */
#define latencyStartMonitor(var)                                                                                       \
    if (server.latency_monitor_threshold) {                                                                            \
        var = mstime();                                                                                                \
    } else {                                                                                                           \
        var = 0;                                                                                                       \
    }

enjoy-binbin commented 1 month ago

repl-diskless-load disabled -> on-empty-db. Any drawbacks?

ok, i think the only drawback is that in the past, the replica will store the rdb file to disk, and it will rename it to rdb_filename before doing the parse, which means the tmp RDB file also used for persistence.

madolson commented 1 month ago

The next step for the listpack entry configs is to move it to a separate issue and come up with a clear set of instructions for evaluating the tuning. Then we can ask someone from the community to test it.

valkey-io / valkey

Revise config defaults for Valkey 8 #653