memcached / memcached

memcached development tree
https://memcached.org

Path to improving extstore's defaults #541

Open dormando opened 5 years ago

dormando commented 5 years ago

Extstore shouldn't be compile-gated anymore. This would simplify the code by cutting out ifdefs. There are a few things holding me back...

To the second point: TL;DR: small normal items and small item HDRs take up space in the same slab classes/LRUs. Small items can't/won't be flushed to disk. This has some side effects:

These are non-obvious to a user and may be surprising. Hopefully this issue can be a discussion, but I won't be surprised if it's just me talking to myself for a month again :)
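To make the HDR side of this concrete, here's a simplified sketch of the idea (field names are approximate, not the exact structs in the tree): when a large item is flushed to disk, only a tiny locator stays in RAM, and that locator is stored as a small item, so it competes for the same small slab classes/LRUs as genuinely small values.

```c
/* Simplified sketch of why extstore HDR items compete with small values.
 * Field names are approximate; see the real item/item_hdr definitions in
 * the tree for the actual layout. */
#include <stdint.h>

/* When a large value is flushed to disk, its in-memory item keeps the key
 * but the value bytes are replaced with a tiny locator like this: */
typedef struct {
    uint32_t page_id;      /* which extstore page holds the data */
    uint32_t page_version; /* detects page reuse/recycling */
    uint32_t offset;       /* byte offset within that page */
} sk_item_hdr;

/* A locator this small rounds up into one of the small slab classes, so
 * "real" small items and HDR items for large (flushed) items end up
 * fighting over exactly the same memory and LRU space. */
```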

Ideas:

Most of these require making a decision on the importance of a small item vs an HDR item, which is impossible to do generically. For extstore's case the "most generic" we can go is ensuring some space is reserved for large objects so the system doesn't thrash; with perhaps an option allowing for "prioritize using memory to fill extstore" vs "treat all objects equally".
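Purely hypothetical illustration of how that choice could surface as a single mode knob (not an existing option):

```c
/* Hypothetical policy knob for the trade-off above; not a real
 * memcached option, just illustrating the two endpoints. */
enum sk_extstore_mem_policy {
    /* Reserve just enough memory for large-object write buffers so the
     * system doesn't thrash; otherwise treat all items equally. */
    SK_EXT_MEM_BALANCED = 0,
    /* Aggressively evict/flush large items so RAM is mostly left for
     * small items and HDR headers, maximizing what extstore can hold. */
    SK_EXT_MEM_FILL_EXTSTORE,
};
```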

The fewer the options/twiddles and the better the defaults, the... better.

dormando commented 4 years ago

Honestly, after the performance fixes I'd rather just build it in by default but leave it marked experimental. Then we can incrementally improve the memory situation.

dormando commented 4 years ago

note to self: check if it would be crazy for the extstore code to issue flushes inline with the eviction code, or wake and temporarily block on the bg thread where possible.

The code is mainly tuned to "never degrade set performance", but a flush is generally just a write to RAM anyway, so it might make a better default to at least try, so long as the performance drop isn't huge.

On a quick look it might not be too bad...

So at the cost of slightly increasing inline set latency, bottoming out could just force-flush and directly reclaim memory. The BG thread could still run to try to keep ahead of things, or get kicked into gear when something does bottom out.

This would alleviate a lot of the holes in the flushing algorithm that currently require careful tuning.
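Rough sketch of what flush-inline-with-eviction could look like in the allocation path; every sk_* function here is a placeholder for illustration, not the real slab/extstore API:

```c
/* Sketch only: allocation path that force-flushes on bottoming out.
 * All sk_* functions are stand-ins, not the actual memcached internals. */
#include <stddef.h>

void *sk_slabs_alloc(int clsid, size_t len);      /* slab allocator */
void *sk_lru_tail(int clsid);                     /* coldest item in class */
int   sk_flush_to_write_buffer(void *it);         /* memcpy into extstore wbuf */
void  sk_reclaim_item(void *it);                  /* swap value for HDR, free chunks */

void *sk_alloc_or_flush(int clsid, size_t total_len) {
    void *chunk = sk_slabs_alloc(clsid, total_len);
    if (chunk != NULL)
        return chunk;

    /* Bottomed out: rather than returning nomem or waiting on the
     * background flusher, push this class's LRU tail into the current
     * extstore write buffer inline. The write buffer is RAM, so the
     * extra set latency is mostly a memcpy. */
    void *victim = sk_lru_tail(clsid);
    if (victim != NULL && sk_flush_to_write_buffer(victim) == 0) {
        sk_reclaim_item(victim);            /* value memory is now free */
        chunk = sk_slabs_alloc(clsid, total_len);
    }
    /* The BG thread still runs to stay ahead; this path only covers
     * bursts that outrun it. May still return NULL if the flush failed. */
    return chunk;
}
```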

dormando commented 4 years ago

Let's put this together for a potential general simplification:

next:

then for the automove algorithm:

Pros:

Cons:

The set latency could be amortized slightly by having the lru_maintainer_thread() attempt to keep a couple of chunks free per high slab class, rather than still having an entire thread dedicated to extstore flushing.
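Something like this, loosely (sk_* names are placeholders, not the real internals):

```c
/* Sketch of amortizing the inline-flush cost from the maintainer thread:
 * keep a small cushion of free chunks per "high" (large-item) slab class
 * so most sets never hit the inline evict->flush path. */
#include <stdbool.h>

#define SK_FREE_TARGET 2            /* chunks to keep free per high class */

bool sk_class_is_high(int clsid);            /* large enough to flush? */
int  sk_free_chunks(int clsid);              /* current free chunk count */
int  sk_flush_tail_to_extstore(int clsid);   /* evict + flush one item */

void sk_lru_maintainer_pass(int max_clsid) {
    for (int clsid = 1; clsid <= max_clsid; clsid++) {
        if (!sk_class_is_high(clsid))
            continue;
        /* Top the class back up to the target so the next few sets can
         * allocate without doing any flushing inline. */
        while (sk_free_chunks(clsid) < SK_FREE_TARGET) {
            if (sk_flush_tail_to_extstore(clsid) != 0)
                break;              /* nothing flushable right now */
        }
    }
}
```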

Notes:

dormando commented 3 years ago

Wonder if this last idea can be simplified or split up a little more:

... then this gets released... or tested. Kills the nomem case and should generally improve things.

then a second change with the:

Then on a burst of writes:

dormando commented 3 years ago

Feel like there's still a super corner case: all memory is full, a burst of writes bottoms out the spare pool buffer, and then extstore needs some memory to allocate headers from.

The solution is more complicated though; maybe two spare page pools. Or, past the minimum spare pool, only small/HDR objects can pull memory from the global pool. That should actually work. Otherwise there's no point to flush-on-evict without some kind of spare memory for HDR objects.

edit: blah... if a high class doesn't have enough memory for all parallel uploads it'll start OOM'ing. So either A) it can dip into the global pool if we'd OOM because the tail is locked, or B) we need to solve the tail-lock OOMs first.
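Loose sketch of that rule, including option A) from the edit; sk_* names are placeholders:

```c
/* Sketch of "only small/HDR allocations may dip into the global page
 * pool past the reserved minimum", plus the tail-locked escape hatch. */
#include <stdbool.h>

#define SK_MIN_SPARE_PAGES 2        /* always keep this many pages spare */

int  sk_global_pool_pages(void);    /* pages left in the global pool */
bool sk_class_is_high(int clsid);   /* large/flushable slab class? */
bool sk_tail_is_locked(int clsid);  /* tail pinned by parallel uploads? */

bool sk_may_pull_global_page(int clsid) {
    int spare = sk_global_pool_pages();
    if (spare <= 0)
        return false;
    /* Small items and extstore HDR headers can always use spare pages;
     * otherwise flush-on-evict has nothing to allocate headers from. */
    if (!sk_class_is_high(clsid))
        return true;
    /* High classes only get pages above the reserved minimum, or when
     * the class would OOM purely because its tail is locked. */
    return spare > SK_MIN_SPARE_PAGES || sk_tail_is_locked(clsid);
}
```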

dormando commented 2 years ago

Thought/refinement... if there's a clean way to decide during a SET on a high slab class that memory is low, it could evict->flush instead of pulling from the slabber. That avoids pushing on the global page pool, and combined with the recent slab mover changes it could work pretty well.

It might even be possible to do a decent pass at this with a single extstore write buffer: if the buffer is flushing to disk, fall back to the slab allocator temporarily. Multiple write buffers would then fix that, etc.

Still a series of not super quick changes. I'd first go for fixing the nomem case by having the mover require a free chunk passed into it. That's one of the last really dumb corner cases I've left in the system.
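Very rough sketch of the evict->flush-on-SET path with the single-write-buffer fallback described above; sk_* names are placeholders, not the real API:

```c
/* Sketch only: on a SET into a high class with low memory, try
 * evict->flush first; if the lone extstore write buffer is busy
 * draining to disk, temporarily fall back to the slab allocator /
 * global pool as before. */
#include <stddef.h>
#include <stdbool.h>

bool  sk_class_memory_low(int clsid);
bool  sk_write_buffer_busy(void);          /* wbuf currently flushing to disk? */
void *sk_evict_flush_and_alloc(int clsid, size_t len);
void *sk_slabs_alloc_global(int clsid, size_t len);

void *sk_alloc_for_set(int clsid, size_t len) {
    if (sk_class_memory_low(clsid) && !sk_write_buffer_busy()) {
        /* Reclaim memory from this class directly instead of pushing on
         * the global page pool. */
        return sk_evict_flush_and_alloc(clsid, len);
    }
    /* Buffer busy (or memory fine): behave like today. With multiple
     * write buffers this fallback becomes rarer. */
    return sk_slabs_alloc_global(clsid, len);
}
```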

dormando commented 7 months ago

More thoughts:

To explain: if a user tries to throw the same 1MB item into cache from 1,000 clients at the same time, we need a gigabyte of memory for all of the data uploads, and then they just get rejected one by one after the first one succeeds.

Thus, if we bottom out on memory or otherwise cap "in-flight memory buffers", the sets can be thrown into a side queue (using the same queue system we use for extstore/proxy). As memory frees up we pull from that set queue and read data from the sockets/etc.

With this we can reliably accept sets by blocking them until resources become available, without hanging the worker threads, so they're free to handle reads or process/queue other writes.
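Roughly, the flow would look like this (sk_* names are placeholders, not the real connection/queue API):

```c
/* Sketch of parking sets in a side queue when upload memory bottoms out,
 * using the same "park the connection, resume later" shape as the
 * extstore/proxy IO queues. */
#include <stddef.h>
#include <stdbool.h>

typedef struct sk_conn sk_conn;              /* a client connection mid-SET */

bool sk_upload_memory_available(size_t len); /* under the in-flight cap? */
void sk_park_set(sk_conn *c);                /* queue conn, stop reading it */
sk_conn *sk_pop_parked_set(void);            /* oldest parked SET, or NULL */
void sk_resume_set(sk_conn *c);              /* continue reading value bytes */

/* Called when a worker parses a SET header and needs value memory. */
bool sk_try_start_set(sk_conn *c, size_t value_len) {
    if (sk_upload_memory_available(value_len))
        return true;                         /* proceed as normal */
    /* No buffer space: park this SET so the worker thread can go back
     * to servicing reads and other requests. */
    sk_park_set(c);
    return false;
}

/* Called whenever upload memory is freed (item stored or rejected). */
void sk_on_upload_memory_freed(void) {
    sk_conn *c = sk_pop_parked_set();
    if (c != NULL)
        sk_resume_set(c);                    /* now read its data from the socket */
}
```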