Open dormando opened 5 years ago
Honestly after the performance fixes I'd rather just build it in by default by leave it marked experimental. Then we can incrementally improve the memory situation.
note to self: check if it would be crazy for the extstore code to issue flushes inline with the eviction code, or wake and temporarily block on the bg thread where possible.
the code is mainly tuned to "never degrade set performance" but the flush performance is generally a write to RAM anyway, so it might make a better default to at least try so long as the performance drop isn't huge.
On a quick look it might not be too bad...
storage_write()
storage_write()
pulls the tail item, checks if it's valid to throw out (not a HDR item, age is above item_age, etc), then alloc's a HDR item and chucks it.storage_write()
fails while the write buffer is full and flushing to disk. For a while I had multiple write buffers but the code was awful and buggy. Might need to either revisit this or figure out how to at least have two write buffers so they can be swapped. A special return code from exstore_write_request()
might help here.So on the cost of slightly increasing the inline set latency we could have bottoming out just force-flushes and directly reclaims memory. The BG thread could still run to try to keep ahead of things or get kicked into gear when something does bottom out.
This would alleviate a lot of the holes in the flushing algorithm that currently require careful tuning.
Lets put this together for a potential general simplification:
item_allocish()
to get the free memory. Manages if necessary: retries if chunk was within page to move, retries or sleeps if tail locked, etc._nomem
case entirely.next:
item_allocish()
logic can flush to extstore directly if necessary.allocish()
flushing directly covers sets as well.then for the automove algorithm:
-I
size.Pros:
Cons:
The set latency could be amortized slightly by having the lru_maintainer_thread() attempt to keep a couple chunks free per high slab class. Rather than have an entire thread for the extstore flushing still.
Notes:
Wonder if this last idea can be simplified or split up a little more:
... then this gets released... or tested. kills the nomem case and should generally improve things.
then a second change with the:
Then on a burst of writes:
Feel like there's still a super corner case: All memory full, burst of writes bottoms out the spare pool buffer, then extstore needs some memory to allocate headers from.
Solution is more complicated though; maybe two spare page pools. Or, past the minimum spare pool, only small/HDR objects can cause memory to pull from the global pool. That should work actually. Else there's no point to flush-on-evict without some kind of spare memory for HDR objects.
edit: blah.. if a high class doesn't have enough memory for all parallel uploads it'll start OOM'ing. So either A) It can dip into global if we'd OOM because tail locked or B) need to solve the tail lock OOM's first.
Thought/refinement... if there's a clean way to decide during a SET on a high slab class that memory is low, it could evict->flush instead of pull from slabber. That avoids pushing on the global page pool and combined with the recent slab mover changes could work pretty well.
it might even be possible to do a decent pass at this with a single extstore write buffer; if the buffer is flushing to disk fall back to slab allocator temporarily. Then multiple write buffers fixes that/etc.
Still a series of not super quick changes. I'd first go for fixing the nomem case by having the mover require a free chunk passed into it. That's one of the last really dumb corner cases I've left in the system.
More thoughts:
ITEM_HDR
bit and only allocated in specific places, I could use a different or novel memory allocator for them.To explain: If a user tries to throw the same 1mb item into cache from 1,000 clients at the same time, we need 1 gigabyte of memory for all of the data uploads, then they just get rejected one by one after the first one succeeds.
Thus, if we bottom out of memory or otherwise cap "in-flight memory buffers", the sets can be thrown into a side queue (using the same queue system we use for extstore/proxy). As memory frees we pull from that set queue and read data from sockets/etc.
With this we can reliably accept sets by blocking them until resources become available, but not hang the worker threads so they're free to handles reads or process/queue other writes.
Extstore shouldn't be compile gated anymore. This would simplify the code by cutting out ifdef's. There're a few things holding me back..
add_iov()
is used) Note: changes staged in next fix this! lots of branches removed from the hot paths.To the second point: TL;DR: small normal items and small item HDR's take up space in the same slab classes/LRU's. Small items can't/won't be flushed to disk. This has some side effects:
These are non-obvious to a user and may be surprising. Hopefully this issue can be a discussion but I won't be surprised if it's just me talking to myself for a month again :)
Ideas:
Most of these requires making a decision on the importance of a small item vs a HDR item, which is impossible to do generically. For extstore's case the "most generic" we can go is ensuring some space is reserved for large objects so the system doesn't thrash; with perhaps an option allowing for "prioritize using memory to fill extstore" vs "treat all objects equally".
The fewer options/twiddles and better defaults the.. better.