unbit / uwsgi

uWSGI application server container
http://projects.unbit.it/uwsgi

Temporal Cache - CPU Peaks #742

Open dav009 opened 9 years ago

dav009 commented 9 years ago

We have a temporal cache with the following settings:

items = 7000
block_size = 21846
expires = 900
bitmap=1
ignore_full=1

Once we introduced this cache to our system, CPU usage eventually (and seemingly at random) goes to 100% and never comes back to usual levels unless the uwsgi process is restarted. Removing the cache gets rid of this weird CPU behaviour, so we are pretty sure the cache is the component responsible.

unbit commented 9 years ago

Which uWSGI version? Have you tried stracing the process with 100% CPU? Is it the master going to 100% or a specific worker?

dav009 commented 9 years ago

@unbit the CPU usage is evenly distributed across workers. When stracing, it seems they are just waiting inside a mutex. Is there any tool for debugging that provides better output on what could be happening inside the caches?

unbit commented 9 years ago

How many requests per second is this instance managing? Which OS? Can you give some detail on how you are using the cache? While it spikes at 100%, is the system still fully working? How is the load average? Have you tried turning off auto-expiration? What is the cache sweeper frequency? (it is a heavy task)

dav009 commented 9 years ago

unbit commented 9 years ago

The sweeper is like a "stop the world" GC: it works by locking each cache item and checking its expiration. If you have lots of items and lots of requests, it can end in pretty heavy lock contention. Judging from your strace, this is very probably your case. Is LRU a viable solution? If not, I think the only viable solution for your case would be to have the cache_get function check the expiration field and remove the item if it has expired.
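To make the idea of expiring on read concrete, here is a minimal Python sketch. It is a toy, single-process model and not uWSGI's C implementation: the class name `LazyExpiringCache`, its methods and the injectable `clock` are all made up for illustration, and no locking is modelled.

```python
import time


class LazyExpiringCache:
    """Toy in-process cache that expires items on read instead of sweeping."""

    def __init__(self, clock=time.monotonic):
        self._items = {}     # key -> (value, expires_at)
        self._clock = clock  # injectable so the behaviour can be tested

    def set(self, key, value, ttl):
        """Store value under key for ttl seconds."""
        self._items[key] = (value, self._clock() + ttl)

    def get(self, key):
        """Return the value, or None if the key is missing or has expired."""
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Check the expiration on read and free the slot right away.
            del self._items[key]
            return None
        return value

    def __len__(self):
        return len(self._items)
```

The point of the design is that no background pass ever walks the whole cache: the cost of expiration is paid one key at a time, by the reader that touches that key. The trade-off is that items nobody reads again are never removed.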

dav009 commented 9 years ago

We definitely have lots of requests. I will give the proposed approach a try.

Thanks for the quick support,

unbit commented 9 years ago

If you are referring to get->check->remove, it is something that still has to be implemented. Do you know how to do it?

dav009 commented 9 years ago

@unbit no, I have not taken a look at the uwsgi code yet. I was thinking of storing the date as part of the object that I'm caching, and checking whether it is still valid when I try to use it.
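A rough sketch of that application-level approach, assuming the cached objects are JSON-serializable. The helper names `wrap` and `unwrap` and the envelope layout are hypothetical, and a plain dict stands in for the real cache in the usage note below.

```python
import json
import time


def wrap(value, ttl, now=None):
    """Serialize value together with its absolute expiry time."""
    now = time.time() if now is None else now
    envelope = {"expires_at": now + ttl, "value": value}
    return json.dumps(envelope).encode("utf-8")


def unwrap(blob, now=None):
    """Return the stored value, or None if it is missing or past its expiry."""
    if blob is None:
        return None
    now = time.time() if now is None else now
    record = json.loads(blob.decode("utf-8"))
    if now >= record["expires_at"]:
        return None
    return record["value"]
```

Usage would look like `store[key] = wrap(obj, ttl=900)` on write and `unwrap(store.get(key))` on read. In a Python app running under uWSGI, the bytes would presumably go through `uwsgi.cache_update` and `uwsgi.cache_get` instead of a dict. Note that a stale entry still occupies its slot until it is overwritten or deleted, so this alone does not free space in a full cache.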

unbit commented 9 years ago

If you have time to test it, I would like to make a quick branch exposing this feature (I think it will require no more than a dozen lines).

dav009 commented 9 years ago

awesome! Go ahead please, thanks!

unbit commented 9 years ago

https://github.com/unbit/uwsgi/commits/uwsgi-2.0-cache-lazy-expire

just add lazy_expire=1 (or lazy=1 as a shortcut) to cache2 options, ensure the cache sweeper is not running, and let me know :)

unbit commented 9 years ago

Small code update: even if the cache sweeper is running, it now ignores "lazy" caches.

unbit commented 9 years ago

Another update: https://github.com/unbit/uwsgi/commit/f778e30d48ce6bfef7f5434c2e30b617189bbdf0

the sweeper now runs only if required

dav009 commented 9 years ago

Does this mean that once the cache is full, the only way to free slots is to get an item that is in the cache and has already expired?

unbit commented 9 years ago

Yes

unbit commented 9 years ago

If needed, we could run a "stop the world" sweep when the cache is full.

unbit commented 9 years ago

Added 3 cache2 options (still in the cache-lazy-expire branch):

no_expire=1: forces the sweeper thread to ignore this cache

sweep_on_full=N: when the cache is full, run a sweep (no more than 1 sweep every N seconds)

clear_on_full=1: completely clear the cache when it is full
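As an illustration of how the full-cache options interact with writes, here is a toy Python model. It is only a sketch under assumptions: the class `BoundedCache` is hypothetical, time is passed in explicitly, and giving clear_on_full priority over sweep_on_full when both are set is a guess, not something stated in this thread.

```python
class BoundedCache:
    """Toy model of a fixed-size cache that reclaims space only when full."""

    def __init__(self, max_items, sweep_on_full, clear_on_full=False):
        self.max_items = max_items
        self.sweep_on_full = sweep_on_full  # minimum seconds between sweeps
        self.clear_on_full = clear_on_full
        self.items = {}                     # key -> (value, expires_at)
        self.last_sweep = None

    def _sweep(self, now):
        """Drop every expired item and remember when the sweep ran."""
        expired = [k for k, (_, exp) in self.items.items() if now >= exp]
        for k in expired:
            del self.items[k]
        self.last_sweep = now
        return len(expired)

    def set(self, key, value, ttl, now):
        """Store key; return False if the cache is full and nothing was freed."""
        if key not in self.items and len(self.items) >= self.max_items:
            if self.clear_on_full:
                self.items.clear()
            elif self.last_sweep is None or now - self.last_sweep >= self.sweep_on_full:
                self._sweep(now)
            if len(self.items) >= self.max_items:
                return False
        self.items[key] = (value, now + ttl)
        return True
```

With a small N, a cache that stays full of unexpired items triggers a full scan almost every N seconds while freeing nothing, which is the cost the N parameter is meant to bound.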

unbit commented 9 years ago

Sorry to bother you, did you have time to test the new features? I am about to release 2.0.8 and I would like to include them.

dav009 commented 9 years ago

Sorry, only now have I had time to come back to this. I will try running with the following params: lazy_expire=1, sweep_on_full=1, so this would trigger the following behaviour:

thanks! and sorry for coming back so late

unbit commented 9 years ago

2.0.8 has the patch, so let me know how it works for you

Thanks again

dav009 commented 9 years ago

After two days I got the usual cpu peak :(

I used the following config:

cache2=name=bad_resolve,items=750,blocksize=21846,bitmap=1,ignore_full=1,lazy_expire=1,sweep_on_full=1

unbit commented 9 years ago

I fear clear_on_full would be too aggressive, or not? (If the cache is full, it will be completely cleared.) By the way, I would raise the value of sweep_on_full: if you have a high number of non-expiring items, calling the sweep every second could be pretty heavy.

dav009 commented 9 years ago

Yeah, my bad, I understood sweep_on_full differently. I will raise its value.

dav009 commented 9 years ago

Tried with sweep_on_full=10: it took longer to hit the CPU issue, but it eventually happened.

unbit commented 9 years ago

So you reach a point where you cannot clear objects anymore for a pretty long time. Honestly, I do not see any approach other than fully clearing the cache. Maybe we can add another flag that automatically clears the cache when a sweep round is not able to remove at least 1 item? (a kind of "emergency fallback")

dav009 commented 9 years ago

let's go for it. Thanks for the quick reply

unbit commented 9 years ago

This patch: https://github.com/unbit/uwsgi/commit/75b4a3cb2823be860df0441c59495eeadcc01616 forces a clear if no item has been removed during the sweep. Let me know how it works. In the final release I will probably enable it only via a flag.
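The fallback logic, as described here, can be sketched in Python as follows. This is not the code from the commit: the function name `sweep_or_clear`, the dict layout and the returned strings are invented for illustration.

```python
def sweep_or_clear(items, now):
    """Remove expired entries; if none were removed, clear everything.

    items maps key -> (value, expires_at). Returns "swept" or "cleared".
    """
    expired = [key for key, (_, expires_at) in items.items() if now >= expires_at]
    for key in expired:
        del items[key]
    if not expired:
        # Emergency fallback: the sweep freed nothing, so drop every item.
        items.clear()
        return "cleared"
    return "swept"
```

For example, with `items = {"a": (1, 5), "b": (2, 100)}`, a call at `now=10` removes only "a", while a second call at `now=20` finds nothing expired and empties the dict.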

dav009 commented 9 years ago

:+1: Deployed the patch, I will leave it for a few days and see how it goes, thanks.

dav009 commented 9 years ago

The patched branch generates peaks as well.

cache2=name=bad_resolve,items=750,blocksize=21846,bitmap=1,ignore_full=1,lazy_expire=1,sweep_on_full=10

unbit commented 9 years ago

Can you strace it to check if the load is generated by locking/unlocking ?

dav009 commented 9 years ago

sure, is there any debug mode ? or should I just strace the process?

unbit commented 9 years ago

just strace the process consuming the cpu