Open dav009 opened 9 years ago
Which uWSGI version? Have you tried stracing the process with 100% cpu? Is it the master going to 100% or a specific worker?
@unbit the cpu usage is evenly distributed across workers. When stracing, it seems they are just waiting inside a mutex. Is there any tool for debugging, or anything providing better output on what could be happening inside the caches?
How many requests per second is this instance managing? Which OS? Can you give some detail on how you are using the cache? While it spikes at 100%, is the system still fully working? How is the load average? Have you tried turning off auto-expiration? What is the cache sweeper frequency? (it is a heavy task)
The sweeper is like a "stop the world" GC: it works by locking each cache item and checking its expiration. If you have a lot of items and a lot of requests, it could end in pretty heavy lock contention. And very probably (from your strace) it looks like this is your case. Is LRU a viable solution? If not, I think the only viable solution for your case would be having the cache_get function check the expiration field and remove the item if it has expired.
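The get->check->remove idea can be modeled in a few lines of plain Python (a hypothetical sketch, not uWSGI's actual C implementation): instead of a background sweeper locking every slot, the expiration check happens only when an item is read.

```python
import time

class LazyExpireCache:
    """Toy model of lazy expiration: no sweeper thread; an item's
    expiry is checked only when it is read, and stale items are
    removed at that moment (get -> check -> remove)."""

    def __init__(self):
        self._items = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = time.monotonic() + ttl if ttl else None
        self._items[key] = (value, expires_at)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._items[key]  # lazily free the slot on access
            return None
        return value

cache = LazyExpireCache()
cache.set("k", "v", ttl=0.05)
assert cache.get("k") == "v"
time.sleep(0.1)
assert cache.get("k") is None  # expired and removed on read
```

The trade-off is the one discussed below: slots are only freed when an expired key happens to be read again.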
We definitely have a lot of requests. I will give the proposed approach a try.
Thanks for the quick support,
If you are referring to get->check->remove, it is something that still has to be implemented. Do you know how to do it?
@unbit no, I have not taken a look at the uwsgi code yet. I was thinking of storing the date as part of the object that I'm caching, and checking whether it is still valid when I try to use it.
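That application-level workaround could look roughly like this (a hedged sketch; a plain dict stands in for the uWSGI cache, which in a real app would be accessed via uwsgi.cache_set/cache_get): the expiry timestamp is serialized into the cached payload and validated on every read.

```python
import json
import time

store = {}  # stand-in for the uWSGI cache backend

def cache_set(key, value, ttl):
    # Store the expiry timestamp alongside the value itself.
    payload = json.dumps({"expires_at": time.time() + ttl, "value": value})
    store[key] = payload.encode()

def cache_get(key):
    raw = store.get(key)
    if raw is None:
        return None
    payload = json.loads(raw)
    if time.time() >= payload["expires_at"]:
        store.pop(key, None)  # treat as a miss and free the slot
        return None
    return payload["value"]

cache_set("answer", 42, ttl=60)
assert cache_get("answer") == 42
cache_set("stale", "x", ttl=-1)   # already expired
assert cache_get("stale") is None
```

This keeps uWSGI's own expiration (and thus the sweeper) out of the picture, at the cost of a little serialization overhead per access.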
If you have the time to test it, I would like to make a quick branch exposing this feature (I think it will require no more than a dozen lines).
awesome! Go ahead please, thanks!
https://github.com/unbit/uwsgi/commits/uwsgi-2.0-cache-lazy-expire
just add lazy_expire=1 (or lazy=1 as a shortcut) to cache2 options, ensure the cache sweeper is not running, and let me know :)
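For reference, a cache2 line using the new option might look like this (cache name and sizes are placeholder values, not taken from the thread):

```ini
[uwsgi]
; lazy expiration: items are checked and removed on access,
; so the background sweeper is not needed for this cache
cache2 = name=mycache,items=1000,blocksize=4096,lazy_expire=1
```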
Little code update: even if the cache sweeper is running, it now ignores "lazy" caches.
Another update: https://github.com/unbit/uwsgi/commit/f778e30d48ce6bfef7f5434c2e30b617189bbdf0
the sweeper now runs only if required
This means that once the cache is full, the only way to get free slots would be to fetch an item that is in the cache and whose expiration time has already passed?
Yes
If needed, we could run a "stop the world" sweep when the cache is full.
Added 3 cache2 options (always in cache-lazy-expire branch):
no_expire=1 forces the sweeper thread not to take this cache into account
sweep_on_full=N runs a sweep when the cache is full (no more than one sweep every N seconds)
clear_on_full=1 completely clears the cache when it is full
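A hedged example combining the lazy mode with the new rate-limited sweep (values are illustrative only):

```ini
[uwsgi]
; lazy expiration plus a safety valve: when the cache fills up,
; run at most one sweep every 30 seconds to reclaim expired slots
cache2 = name=mycache,items=1000,blocksize=4096,lazy_expire=1,sweep_on_full=30
```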
Sorry to bother you, did you have time to test the new features? I am about to release 2.0.8 and I would like to add them too.
Sorry, I haven't had time to come back to this until now.
So I will try running the following params: lazy_expire=1, sweep_on_full=1, and see what behaviour this triggers.
thanks! and sorry for coming back so late
2.0.8 has the patch, so let me know how it works for you
Thanks again
After two days I got the usual cpu peak :(
I used the following config:
cache2=name=bad_resolve,items=750,blocksize=21846,bitmap=1,ignore_full=1,lazy_expire=1,sweep_on_full=1
I fear clear_on_full would be too aggressive, or maybe not (if the cache is full it will be completely cleared)? Btw, I would raise the value of sweep_on_full: if you have a high number of non-expiring items, calling the sweep every second could be pretty heavy.
Yeah, my bad, I understood sweep_on_full differently. I will raise its value.
tried with sweep_on_full=10
it took longer to run into the cpu issue, but it eventually happened..
So you reach a point where you cannot clear objects anymore for a pretty long amount of time. Honestly I do not see any other approach than fully clearing the cache. Maybe we can add another flag that automatically clears the cache when a sweep round is not able to remove at least 1 item? (a kind of "emergency fallback")
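The proposed emergency fallback could work roughly like this (a Python model of the idea, not the actual C patch): run the sweep, count the removed items, and if nothing could be freed while the cache still holds entries, clear the whole cache.

```python
import time

def sweep(items):
    """Remove expired entries; return how many were removed.
    items maps key -> (value, expires_at or None)."""
    now = time.monotonic()
    expired = [k for k, (_, exp) in items.items()
               if exp is not None and now >= exp]
    for k in expired:
        del items[k]
    return len(expired)

def sweep_or_clear(items):
    """Emergency fallback: if a full sweep frees nothing while the
    cache still has entries, clear everything rather than stay wedged."""
    removed = sweep(items)
    if removed == 0 and items:
        items.clear()  # "stop the world" last resort
        return "cleared"
    return "swept"

# All items still valid -> sweep frees nothing -> full clear kicks in.
cache = {"a": (1, None), "b": (2, time.monotonic() + 100)}
assert sweep_or_clear(cache) == "cleared"
assert cache == {}
```

The design trade-off is exactly the one discussed above: a full clear costs a burst of misses, but it bounds how long the cache can stay unable to free slots.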
let's go for it. Thanks for the quick reply
This patch: https://github.com/unbit/uwsgi/commit/75b4a3cb2823be860df0441c59495eeadcc01616 forces a clear if no item has been removed during the sweep. Let me know how it works; in the final release I will probably enable it only via a flag.
:+1: Deployed the patch, I will leave it for a few days and see how it goes, thanks.
The patched branch generates peaks as well..
cache2=name=bad_resolve,items=750,blocksize=21846,bitmap=1,ignore_full=1,lazy_expire=1,sweep_on_full=10
Can you strace it to check if the load is generated by locking/unlocking ?
Sure, is there any debug mode, or should I just strace the process?
just strace the process consuming the cpu
We have a temporal cache with the following settings:
Once we introduced this cache to our system, it eventually/randomly goes to 100% cpu usage, never coming back to usual levels unless the uwsgi process is restarted. Removing the cache gets rid of this weird cpu behaviour, so we are pretty sure it is this component.