Closed gavanderhoorn closed 5 years ago
I've seen the CPU load oscillate between ~40% and very high. It could be an effect of a cold cache on the new server combined with coincidentally high robot traffic. If this persists, a solution would be to add CPUs (we currently have two).
The changes I've made compared to the old site:
- dockerized deployment, which could have added some impact
In my experience there is very little overhead (if any) incurred by deploying with Docker, unless NAT-based networking is used.
It's been pretty bad again:
@evgenyfadeev: does Askbot have any way to trace performance problems like this?
Edit: it still is pretty bad.
I cannot be the only one having these problems, but apparently I am the only one complaining about it:
I've sent an email requesting an increase in the CPUs on this machine.
Hm, did we get 0.5 CPUs now? :0
We did double the CPUs to 4, but that did not have an effect! I've now doubled the cache RAM as well; let's see.
Thank you for your feedback.
Seems slightly better, but it's still not what it used to be.
status.ros.org seems to agree:
I don't know how it is for others (so perhaps this is a networking issue on my side), but I'm frequently waiting tens of seconds for pages to load, edit boxes to appear, etc.
Looking at status.ros.org, it looks like Answers is having some issues as well:
Increased the cache RAM by another 50%.
Seemed better at first, but not sure any more:
Situation today:
Could there be a time-of-day aspect to this? Right now it's really slow (a 40-second wait for the last page I tried to access). This morning it was OK-ish. Sometimes it's instantaneous.
This is on different machines, different internet connections and different OS.
All times referenced are CEST.
It's getting rather annoying tbh.
Doubled the server worker processes; I might repeat this if the load average permits.
If the situation does not improve soon, I will move the job queue from Redis to RabbitMQ to eliminate the possibility of the queue consuming the app's cache space.
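To illustrate the idea of separating the job-queue broker from the cache backend, here is a minimal sketch, assuming a Django/Celery setup like Askbot's. The backend names and URLs are illustrative assumptions, not the actual answers.ros.org configuration:

```python
# settings.py (fragment) -- hypothetical sketch, not the real config.

# The page/fragment cache stays on Redis, in its own database...
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://localhost:6379/1",
    }
}

# ...while Celery jobs go through RabbitMQ instead, so a growing job
# queue can no longer compete with cached pages for Redis memory.
CELERY_BROKER_URL = "amqp://guest:guest@localhost:5672//"
```

The point of the split is that Redis evicts keys under memory pressure, so a backlog of queued jobs stored in the same instance can push cached content out.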
@evgenyfadeev: is there no way to trace and see where the bottlenecks are?
Yes, looking into this. I will set up uwsgitop to monitor the server processes. I did see the cache filling up, which is why I've now 4x-ed it.
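For context, uwsgitop reads from uWSGI's stats server (enabled with e.g. `stats = 127.0.0.1:9191` in the uWSGI config), which emits one JSON document per connection. A minimal sketch of polling it directly, with an assumed address and field names taken from the stats server's usual output:

```python
import json
import socket

def read_uwsgi_stats(host="127.0.0.1", port=9191):
    """Fetch one JSON stats snapshot from a uWSGI stats server."""
    with socket.create_connection((host, port)) as s:
        # The server writes the full document and closes the connection.
        data = b"".join(iter(lambda: s.recv(4096), b""))
    return json.loads(data)

# Example usage (address is an assumption):
# stats = read_uwsgi_stats()
# for w in stats["workers"]:
#     print(w["id"], w["status"], w["requests"], w["avg_rt"])
```

This surfaces the same per-worker numbers uwsgitop displays (request counts, average response time, busy/idle status), which is what you'd watch to spot saturated workers.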
So far the site has been rather responsive. Almost back to how it was before the upgrade.
It's only been a day though, so let's see how it holds up.
I'm not sure what changed, but yesterday and the day before it was buttery smooth. Today I'm waiting on pages to load again.
I've not seen any more service disruptions or site time-outs so far.
@evgenyfadeev: it would seem the changes you've made to the server config have helped.
As such, closing for now. Will re-open if/when we run into any more problems.
I realise this will perhaps be hard to diagnose as it may be local, but for the past couple of weeks (2?) loading times of ROS Answers have gone up significantly. Logging in can take up to 30 seconds. Loading a question sometimes times out; other times it takes a similar amount of time (30 to 50 seconds).
It's most noticeable at the beginning of the day (CEST). Right now (2pm CEST) it's OK, but not great either.
Looking at status.ros.org, there seem to be problems with the site around those times as well: