Adagios extremely slow after upgrade from 1.4 to 1.6.1

finnzi commented 9 years ago

Hi guys.

I upgraded Adagios from 1.4 to 1.6.1 and I am seeing a severe drop in performance. What data would you like for debugging?

Adagios: 1.6.1 Pynag: 0.9.1

Host: RHEL 6.6 x86_64

nagios.log is ~27M

Logrotation is configured daily.

Adagios is using a crapload of CPU. I'm wondering if it has something to do with the top alert producers... should I change the rotation to hourly?

palli commented 9 years ago

Are all your views equally slow, or only specific ones?

E.g. /objectbrowser/ vs. /status/ vs. /status/problems

finnzi commented 9 years ago

/status/* is pretty sluggish, but the objectbrowser is pretty much as it was before.

palli commented 9 years ago

How many hosts/services are we talking about?

Can you try completely stopping and starting nagios, in case there is a livestatus deadlock somewhere?

service nagios stop
ps -ef | grep nagios  # A very common issue is a leftover ghost process that blocks the livestatus socket
service nagios start
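
The ghost check between stop and start can also be scripted. Below is a minimal sketch, not something from Adagios or Pynag: the helper name `find_nagios_processes` and the sample `ps -ef` output (PIDs, paths) are made up for illustration. It parses `ps -ef` style text and returns any process whose command mentions nagios, ignoring the grep itself.

```python
def find_nagios_processes(ps_output):
    """Return (pid, command) pairs for `ps -ef` lines whose command mentions nagios."""
    matches = []
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        pid, command = int(fields[1]), fields[7]
        # `ps -ef | grep nagios` always lists the grep itself, so ignore it.
        if "nagios" in command and not command.startswith("grep"):
            matches.append((pid, command))
    return matches


# Hypothetical sample output, for illustration only.
SAMPLE = """\
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Dec09 ?        00:00:01 /sbin/init
nagios    4321     1  2 Dec09 ?        00:12:40 /usr/sbin/nagios -d /etc/nagios/nagios.cfg
root      9876  9000  0 08:25 pts/0    00:00:00 grep nagios
"""

print(find_nagios_processes(SAMPLE))
```

If this returns anything after `service nagios stop` has finished, that process is a candidate ghost to kill before starting nagios again.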

finnzi commented 9 years ago

Hi,

I'll try that tomorrow.

We have 687 hosts and 5505 services.

finnzi commented 9 years ago

Hi,

Tried to stop/start - no ghosts/zombies remaining.

palli commented 9 years ago

A few questions to troubleshoot this further:

Which process is showing high CPU, is it apache? Just one apache process?
Does it still show high CPU usage even if you are not browsing adagios?
Views like /status/problems should not be CPU intensive, but they could be slow if other users are hogging the CPU with inefficient queries.
If you are seeing constantly high CPU, it could be that some users are running inefficient queries against livestatus. Can you try temporarily editing /etc/httpd/conf.d/adagios.conf so that the URL /adagios/ is changed to something else? If the CPU load is caused by browser requests, the load should go down once adagios is unavailable.

finnzi commented 9 years ago

Hi,

Yeah, just one apache process.
Well... it's kind of hard for me to check, as I have multiple clients connecting. Last night there were a lot fewer clients than, for example, right now, and I had a single apache process running at ~75% CPU load.
Sure...I'll try to do that sometime later today or in the next few days.

Thanks!

dnewsholme commented 9 years ago

Here's how I fixed it for me.

First, my log rotation wasn't working. Yours does, so you are all good here.
Your log is pretty large for a daily rotation. How is logging configured in nagios.cfg? Mine is set as below; 2000 services and 179 hosts give a log file of about 400k a day.

log_event_handlers=0
log_external_commands=0
log_host_retries=1
log_initial_states=0
log_notifications=1
log_passive_checks=0
log_rotation_method=d
log_service_retries=1

If you have a lot of users opening the page to view the status (i.e. more than one), you've only got one apache process running, and it reads through the log file to find the top alert producers (this is what is killing the CPU). As this information isn't cached, it has to do this for every user whose page refreshes, and each request has to wait for the previous user's request to finish. Edit the adagios.conf file for apache and change the processes value to something higher than 1; I use 10.

WSGIDaemonProcess adagios user=nagios group=nagcmd processes=10 threads=25

This lets apache spawn more processes, so each user's request can be processed on its own without queueing.
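
To get a feel for why the process count matters, here is a rough back-of-the-envelope model. It is only a sketch under simplifying assumptions: each process handles one CPU-bound request at a time (threads are ignored), there are enough CPU cores for every process, and the numbers (10 users, 3 seconds per request) are invented for illustration. The function name `worst_case_wait` is hypothetical as well.

```python
import math


def worst_case_wait(users, seconds_per_request, processes):
    """Seconds until the last of `users` simultaneous requests finishes,
    assuming each process works on one CPU-bound request at a time."""
    rounds = math.ceil(users / processes)  # how many batches of requests are needed
    return rounds * seconds_per_request


# Compare a single process against ten processes for the same load.
for processes in (1, 10):
    print(processes, worst_case_wait(users=10, seconds_per_request=3, processes=processes))
```

Under these assumptions the last of ten users waits ten times longer with a single process than with ten, which matches the queueing described above. Real behaviour also depends on threads, available cores and page refresh intervals.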

I went from adagios constantly using 100% CPU to requests taking a few seconds to process.
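
On the log size question, it can help to see which kinds of entries fill up nagios.log before changing the log_* options. The following is a minimal sketch, assuming the usual `[timestamp] TYPE: details` line layout; the function name `count_entry_types` and the sample lines (hosts, services, timestamps) are made up for illustration and are not taken from a real log.

```python
import re
from collections import Counter

# Matches lines such as "[1418198400] SERVICE ALERT: host;service;...".
LINE_RE = re.compile(r"^\[(\d+)\] ([^:]+):")


def count_entry_types(lines):
    """Count log entries per type; lines without a 'TYPE:' prefix are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    "[1418198400] SERVICE ALERT: web01;HTTP;CRITICAL;HARD;3;Connection refused",
    "[1418198401] PASSIVE SERVICE CHECK: web01;Backup;0;OK",
    "[1418198402] PASSIVE SERVICE CHECK: db01;Backup;0;OK",
    "[1418198403] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;db01;Backup;0;OK",
]

for entry_type, n in count_entry_types(SAMPLE).most_common():
    print(n, entry_type)
```

To run it against a real log, pass an open file object instead of `SAMPLE`. If entries such as PASSIVE SERVICE CHECK or EXTERNAL COMMAND dominate, setting log_passive_checks=0 and log_external_commands=0 as in the list above is likely to shrink the file the most.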

finnzi commented 9 years ago

Hi,

Awesome - thanks.

Will try this and report back tomorrow!

Bgrds, FOG

finnzi commented 9 years ago

So far so good!

I've not done any scientific tests but it feels a lot like it was before :)

I'm closing the issue now.

Bgrds, FOG

dnewsholme commented 9 years ago

Glad that worked.


opinkerfi / adagios

Adagios extremely slow after upgrade from 1.4 to 1.6.1 #495