opinkerfi / adagios

Adagios - Web Based Nagios Configuration
GNU Affero General Public License v3.0

Adagios extremely slow after upgrade from 1.4 to 1.6.1 #495

Closed finnzi closed 9 years ago

finnzi commented 9 years ago

Hi guys.

I upgraded Adagios from 1.4 to 1.6.1 and I am seeing an extreme drop in performance. What data would you like for debugging?

Adagios: 1.6.1 Pynag: 0.9.1

Host: RHEL 6.6 x86_64

nagios.log is ~27M

Logrotation is configured daily.

Adagios is using a crapload of CPU. I'm wondering if it has something to do with top alert producers. Should I change the rotation to hourly?
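For context on why log size might matter here: a "top alert producers" style view has to scan the Nagios log for alert entries and tally them per host. The sketch below is only an illustration of that kind of scan, not Adagios's actual code; the helper name `top_alert_producers` and the sample log lines are made up, and only the standard `[epoch] SERVICE ALERT: host;service;...` and `[epoch] HOST ALERT: host;...` line layouts are assumed.

```python
from collections import Counter
import re

# Nagios log lines look like: [epoch] TYPE: field1;field2;...
# For both HOST ALERT and SERVICE ALERT the first field is the host name.
ALERT_RE = re.compile(r"^\[(\d+)\] (HOST|SERVICE) ALERT: ([^;]+);")


def top_alert_producers(lines, limit=10):
    """Count HOST/SERVICE ALERT entries per host (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = ALERT_RE.match(line)
        if match:
            counts[match.group(3)] += 1
    return counts.most_common(limit)


# Made-up sample data for illustration only.
sample = [
    "[1418200000] SERVICE ALERT: web01;HTTP;CRITICAL;HARD;3;Connection refused",
    "[1418200060] SERVICE ALERT: web01;HTTP;OK;HARD;3;HTTP OK",
    "[1418200120] HOST ALERT: db01;DOWN;SOFT;1;CRITICAL - Host Unreachable",
    "[1418200180] SERVICE NOTIFICATION: admin;web01;HTTP;CRITICAL;notify;Connection refused",
]

print(top_alert_producers(sample))
```

Every line of the log has to be read on each call, so the cost grows with the size of the file. That is why a larger nagios.log, or a longer rotation interval, would make an uncached scan like this more expensive.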

palli commented 9 years ago

Are your views all equally slow, or only specific ones?

I.e. /objectbrowser/ vs. /status/ vs /status/problems

finnzi commented 9 years ago

/status/* is pretty sluggish but the objectbrowser is pretty much as it was before.

palli commented 9 years ago

How many hosts/services are we talking about?

Can you try completely stopping and starting nagios in case there is a livestatus deadlock somewhere?

service nagios stop
ps -ef | grep nagios  # A leftover (ghost) process that blocks the livestatus socket is a very common issue
service nagios start
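The ghost check between the stop and the start can also be scripted. This is a minimal sketch, assuming the usual `ps -ef` column layout; the helper name `leftover_nagios_processes` and the sample output (PIDs, paths) are invented for illustration.

```python
def leftover_nagios_processes(ps_output):
    """Return `ps -ef` lines that mention nagios (hypothetical helper).

    Like `ps -ef | grep nagios`, this also matches processes that merely
    run as the nagios user, so the result still needs a human look.
    """
    leftovers = []
    for line in ps_output.splitlines()[1:]:  # first line is the column header
        if "nagios" in line and "grep" not in line:
            leftovers.append(line)
    return leftovers


# Made-up `ps -ef` output, as it might look after `service nagios stop`.
sample_ps = """\
UID        PID  PPID  C STIME TTY          TIME CMD
nagios    2301     1  0 Dec09 ?        00:01:12 /usr/sbin/nagios -d /etc/nagios/nagios.cfg
root      4512  4400  0 10:15 pts/0    00:00:00 grep nagios
apache    1200     1  0 Dec09 ?        00:00:03 /usr/sbin/httpd
"""

for line in leftover_nagios_processes(sample_ps):
    print("still running:", line)
```

If anything is listed after the stop, that process is a candidate for the ghost holding the livestatus socket.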

finnzi commented 9 years ago

Hi,

I'll try that tomorrow.

We have 687 hosts and 5505 services.

finnzi commented 9 years ago

Hi,

Tried to stop/start - no ghosts/zombies remaining.

palli commented 9 years ago

A few questions to troubleshoot this further:

finnzi commented 9 years ago

Hi,

  1. Yeah, just one apache process.
  2. Well...it's kind of hard for me to check, as I have multiple clients connecting. Last night there were a lot fewer clients than, for example, right now, and I had a single apache process running at ~75% CPU load.
  3. Sure...I'll try to do that sometime later today or in the next few days.

Thanks!

dnewsholme commented 9 years ago

Here's how I fixed it for me.

  1. First, my logrotation wasn't working. Yours does, so you are all good here.
  2. Your log size is pretty large for a daily. How is the logging configured in nagios.cfg? Mine is set as below; 2000 services and 179 hosts give a 400k log file per day.

     log_event_handlers=0
     log_external_commands=0
     log_host_retries=1
     log_initial_states=0
     log_notifications=1
     log_passive_checks=0
     log_rotation_method=d
     log_service_retries=1

  3. If you have a lot of users opening the page to view the status (i.e. more than 1), you've only got 1 apache process running, and it has to read through the log file to find the top alert producers (this is what is killing the CPU). As this information isn't cached, it has to do this for each user whose page refreshes, and it needs to finish for the first user before getting the data for the second. Edit the adagios.conf file for apache and change the processes setting to a value higher than 1; I use 10.

WSGIDaemonProcess adagios user=nagios group=nagcmd processes=10 threads=25

This allows apache to spawn more processes, so each user's request can be processed individually without queueing.
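A rough back-of-the-envelope model of the queueing effect, under stated assumptions: each page load is CPU-bound for a fixed time, each process works on one such request at a time (CPU-bound Python threads in the same process do not run in parallel), and there are enough cores for every process. The function name and the numbers (8 users, 5 seconds per scan) are made up for illustration, not measurements from this setup.

```python
import math


def worst_case_wait(users, scan_seconds, processes):
    """Seconds until the last of `users` simultaneous requests finishes.

    Simplified model: requests are served in rounds of `processes` at a
    time, and each round takes `scan_seconds`.
    """
    if processes < 1:
        raise ValueError("processes must be at least 1")
    rounds = math.ceil(users / processes)
    return rounds * scan_seconds


# Hypothetical load: 8 users refreshing at once, 5 s of log scanning each.
for processes in (1, 10):
    wait = worst_case_wait(users=8, scan_seconds=5, processes=processes)
    print(f"processes={processes}: last user waits about {wait} s")
```

In this model the last user waits 40 s with processes=1 and 5 s with processes=10. The total CPU work is unchanged; it is just no longer serialized behind a single process.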

I went from Adagios constantly sitting at 100% CPU to pages taking a few seconds to process.

finnzi commented 9 years ago

Hi,

Awesome - thanks.

Will try this and report back tomorrow!

Bgrds, FOG

finnzi commented 9 years ago

So far so good!

I've not done any scientific tests but it feels a lot like it was before :)

I'm closing the issue now.

Bgrds, FOG

dnewsholme commented 9 years ago

Glad that worked.
