ikzelf / zbxdb

Zabbix database monitoring, the easy and extendable way
GNU General Public License v3.0

Multiple DBs on the same machine create flapping alerts #9

Closed: garbled1 closed this issue 5 years ago

garbled1 commented 5 years ago

Describe the bug: I have a few machines that each run 4-5 Oracle databases, and I monitor them all with zbxdb. Two alerts in particular are causing issues for me.

1) The rman alerts. Say one database on a machine has been backed up but the other three have not. When autodiscovery runs, it adds an item such as rman[FULL] to the host "machinename". However, every other database on that machine adds the same rman[FULL] item, so the first database reports "last backup 2 years ago", the other three report zeros, and the alert flaps every 20 minutes.

2) zbxdb cannot connect. This happens when a host has multiple databases or, more commonly for me, a database and an ASM instance. When the ASM instance is up but the database is down (or has bad credentials, or whatever), the alert receives alternating data (up/down/up/down/up/down) and flaps like mad.
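A small simulation (not zbxdb code) of why both alerts flap: several databases send values under one item key on one Zabbix host, and the trigger only sees the newest value. The database names, the values, and the 7-day threshold are invented for illustration; only the shared-key behaviour comes from the report above.

```python
from typing import Dict, List, Tuple

# Hypothetical samples: (database that sent the value, value for rman[FULL]).
# DB1 reports a backup ~2 years old; the other databases report 0, as in the issue.
Sample = Tuple[str, int]


def simulate_shared_key(samples: List[Sample], max_age_days: int = 7) -> List[str]:
    """Replay values for ONE shared key and return the trigger state after each value.

    Assumption: the trigger looks only at the latest value and fires when it
    exceeds max_age_days.
    """
    states = []
    for _db, value in samples:
        # The item keeps only the newest value, whichever database sent it.
        states.append("PROBLEM" if value > max_age_days else "OK")
    return states


def simulate_per_db_keys(samples: List[Sample], max_age_days: int = 7) -> Dict[str, str]:
    """Same data, but each database gets its own (hypothetical) key, e.g. rman[DB1,FULL]."""
    latest: Dict[str, int] = {}
    for db, value in samples:
        latest[db] = value
    return {
        f"rman[{db},FULL]": ("PROBLEM" if value > max_age_days else "OK")
        for db, value in latest.items()
    }


# Two discovery/collection rounds on one machine with four databases.
rounds = [("DB1", 730), ("DB2", 0), ("DB3", 0), ("DB4", 0)] * 2
print(simulate_shared_key(rounds))   # state flips every time DB1 reports
print(simulate_per_db_keys(rounds))  # one stable state per database
```

With a shared key the trigger state alternates on every round; with per-database keys each database keeps its own stable state.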

To Reproduce: steps to reproduce the behavior:

  1. Multiple DBs on the same server, all wired to the same "host" in Zabbix
  2. zbxdb 0.79
  3. Oracle 12c
  4. Zabbix 4.0.4
  5. NA

Expected behavior: the LLD for the rman and "zbxdb cannot connect" items needs to add some key component that ties each item to a specific database or instance, so the items don't overwrite one another every time the LLD runs and create flapping alerts.
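A minimal sketch of what such discovery output could look like, assuming Zabbix 4.0's `{"data": [...]}` low-level discovery format. The macro names `{#ORACLE_SID}` and `{#BACKUP_TYPE}` and the key prototype `rman[{#ORACLE_SID},{#BACKUP_TYPE}]` are hypothetical; zbxdb's shipped rman.lld query and template use their own names.

```python
import json
from typing import Dict, Iterable, Tuple


def build_lld(rows: Iterable[Tuple[str, str]]) -> str:
    """Turn (oracle_sid, backup_type) rows into Zabbix LLD JSON.

    Hypothetical macros {#ORACLE_SID} and {#BACKUP_TYPE}: an item prototype keyed
    rman[{#ORACLE_SID},{#BACKUP_TYPE}] would expand to one item per database.
    """
    data = [{"{#ORACLE_SID}": sid, "{#BACKUP_TYPE}": btype} for sid, btype in rows]
    # Zabbix 4.0 expects the discovered rows wrapped in a top-level "data" array.
    return json.dumps({"data": data})


def expand_key(prototype: str, row: Dict[str, str]) -> str:
    """Substitute LLD macros into a key prototype (simplified illustration)."""
    key = prototype
    for macro, value in row.items():
        key = key.replace(macro, value)
    return key


payload = build_lld([("ORCL1", "FULL"), ("ORCL2", "FULL")])
print(payload)
for row in json.loads(payload)["data"]:
    print(expand_key("rman[{#ORACLE_SID},{#BACKUP_TYPE}]", row))
```

Because the SID is part of each key, two databases on the same Zabbix host produce rman[ORCL1,FULL] and rman[ORCL2,FULL] instead of overwriting a single rman[FULL].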

Logs: NA

Monitoring platform: Zabbix 4.0.4, Python 3.5

Additional context: NA

ikzelf commented 5 years ago

Hi Tim, I think the problem is that you wired all databases to the same host in Zabbix. Every database should have its own Zabbix host and its own zbxdb.{DB}.cfg. In more recent versions of zbxdb I also added zbx_discover_oradbs, which does the database discovery for you when run on Unix.
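The actual logic of zbx_discover_oradbs is not shown in this thread. As a rough illustration of the one-config-per-database layout, here is a sketch that reads entries in the standard Unix `/etc/oratab` format (`SID:ORACLE_HOME:Y|N`) and derives a `zbxdb.{DB}.cfg` name per database. Skipping `*` wildcard entries is an assumption of this example, not documented zbxdb behaviour.

```python
from typing import List, NamedTuple


class OratabEntry(NamedTuple):
    sid: str
    oracle_home: str
    autostart: bool


def parse_oratab(text: str) -> List[OratabEntry]:
    """Parse oratab-style text: SID:ORACLE_HOME:Y|N, with '#' starting a comment."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(":")
        if len(parts) < 3 or parts[0] == "*":
            # Assumption: skip malformed lines and wildcard entries that name no database.
            continue
        entries.append(OratabEntry(parts[0], parts[1], parts[2].upper() == "Y"))
    return entries


def config_names(entries: List[OratabEntry]) -> List[str]:
    """One zbxdb config file per database, following the zbxdb.{DB}.cfg naming."""
    return [f"zbxdb.{e.sid}.cfg" for e in entries]


sample = """
# sample oratab
+ASM:/u01/app/grid:N
ORCL1:/u01/app/oracle/product/12.2.0/dbhome_1:Y
ORCL2:/u01/app/oracle/product/12.2.0/dbhome_1:N
"""
print(config_names(parse_oratab(sample)))
# → ['zbxdb.+ASM.cfg', 'zbxdb.ORCL1.cfg', 'zbxdb.ORCL2.cfg']
```

In the layout the maintainer describes, each of these config files would then point at its own Zabbix host.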

ikzelf commented 5 years ago

Hi Tim, did you manage to get things going for you?

garbled1 commented 5 years ago

So I do have separate cfg files for each database, but I really can't have separate Zabbix hosts for them; that more or less undoes my ability to tie them back to the machine in our tracking system...

Is there any way to get zbxdb to include the database instance name as part of the key? Especially for the rman items.

ikzelf commented 5 years ago

You could do that by generating the instance name in the rman.lld discovery query and editing the rman query to match. Also look at the bct file.... It is very different from how I designed it, but it can be done; you can customise that yourself. Maybe it is easier to add a hostname to the database metrics? But then again, which hostname? In RAC there will be multiple hostnames, normally one for every instance.... It is possible to have an inventory field filled by an item. You can then use that inventory field to tie the database to the hostname in your tracking system.

And making the instance name part of the backup keys..... the backups are database-wide, not per instance. It can be done, though.
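The point about database-wide backups can be shown with a few rows. This sketch assumes a hypothetical two-node RAC database PROD (instances PROD1 and PROD2) plus a single-instance database TEST. Keying the backup item by instance gives two items, and so two triggers, for one PROD backup. Keying it by database name gives one item per database.

```python
from typing import List, Set, Tuple

# Hypothetical rows: (db_name, instance_name) for every instance that reports.
Row = Tuple[str, str]


def keys_by_instance(rows: List[Row], backup_type: str = "FULL") -> Set[str]:
    """Database-wide backup item keyed per instance: one item per RAC node."""
    return {f"rman[{inst},{backup_type}]" for _db, inst in rows}


def keys_by_database(rows: List[Row], backup_type: str = "FULL") -> Set[str]:
    """The same item keyed per database: RAC instances collapse into one item."""
    return {f"rman[{db},{backup_type}]" for db, _inst in rows}


rows = [("PROD", "PROD1"), ("PROD", "PROD2"), ("TEST", "TEST")]
print(sorted(keys_by_instance(rows)))
print(sorted(keys_by_database(rows)))
```

With per-instance keys, one overdue PROD backup would fire a trigger on both rman[PROD1,FULL] and rman[PROD2,FULL]. That is the duplicate-event problem mentioned for RAC.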

ikzelf commented 5 years ago

Tim, if the only thing you are missing is a way to couple the database back to a server for a ticket system, the easiest approach is to use an inventory field. Create an item and a query that populates that item, and set an inventory field to copy the item's value. Inventory fields can be used in actions. Making your setup work means changing all queries and the complete template, and it gives an unclear meaning to things. Many things are reported at the database level, such as sizing and backups. In your setup, every instance of the same database would report them, which with RAC would cause a lot of extra triggers to fire. Without RAC, it could work.
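A hedged sketch of the inventory approach using Zabbix 4.0-style JSON-RPC API requests. It only builds the request bodies and does not send them. It assumes the database's Zabbix host uses automatic inventory mode (`inventory_mode` 1) and that the server name arrives as a trapper item (type 2), filled by a zbxdb query such as `select host_name from v$instance`. The key `zbxdb.server_hostname`, the host ID, the auth token and `FIELD_ID` are placeholders. Look up the real inventory field ID, and the fields your Zabbix version requires, in the Zabbix API documentation.

```python
import json
from typing import Any, Dict

# Placeholder: replace with the ID of the inventory field you want to fill.
FIELD_ID = 1


def host_update_request(hostid: str, auth: str, req_id: int = 1) -> Dict[str, Any]:
    """Switch the host's inventory to automatic so that items can populate it."""
    return {
        "jsonrpc": "2.0",
        "method": "host.update",
        "params": {"hostid": hostid, "inventory_mode": 1},  # 1 = automatic
        "auth": auth,
        "id": req_id,
    }


def item_create_request(hostid: str, auth: str, inventory_field_id: int,
                        req_id: int = 2) -> Dict[str, Any]:
    """Create a trapper item whose value is copied into an inventory field.

    Hypothetical key 'zbxdb.server_hostname': zbxdb would have to send the
    server name under that key from a custom query.
    """
    if inventory_field_id < 1:
        # 0 means "do not populate any inventory field".
        raise ValueError("inventory_field_id must be a real inventory field ID")
    return {
        "jsonrpc": "2.0",
        "method": "item.create",
        "params": {
            "hostid": hostid,
            "name": "Database server hostname",
            "key_": "zbxdb.server_hostname",
            "type": 2,        # Zabbix trapper
            "value_type": 1,  # character
            "inventory_link": inventory_field_id,
        },
        "auth": auth,
        "id": req_id,
    }


# Placeholder values only; nothing is sent to a Zabbix server here.
print(json.dumps(host_update_request("10105", "TOKEN"), indent=2))
print(json.dumps(item_create_request("10105", "TOKEN", FIELD_ID), indent=2))
```

An action can then reference that inventory field (for example through an inventory macro in the alert message), so tickets point at the physical server even though each database has its own Zabbix host.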

ikzelf commented 5 years ago

Hi Tim, do you need more help? Are you going to rewrite all the SQL queries to include the ORACLE_SID in your items (not smart with RAC, because of duplicate events)? Or are you going to use the inventory (easiest)?