ikzelf / zbxdb

Zabbix database monitoring, the easy and extendable way
GNU General Public License v3.0
94 stars 45 forks source link

How to configure zbx_alertlog.py #45

Closed absz1980 closed 4 years ago

absz1980 commented 4 years ago

Hello, Ronald. I want to configure zbx_alertlog.py. I don`t understand how to do it.

Do I need to install zabbix agent on DB host? What is it installed python version on DB host? Do I need to clone zbxdb projects on DB hosts?

ikzelf commented 4 years ago

Yes, you need the zabbix agent to read the log.xml file. Python needed on dbhost for this.

It sends the discovery list of the found log.xml to the zabbix proxy or server of your choice.

ikzelf commented 4 years ago

Did you get this working?

absz1980 commented 4 years ago

Hello, Ronald. I am just installing zabbix agent on host.

What version python do I install on monitoring host?
Do I have to install python on behalf of zabbix user? Can I use pyenv?

ikzelf commented 4 years ago

Hi, yes, you can use pyenv. The version of python is not very specific but I would just pick a recent v3. It might be smartest to make the script run by a user who has access to all instances and the v$diag_info view. OS authentication is what I would use for this.

absz1980 commented 4 years ago

Hello, Ronald!

zbx_alertlog.py[-h][-o ORATAB][-s SERVERNAME][-p PORT] - H HOSTNAME - k KEY

ORATAB - /etc/oratab SERVERNAME,PORT - Is it zabbix server host,port? HOSTNAME - It is monitoring hostname in zabbix? KEY - What is it? Do I need to install zabbix-sender? Do I need add zbx_alertlog.py to crontab?

ikzelf commented 4 years ago

Good morning,

ORATAB - /etc/oratab SERVERNAME, PORT - Zabbix server/proxy HOSTNAME - the hostname that is going to monitor the log.xml files (the database server (add template zbx_alertlog)) KEY - the key of the discovery rule. If you use the template zbx_alertlog then it is oracle.alertlog.lld

Yes, you need zabbix-sender Need to be in the crontab is not a rule but if you have some dynamics in your environment like adding/removing database instances it would be smart to put it in the crontab.

absz1980 commented 4 years ago

Hello, I run command and retrieve error

[oracle]$ /home/oracle/zbxdb/bin/zbx_alertlog.py -o /etc/oratab -s sysmon -p 10051 -H MSK01-AISDB02 -k oracle.alertlog.lld Namespace(hostname='MSK01-AISDB02', key='oracle.alertlog.lld', oratab='/etc/oratab', port='10051', servername='sysmon.uralsibins.ru') Linux new sid AIS Traceback (most recent call last): File "/home/oracle/zbxdb/bin/zbx_alertlog.py", line 172, in main() File "/home/oracle/zbxdb/bin/zbx_alertlog.py", line 151, in main _sids = get_sids_from_linux(_args.oratab) File "/home/oracle/zbxdb/bin/zbx_alertlog.py", line 64, in get_sids_from_linux sids.append([sid, oracle_home]) UnboundLocalError: local variable 'oracle_home' referenced before assignment [oracle@MSK01-AISDB02:AIS_REPORT ~]$ cat /etc/oratab AIS_REPORT:/oracle/product/11.2.0.4:N

absz1980 commented 4 years ago

SQL> show parameter name

NAME TYPE VALUE


cell_offloadgroup_name string db_file_name_convert string db_name string AISREPRT db_unique_name string AISREPRT global_names boolean FALSE instance_name string AIS_REPORT lock_name_space string log_file_name_convert string /oracle/oradata/ais/redo, /ora cle/oradata/FRA/AIS_REPORT/onl inelog processor_group_name string service_names string ais_report SQL>

ikzelf commented 4 years ago

The problem was that you have a "_" in your ORACLE_SID. This is fixed now.

absz1980 commented 4 years ago

Hello, Ronald!

[oracle@ zbxdb]$ git pull remote: Enumerating objects: 3, done. remote: Counting objects: 100% (3/3), done. remote: Total 3 (delta 1), reused 3 (delta 1), pack-reused 0 Unpacking objects: 100% (3/3), done. From https://github.com/ikzelf/zbxdb a1256b5..37d5edb master -> origin/master Updating a1256b5..37d5edb Fast-forward zbx_alertlog.lld | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) create mode 100644 zbx_alertlog.lld

[oracle@ zbxdb]$ /home/oracle/zbxdb/bin/zbx_alertlog.py -o /etc/oratab -s sysmon -p 10051 -H MSK01-AISDB02 -k oracle.alertlog.lld Namespace(hostname='MSK01-AISDB02', key='oracle.alertlog.lld', oratab='/etc/oratab', port='10051', servername='sysmon.uralsibins.ru') Linux new sid AIS Traceback (most recent call last): File "/home/oracle/zbxdb/bin/zbx_alertlog.py", line 172, in main() File "/home/oracle/zbxdb/bin/zbx_alertlog.py", line 151, in main _sids = get_sids_from_linux(_args.oratab) File "/home/oracle/zbxdb/bin/zbx_alertlog.py", line 64, in get_sids_from_linux sids.append([sid, oracle_home]) UnboundLocalError: local variable 'oracle_home' referenced before assignment [oracle@ zbxdb]$ env | grep ORA ORACLE_SID=AIS_REPORT ORACLE_BASE=/oracle PS1=[$green$bold][[$reset$green]\u[$yellow$bold]@[$reset$cyan]\h[$yellow$bold]:[$reset$red]$ORACLE_SID [$blue$bold]\W[$green]]$[$reset] ORACLE_HOME=/oracle/product/11.2.0.4

ikzelf commented 4 years ago

Weird, the changes I made -and tested- are no longer here. Just recovered them. Sorry for that.

absz1980 commented 4 years ago

Hi, Ronald!

The error was not fixed.

[oracle zbxdb]$ git pull remote: Enumerating objects: 4, done. remote: Counting objects: 100% (4/4), done. remote: Total 4 (delta 3), reused 4 (delta 3), pack-reused 0 Unpacking objects: 100% (4/4), done. From https://github.com/ikzelf/zbxdb 37d5edb..3ae2b00 master -> origin/master Updating 37d5edb..3ae2b00 Fast-forward bin/zbx_alertlog.py | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) [oracle zbxdb]$ /home/oracle/zbxdb/bin/zbx_alertlog.py -o /etc/oratab -s sysmon -p 10051 -H MSK01-AISDB02 -k oracle.alertlog.lld Namespace(hostname='MSK01-AISDB02', key='oracle.alertlog.lld', oratab='/etc/oratab', port='10051', servername='sysmon') Linux new sid AIS Traceback (most recent call last): File "/home/oracle/zbxdb/bin/zbx_alertlog.py", line 176, in main() File "/home/oracle/zbxdb/bin/zbx_alertlog.py", line 155, in main _sids = get_sids_from_linux(_args.oratab) File "/home/oracle/zbxdb/bin/zbx_alertlog.py", line 68, in get_sids_from_linux sids.append([sid, oracle_home]) UnboundLocalError: local variable 'oracle_home' referenced before assignment [oracle zbxdb]$

ikzelf commented 4 years ago

Something funny happens when I move my code to my git environment. I hope it is better now. What should happen is something like ps -ef|grep ora_ and pick the 3rd part of the name (the part after orasmon).... Match that name with the ORACLE_SID in de oratab and grab the ORACLE_HOME to use for the connection to the instance.

absz1980 commented 4 years ago

What's it next?

I see nothing in latest data in zabbix server?

[oracle zbxdb]$ /home/oracle/zbxdb/bin/zbx_alertlog.py -o /etc/oratab -s sysmon -p 10051 -H MSK01-AISDB02 -k oracle.alertlog.lld Namespace(hostname='MSK01-AISDB02', key='oracle.alertlog.lld', oratab='/etc/oratab', port='10051', servername='sysmon') Linux new sid AIS_REPORT sid:AIS_REPORT oh:/oracle/product/11.2.0.4 {"data":[{"{#INSTANCE_NAME}": "AIS_REPORT", "{#ALERTLOG}": "/oracle/diag/rdbms/aisreprt/AIS_REPORT/alert/log.xml"}]} zabbix_sender [5254]: DEBUG: answer [{"response":"success","info":"processed: 0; failed: 1; total: 1; seconds spent: 0.000129"}] Response from "sysmon": "processed: 0; failed: 1; total: 1; seconds spent: 0.000129" sent: 1; skipped: 0; total: 1

ikzelf commented 4 years ago

Does your zabbix host "MSK01-AISDB02" have the alertlog template attached to it? If so, is there an error in that discovery rule (oracle.alertlog.lld)? The discovery array looks good to me, if this machine indeed has only one running instance on it.

absz1980 commented 4 years ago

I deployed zbx_template_alertlog.xml and attached on MSK01-AISDB02. I run /home/oracle/zbxdb/bin/zbx_alertlog.py -o /etc/oratab -s sysmon -p 10051 -H MSK01-AISDB02 -k oracle.alertlog.lld many times. And I see in Latest Data nothing. I seem don`t understand something.

ikzelf commented 4 years ago

Ah, OK. Did you see this: zabbix_sender [5254]: DEBUG: answer [{"response":"success","info":"processed: 0; failed: 1; total: 1; seconds spent: 0.000129"}] This means 1 item is sent and that one item also failed. Did you have a run with processed: 1 ?

If you did get a discovery processed, it should have generated alertlog items on your host, or an error message visible in configuration/host/discovery rules.

absz1980 commented 4 years ago

I have

image

ikzelf commented 4 years ago

ah, I forgot about the type of the oracle.alertlog.lld key. Change that to trapper. Initially I had it setup as a user parameter but that did not work so nice with Oracle's ideas about security. (what do you get when you hover above the [I] ?)

absz1980 commented 4 years ago

I changed to Zabbix trapper and ran /home/oracle/zbxdb/bin/zbx_alertlog.py -o /etc/oratab -s sysmon -p 10051 -H MSK01-AISDB02 -k oracle.alertlog.lld many times. and image

Initially I had it setup as a user parameter but that did not work so nice with Oracle's ideas about security. (what do you get when you hover above the [I] ?) Unsupported item key

ikzelf commented 4 years ago

Ok. Normally the sender should return without errors. This should be done once, or how often you want. This merely declares the alertlog items.

Those generated items will be of type zabbix active agent. The zabbix agent will begin reading the log.xml after that.

Do you have a zabbix proxy? If so, run a proxy configuration update before discovering the alertlogs again(or just wait an hour)

absz1980 commented 4 years ago

image

Do you have a zabbix proxy?

I don`t have zabbix proxy.

How do I check correct setting? Can I configure with zabbix agent or crontab only?

ikzelf commented 4 years ago

What bugs me is your last image, showing active agent for oracle.alertlog.lld. This should be changed to zabbix trapper in the template. Then unlink and clear the template and link it again. This should leave the discovery key in a valid state. Next run the discovery script. Once. Check the output (hoping for processed: 1) If failed: check the error in the discovery rule.

If no error, you should have your 5 items and 12 triggers.

absz1980 commented 4 years ago

Hi, Ronald! Thanks. It`s work!

Linux new sid AIS_REPORT sid:AIS_REPORT oh:/oracle/product/11.2.0.4 {"data":[{"{#INSTANCE_NAME}": "AIS_REPORT", "{#ALERTLOG}": "/oracle/diag/rdbms/aisreprt/AIS_REPORT/alert/log.xml"}]} zabbix_sender [5118]: DEBUG: answer [{"response":"success","info":"processed: 1; failed: 0; total: 1; seconds spent: 0.000180"}] Response from "sysmon.uralsibins.ru:10051": "processed: 1; failed: 0; total: 1; seconds spent: 0.000180" sent: 1; skipped: 0; total: 1

Should I add crontab or configure zabbix agent somehow?

ikzelf commented 4 years ago

Glad that it is working now. By now your log item[s] will be created and the agent started reading the log.xml file, if it can access it. You will need to fiddle with access rights to make the log.xml readable by the agent. So you might want to make the agent member of the group that owns the log.xml file and make sure that this file is readable by the group.

If your environment has some dynamics in it, it would be wise to put the zbx_alertlog.py in the oracle crontab to make sure that new -or removed- instances are recognised.

Check config/host/ oracle application in zabbix for problems with the log item. If no problems exist, check latest data.

absz1980 commented 4 years ago

I check error and alert in is not visible

[oracle~]$ cat test-alert.log msg time='2020-07-09T02:00:00.010+03:00' org_id='oracle' comp_id='rdbms' client_id='' type='UNKNOWN' level='16' host_id='MSK01-AISDB02' host_addr='10.175.33.122' module='' pid='61518'>

ORA-00600 TEST!!!

/msg [oracle@ ~]$ cat test-alert.log >> /oracle/diag/rdbms/aisreprt/AIS_REPORT/alert/log.xml [oracle@ ~]$ cat test-alert.log >> /oracle/diag/rdbms/aisreprt/AIS_REPORT/alert/log.xml [oracle@ ~]$ /home/oracle/zbxdb/bin/zbx_alertlog.py -o /etc/oratab -s sysmon -p 10051 -H MSK01-AISDB02 -k oracle.alertlog.lld Namespace(hostname='MSK01-AISDB02', key='oracle.alertlog.lld', oratab='/etc/oratab', port='10051', servername='sysmon') Linux new sid AIS_REPORT sid:AIS_REPORT oh:/oracle/product/11.2.0.4 {"data":[{"{#INSTANCE_NAME}": "AIS_REPORT", "{#ALERTLOG}": "/oracle/diag/rdbms/aisreprt/AIS_REPORT/alert/log.xml"}]} zabbix_sender [7970]: DEBUG: answer [{"response":"success","info":"processed: 1; failed: 0; total: 1; seconds spent: 0.000088"}] Response from "sysmon.uralsibins.ru:10051": "processed: 1; failed: 0; total: 1; seconds spent: 0.000088" sent: 1; skipped: 0; total: 1 [oracle@ ~]$

image

ikzelf commented 4 years ago

Check your alertlog item. I suspect that it is invalid because the agent can not read the log.xml. If it is valid: check latest data and see the lines popping up.

absz1980 commented 4 years ago

Should I add zbx_alertlog.py to crontab or configure zabbix agent somehow?

ikzelf commented 4 years ago

you can put zbx_alertlog.py in the oracle crontab, depending on how often the number of databases changes on your server[s] 1 or more times / day. The zbx_alertlog.py only plays a part in the low level discovery of the log.xml files on a host. It does not do anything with the contents of the log.xml.

For that, log items are generated in your host under de application "oracle". Look for unsupported items and check the reason for being unsupported.

You need to configure the log.xml and the path to it such that the agent (running as zabbix user) can read it.

absz1980 commented 4 years ago

How can I configure the agent? Is it an example?

ikzelf commented 4 years ago

If the agent is running, it is running. Nothing to do there. What is the status of the generated items on your host? Are the valid and enabled?

absz1980 commented 4 years ago

Let me see if I understand this. Agent is running and enabled. Nothing to do there.

The status of the generated items on your host image

Should I add to crontab zbx_alertlog.py 1 time per 10 min only? How can I check(simulate) triggered problems?

ikzelf commented 4 years ago

What is in your log.xml? Your most important item is this regard is AIS_REPORT[alertlog] The lines in your /oracle/diag/rdbms/aisreprt/AIS_REPORT/alert/log.xml that match have matches with (ORA-|time=|ERROR|Dead) will be uploaded to zabbix in the AIS_REPORT[alertlog] item.

Those lines will be visible in latest data.

If you expect that new database instances are introduces multiple times per day, it would make sense to put zbx_alertlog.py in the crontab, for a few times per day. zbx_alertlog.py does not read the log.xml, it only is the tool to discover which log.xml's exist on your host. The active agent is reading the log.xml and is doing so every minute.

absz1980 commented 4 years ago

I got it.

I add to crontab

Cut test alert, image image

How can I create a problem if agent don't receive the latest data after a while? How can I close the problem? Is it manual close only?

ikzelf commented 4 years ago

I would change this "$HOME/log/zbxdb_starter.cron 2>&1" to "$HOME/log/zbx_alertlog.cron 2>&1"

The idea is that the events auto close, after a while. That is why the lines with "time=" are for. This makes sure that every now and then new lines arrive and zabbix can see/reevaluate the age of the event and close if it is older. As long as the error keeps repeating in de log.xml, the event will remain open.

The agent nodata event is a default in the active agent templates. There is no need to set this up for the alertlog monitoring.

absz1980 commented 4 years ago

I would change this "$HOME/log/zbxdb_starter.cron 2>&1" to "$HOME/log/zbx_alertlog.cron 2>&1" Thanks.

I wait more 3 hours. The event hasn`t been auto close yet.

absz1980 commented 4 years ago

Hello, Ronald After 2day I see on one host image

I think the agent doesn`t permission on log.xml file but I check. The permissions are the same as other hosts.

ikzelf commented 4 years ago

If zbx_alertlog.py runs, it does set the access permissions to 744, world and group readable. Is your logitem is valid and enabled, or is it not supported? If it is not supported, hoovering above the status flag will show the reason.

An other option is that your database is very quiet. What are the last few lines in the log.xml? Normally, all my databases are running archivelog mode and I make sure they archive 4 times in an hour, at least. Doing so causes lox.xml entries that are read. Problem is that if a logitem does not receive data, triggers with a time in it can not always evaluate their status. If this is the case, a simple 'alter system switch logfile;' should solve this. If this helps, make sure this happens about every 15m.

ikzelf commented 4 years ago

Did you get this up and running now?

absz1980 commented 4 years ago

[root@MSK01-AISDB01 ~]# su - zabbix su: warning: cannot change directory to /var/lib/zabbix: No such file or directory -bash-4.1$ ls -l /oracle/diag/rdbms/ais/ais/alert/log log_1.xml log.xml -bash-4.1$ ls -l /oracle/diag/rdbms/ais/ais/alert/log.xml -rwxr--r-- 1 oracle oinstall 7608774 Jul 23 14:15 /oracle/diag/rdbms/ais/ais/alert/log.xml -bash-4.1$ tail -f /oracle/diag/rdbms/ais/ais/alert/log.xml

Current log# 3 seq# 19520 mem# 0: /oracle/oradata/ais/redo/redo03.log

<msg time='2020-07-23T14:15:56.448+03:00' org_id='oracle' comp_id='rdbms' client_id='' type='UNKNOWN' level='16' host_id='MSK01-AISDB01' host_addr='10.175.32.235' module='' pid='2288'>

Archived Log entry 19453 added for thread 1 sequence 19519 ID 0x20f66801 dest 1:

^C

[oracle@MSK01-AISDB01:ais ~]$ . $HOME/.bash_profile;/home/oracle/zbxdb/bin/zbx_alertlog.py -o /etc/oratab -s sysmon.uralsibins.ru -p 10051 -H MSK01-AISDB01 -k oracle.alertlog.lld > $HOME/log/zbxdb_alertlog.cron 2>&1 [oracle@MSK01-AISDB01:ais ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Thu Jul 23 14:35:55 2020

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> alter system switch logfile;

System altered.

SQL> alter system switch logfile;

And

image

absz1980 commented 4 years ago

The issue is resolved during a couple of minutes. Thank you.