ikzelf / zbxdb

Zabbix database monitoring, the easy and extendable way
GNU General Public License v3.0
94 stars 45 forks source link

Request for clarifications #91

Closed gianri closed 2 years ago

gianri commented 3 years ago

Hi @ikzelf, This is not a bug report but a request for a little help. I hope I'm in the right place. Sorry to bother you again, but I need some aid.

For my Oracle DB I created my host "DB_EUSISASST" in Zabbix GUI and linked to it the template "zbxdb_template_v5.xml". Now I'd like to add some custom queries to ZBXDB, but I'm missing some key concepts...

I can see more than 900 items and 700 triggers associated to my host: all of these are the result of the linked template, or also because ZBXDB runs every single file ".cfg" it finds in the folder "$ZBXDB_HOME/etc/zbxdb_checks/oracle", regardless of the exact name?

I thought to create a ".cfg" file containing my Queries and to add it in the path "$ZBXDB_HOME/etc/zbxdb_checks/oracle": how can I create Items/Triggers linked to this new file (linked to my queries)? Do I need to modify the template for getting this?

Which is the difference between items of these types: "Zabbix Trapper" and "Calculated"?

I will have to monitor also an Oracle RAC with more instances: I know I need to create only a host for the physical database, but how I can use my custom specific queries only for a precise instance and other custom queries for another specific instance?

Thank you very much for your availability and for your kindness. Kind regards

ikzelf commented 3 years ago

Hi, the generic queries are found using db_type (oracle), instance_type (rdbms/asm/apx), the version it finds in the connected database (19) and the database role (primary/standby) in finds in the connected database. Site queries can be specified by using the site_checks (site_checks=your_checks) parameter, where you just give a name. zbxdb will then look into the zbxdb_checks subdirectory for your database and fetch your_checks.cfg

smartest is to create a separate template for your site checks and also attach that to your host.

A trapper item is an item like those that are sent by zbxdb (and ultimately zabbix_sender) A calculated item is an item that can use other items and expressions to create new values.

zbxdb.py will query the gv$ dynamic views to find metrics for the full cluster. There is no way to limit some queries to a certain instance and others to an other instance. What is the problem of running all queries using the connected instance? Do you have some tables with very specific access limits or something like that?

gianri commented 3 years ago

Mainly because on a specific DB we need to run some queries on specific tables, as on other DBs we need to run different queries on other specific tables. So, the way is to create a separate template (cloning for example zbxdb_template_v5.xml, just keeping only the structure)? In this case I can understand there is no way to limit some queries to a certain instance.

This could be a restriction respect on what we used to do with an old version of Orabbix: with that solution we could create a Host in Zabbix for every single instance, whether they were on physically different DBs or on the same one. Is it no possible doing the same with ZBXDB? Such as creating a zbxdb.{HostName}.cfg file (in $ZBXDB_HOME/etc) for each instance just modifying the db_url parameter with the right Service_name, for example?

ikzelf commented 3 years ago

The beauty of zbxdb in combination with RAC is that it monitors the full database from a single connection. I am pretty sure that you can monitor everything you want. If you run in plugin databases it might a hassle to make the application tables accessible from the root container. In that case you might need to create an extra config that connects to the application plugin database. Doing so lets you keep the monitoring for the full database. Compared to what you explained about Orabbix you would create configs for different users aiming for different plugin databases. Not for different instances. In my opinion working per instance would create a mess but still, it is possible. In that case you would aim for a vip.

gianri commented 3 years ago

Ok, I'll try to use a single connection to the DB as zbxdb was thought. In this manner I would have in Zabbix only a host for every single physical DB, and so I could have different alarms triggered for different instances running on the same host (same DB). With Orabbix I was used to immediately figure out which instance was involved from the specific name given to every single instance (1 host=1 instance). It's probabily just a metter of habit, or is there something else that I'm missing?

Sorry for another simple question:

I'm already aiming to vip addresses in my zbxdb.DB_EUSISASST.cfg file, while "only" the default template (zbxdb_template_v5.xml) is in use. What advantage/difference would there be using IP Scans instead?


[zbxdb] db_url = (DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=OFF)(FAILOVER=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=eusisdb-a-vip.MyDomain)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=eusisdb-b-vip.MyDomain)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=MyServiceName)(SERVER=DEDICATED))) username = zabbix password = db_type = oracle db_driver = cx_Oracle instance_type = rdbms role = normal out_dir = $HOME/zbxdb_out site_checks = NONE hostname = DB_EUSISASST checks_dir = etc/zbxdb_checks password_enc = ....

ikzelf commented 3 years ago

less typing, that's all for now. If you only have two nodes, it is simple. With more nodes and a lot of dynamics, a scan listener will make it a bit easier to stay connected (or to quickly reconnect if the connected instance/node went down) In the triggers, if something is instance specific, it will be shown. Sometimes - for example with top blocker - you need to have a bit of knowledge to read it. (it shows the instance# of the top blocker, with the SID and the #of waiters for that session)

gianri commented 3 years ago

Ok, I defined my site_checks parameter and saved my .cfg file in checks_dir (/home/zbxdb/etc/zbxdb_checks/oracle). Zbxdb finds it and seems running it, but writes these errors in zbxdb.DB_EUSISASST.cfg.log:

2021-10-07 11:09:01,317main30_checks changed, reload etc/zbxdb_checks/oracle/my_super_checks.cfg. 2021-10-07 11:09:04,324main20_auto_discovery_1000 run every 1000 minutes ... ... 2021-10-07 11:09:04,331main20_session_act: select count() from gv$session 2021-10-07 11:09:04,332main20_sessions_active: select count() from gv$session where TYPE!='BACKGROUND' and 2021-10-07 11:09:04,621main20_Sleeping for 57 seconds 2021-10-07 11:10:02,000main20_Sleeping for 59 seconds 2021-10-07 11:11:03,927main50_key=super_checks.session_act ZBXDB-2: SQL format error: expect key,value pairs 2021-10-07 11:11:03,957main50_key=super_checks.sessions_active ZBXDB-2: SQL format error: expect key,value pairs 2021-10-07 11:11:03,959main20_Sleeping for 58 seconds 2021-10-07 11:12:02,324main20_Sleeping for 59 seconds 2021-10-07 11:13:01,638main20_Sleeping for 60 seconds 2021-10-07 11:14:02,239main20_Sleeping for 59 seconds 2021-10-07 11:15:01,590main20_Sleeping for 60 seconds ... 2021-10-07 11:16:05,657main50_key=super_checks.session_act ZBXDB-2: SQL format error: expect key,value pairs 2021-10-07 11:16:05,685main50_key=super_checks.sessions_active ZBXDB-2: SQL format error: expect key,value pairs

The content of my checks.cfg is [super_checks] minutes: 05 sessions_active: select count() from gv$session where TYPE!='BACKGROUND' and status='ACTIVE' and username not like 'SYS%' and username is not null session_act: select count() from gv$session

Both queries work fine in toad. What could be the problem in zbxdb?

ikzelf commented 3 years ago

nice. zabbix does need a key and a value pair. So, you seem to want to count active processes. Now you just need to define a key that can be used to store the result in. If you look into examples you will find constructions where the key is part of the query. If you change this example to "select 'active_sessions', count(*) ...... it will work. In this case you need to create an item 'active_sessions' in zabbix that is going to hold the values. The label 'sessions_active' is just used to make it easier to find problems using the log. It is not part of the key and can't be part of the key since some queries will return multiple keys. Look for example into the p_ts query.

gianri commented 3 years ago

Perfect, it worked! So I can create as many items as many queries I write in my .cfg file, or (smartest solution) I can also create a custom template that uses the queries written in the .cfg file and link it to my host. The question is, how can I make the new custom template fetching my .cfg file (my queries)?

I think I'm testing your patience with these simple questions... sorry a lot for this... But if your patience is comparable to your skills then I could ask you endless questions ;-)

Could be ok a template with the content you can see in attach? MyCustomTemplate.txt

ikzelf commented 3 years ago

II saw an email flashing by and no updates here .... There is no such thing as the template is actively getting/fetching/running the queries. zbxdb.py is running queries that look like a good match for your configuration (db_type, instance_type, db_role, version, driver) to this list of queries the site_checks queries will be added. If all is OK, they are executed and the results are pushed to zabbix. zbxdb.py does not know anything about a template that will receive the results, or not.

For you, your own site specific stuff can be in a separate template and link that to the same host as is mentioned in your zbxdb.*.cfg All your items that you expect to be pushed are to be of type trapper. Of course you can also make derived items, like calculated or dependant items but they are managed by zabbix and not fed by zbxdb.py.

As long as your keys that you have in the sql's are also in the template, all will be fine.

Your imagination is the limit. Also the total run time for all the queries. zbxdb.py is most of the time busy sleeping and waiting for the next time to wake up. If your queries cause the total runtime to be longer than 60s, zbxdb will skip a run.

Ah, the update just came into GitHub. Maybe you meant something else for "acrive_sessions" as key but it does not matter, as long as this is what one of your queries generates.

gianri commented 3 years ago

Yes I deleted the message because it seemed there was something wrong writing the content of the template. So I did a second update attaching the template as a file.

Yes, I meant "active_sessions". Is it important the tag Application or I can leave it blank? Request_method must be always POST?

ikzelf commented 3 years ago

Application is just a kind of grouping. It can help you to see some combined data. In v6 application will be gone. No idea where that request_method came from. For this not important.

gianri commented 3 years ago

Hi, the queries ZBXDB run fetching all preloaded .cfg files in zbxdb_checks folder, are similar to the checks that Zabbix runs using the pre-installed Template "DB Oracle by Zabbix Agent 2"? In other words, is it worth installing Zabbix Agent 2 on AIX (that doesn't seem that simple), even if I already have my ZBXDB server running?

ikzelf commented 3 years ago

I did not -yet- study the agent2 option to monitor databases. I don't like to mess around with odbc. I prefer to monitor from a client perspective since in that case the availability is better measured. Especially with RAC environments ..... this would mean having to monitor all instances using their local instances. With a little bad luck they all report to their own host, meaning that they will all generate events for the same issue (example tablespace filled up). With zbxdb you don't run something out of the box, it is for database monitoring a bit better than that. Especially if you do understand databases and know how to write queries. agent2 will no doubt have some things zbxdb can not do but I have yet to see them.

gianri commented 3 years ago

After I got monitored my main DBs each of them having a single instance, now I'm struggling to get a correct monitoring of another DB with several instances running on it. Sorry again for this question, which will certainly be trivial for you. Here is an example:

My physical DB, Oracle RAC, runs on 2 different nodes; On this DB run 3 different instances: Inst1, Inst2, Inst3. So there are 3 ora_pmon processes running on my machine, for each node (ora_pmon_Inst1, ora_pmon_Inst2, ora_pmon_Inst3). Creating only 1 host in Zabbix for the physical DB (for example RACDB) I should have only 1 file named "zbxdb.RACDB.cfg" for the settings of connection, if I'm not wrong. In this file I can write details for connecting to only 1 Instance (1 SERVICE_NAME) with his own combination of zabbix user/pwd, as this in attach. How the example attached can be modified to add details for Inst2, Inst3 and their respective Zabbix user/pwd?

zbxdb.RACDB.txt

ikzelf commented 3 years ago

It looks like RAC is new for you. In that case: you think too much in instances. Since I don't know your configuration it is a bit hard for me to tell what to do but here: db_url = (DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=OFF)(FAILOVER=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=RacNodeA-vip)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=RacNodeB-vip)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=Inst1)(SERVER=DEDICATED)))

the address list is OK. In connect data you specify a service_name that is a/the database, not an instance.

If you specify an instance, your monitoring will stop as soon as that instance stops responding or goes down. That is why it should be a service that is available in at least one but preferably in all instances. It it is in only one instance, make sure that it is transferred to a remaining instance if one goes down. If the connected instance goes down, zbxdb will try to reconnect and hopefully can continue monitoring via an other instance of your cluster. Normally this reconnect is within seconds.

Since the password is for a database user, the password is the same for all instances because they all serve the same database and the user is defined in the database. If you have pluggable databases connect to the root container with a global account. That way you can discover and monitor all pdb's from within the root container, if needed.

gianri commented 3 years ago

Certainly yes: I'm quite new with DBs, but no more than I'm new in zbxdb ;) I'm working on conventional Oracle RAC DBs (a bit old version: 11.2.0.4), so I don't have PDBs. I probably misunderstood the term "database" and "instance", compared to how I am used to dealing with my colleagues. So, let me describe as best as I can how my setup is and how I would configure zbxdb.

-1 Physical Oracle Database RAC with 2 nodes (2 machines): RacNodeA (AIX OS) and RacNodeB (AIX OS). -3 Databases running on the same physical DB: MyDB1, MyDB2 and MyDB3 -So, for MyDB1 I have a instance running on RacNodeA (MyDB1_instanceA) and another instance running on RacNodeB (MyDB2_instanceB). -MyDB2 and MyDB3 have their own instances, as MyDB1 has.

Is it correct creating the following configuration? -Creating in Zabbix 3 hosts (1 host = 1 Database): MyDB1, MyDB2, MyDB3 -Creating in zbxdb/etc 3 .cfg files: zbxdb.MyDB1.cfg zbxdb.MyDB2.cfg zbxdb.MyDB3.cfg Each of them has its own address list (SERVICE_NAME=MyDBx - where "x" is the number of MyDB: 1 or 2 or 3) -I can have a site check for each of these 3 DBs, creating 3 different custom .cfg files under "etc\zbxdb_checks\oracle" -The zbxdb queries defined in "primary.11.cfg" (my DB version is 11) will be fetched three times

I still apologize if my questions are so basic to you and if my example seemed too long, but I would like to make sure that I understand correctly.

ikzelf commented 3 years ago

Don't worry, if I can, I will help and explain. A long time ago this was all new for me too. But yes, you got it right, apart from the slashes pointing in the wrong direction.

gianri commented 3 years ago

Hi Ikzelf, I'm having performance issues on my DB and I was thinking about minimizing Queries of ZBXDB, so to leave active only the main ones. To reduce the resource load of my target DB, is it enough to disable the items associated with the host on Zabbix? Or should I remove the correspondents Queries from the CFG checks file (primary.11.cfg - for my version)?

ikzelf commented 3 years ago

In latest data you can see the response times of the queries. Most will be sub-second and they will not do any harm for your database. There is one query that can be influenced by a filled recycle bin and that is the one behind p_ts. Most of the times this is solved by purging the dba recycle bin. First check the response times so you know which query could be the/a culprit.

gianri commented 3 years ago

In Latest Data I don't have the Response Time: it could be because my Zabbix version is 5.0 rather than the latest?

ikzelf commented 3 years ago

There should be ..... check application zbxdb with keys like "zbxdb[query,*,ela] (and also status) No matter what version of zabbix you use, this is part of the self monitoring system in zbxdb.

gianri commented 3 years ago

Ok, I found these keys and values: zbxdb[query,checks_05m,obj.stat,ela] 1s 700.69ms zbxdb[query,checks_05m,t_ts,ela] 1s 277.2ms zbxdb[query,checks_60m,p_ts,ela] 1m 9.41ms zbxdb[query,checks_60m,,ela] 1m 728.16ms

This does mean that to decrease the load on my DB I should Disable these Items? Thanks, as always

ikzelf commented 3 years ago

Disabling then in the GUI won't do anything here, except that the items will ignore the data that is sent to them. As I already expected: the culprit is zbxdb[query,checks_60m,p_ts,ela] 1m 9.41ms. It runs every 5 mins. The solution is to purge the dba recycle bin. After that it normally runs sub second again.

a workaround is to move it in primary.11.cfg to checks_60m so it is executed once an hour, or completely remote the query. Doing so takes away your resource monitoring since this query tracks your tablespace usages, in my opinion one of the more useful things to monitor.

How does this item look in a graph? Since when is the ela > 1s?

gianri commented 3 years ago

Looking in graphs, I can see that items "zbxdb[query,checks_05m,obj.stat,ela]" and "zbxdb[query,checks_05m,t_ts,ela]" present different peaks, but only in few occasions are > 1s. Different is for "zbxdb[query,checks_60m,p_ts,ela]" and "zbxdb[query,checks_60m,,ela]": they have always been > 1m (m, not s)

ikzelf commented 3 years ago

It would be smart to check with your local dba about the recycle bin. With a little luck it can be purged and that will solve the issue. When oracle objects are dropped, they end up in the recycle bin and keep using space. Sometimes it is kept as a way to revert a drop action but more often than not, the bin can be purged after a few hours. PURGE DBA_RECYCLEBIN;

gianri commented 2 years ago

Okay, good to know that the dba_recyclebin is the weakness of this query: probably we should schedule the purge once a month, more or less, depending of how many drops... The culprit Query zbxdb[query,checks_60m,p_ts,ela] is that in attach of this message? It is the only one I can see similar (p_ts) in primary.11.cfg, but it is under "minutes 60" section, not under "minutes 5" (if it runs every 5 mins). Where am I wrong? I fully agree with you that the monitoring of tablespace usage is one of the more important things and, for this reason, I'd like to get it run every 5 or 10 mins as maximum, but without worrying about blocking the DB

gianri commented 2 years ago

Sorry... here is the attach CulpritQuery.txt

ikzelf commented 2 years ago

Yes, you are right. Because most permanent tablespaces are quite regular in their growth I have it run once an hour. T_ts is more frequent.

gianri commented 2 years ago

Perfect. I ran an AWR report when my DB was suffering and the Query that was top in the following lists was that in attach:

I can't find this query out in Primary.11.cfg, even if it is zbxdb related because it uses the Zabbix schema. Probably this is another simple question for you, but can you help me to understand ?

QueryInAWR.txt

ikzelf commented 2 years ago

Do I understand correctly that you have combined the zabbix schema in your monitored database? That is a bit different that what I expected. The zabbix database, when used for more than a handful of devices will be busier than most application databases. I will check the query asap.

ikzelf commented 2 years ago

The query you found in AWR is strange for me. Which user is running it and from what machine? zbxdb has a query to find the top blocking session. That is zbxdb[query,checks_01m,blocked,ela] which does a pretty good job in identifying the most problematic session (the one blocking most other sessions). The awr query you found certainly is one that checks for locking but it is not run by zbxdb. If this query is worrying you, make sure you find the source and stop it. zbxdb is very resource friendly.

Did you by any chance also try to deploy one of the original zabbix oracle templates on this database? If that is the case, you should be able to disable the item with this query. That might be the only advantage that the original zabbix database template has: all config in the GUI.

gianri commented 2 years ago

My Zabbix runs on a dedicated machine with its own mysql DB. From the report of "Oracle Tuning Advisor" in Toad about the query found in AWR, I can see that it is related to the schema name "zabbix". My ZBXDB is configured (in the file ZBXDB.MyMonitoredDB.cfg) to connect with the user zabbix. The only template associated in zabbix for my monitored DB is that of zbxdb: ​/home/zbxdb/zbxdb/zbxdb_template_v5.xml Very strange... let me check the AWR of another monitored DB with the same zbxdb/template...

ikzelf commented 2 years ago

Maybe this query comes from cloud control? This query generates a list intended to be read by someone. Can you see from which machine the query comes?

gianri commented 2 years ago

Yes, I confirm, I ran the AWR report on another monitored DB with same template/checks, and I found the same query (it has the same SQL ID) being at the top of the worst lists (wait/cpu/elapsed time). I'm not getting any complaints from users of this last DB, probably because the dedicated machine is more powerful

ikzelf commented 2 years ago

There is one extra thing that zbxdb does when connecting to a database: it sets the module to zbxdb so you can recognise weather a query comes from zbxdb.py or an other tool. Do you have site_checks in your configuration?

gianri commented 2 years ago

Cloud cannot be because we don't have any service in Cloud yet. Monitoring the sessions I see that it is coming from Zabbix machine... but why? In attach you can find my custom check file, the same for all both DB (I changed the extension from .cfg to .txt) my_checks_DBRAC.txt

ikzelf commented 2 years ago

Your custom checks won't be a problem I think. One of the many names is cloud control other names for the same tool are OEM, Oracle Enterprise Manager, CC ..... What is listed in the column 'module' for the sessions running that nice query?

gianri commented 2 years ago

In Module it's written JDBC Thin Client. zabbix in machine, and ZABBIX in username columns

ikzelf commented 2 years ago

So there is some Java app running that query. Could be enterprise manager. zbxdb.py uses the regular sqlnet libraries and identifies itself in the module as zbxdb.py. Maybe change the password for the zabbix user? Someone will start complaining about that.....

ikzelf commented 2 years ago

Could you track down which tool is running the queries?

gianri commented 2 years ago

Hi @ikzelf, I haven't changed zabbix user's password yet, but I'll do it for sure in the next days if nothing comes up. I'll keep you updated. Thanks for all

gianri commented 2 years ago

Hi @ikzelf, the culprit was a old installation of Orabbix that was still running on our old zabbix Server!

ikzelf commented 2 years ago

hah, nice that you could solve this!