jasonmcintosh / rabbitmq-zabbix

Zabbix RabbitMQ Configuration
Apache License 2.0

No data received on queue //aliveness-test in 10 minutes. #75

Closed PDVJAM closed 6 years ago

PDVJAM commented 6 years ago

Hi! Sometimes I get this error. RabbitMQ seems to be OK at those times, so it looks like a problem with zabbix/monitoring. Have you seen this error, or do you know what to check or how to fix it?

jasonmcintosh commented 6 years ago

Hrmm, not seen this personally. COULD be a concurrency issue on the agent side. Wouldn't expect that, but sorta grasping here. Do any of the other queues/metrics not report data? Server utilization numbers? I'd lean towards something blocking the agent from delivering metric data. That check is there in case the remote agent stops working or doesn't do its job for some reason.

PDVJAM commented 6 years ago

Yep, maybe on the server side, I see this in logs:

  9901:20170928:153613.119 active check "rabbitmq[queues]" is not supported: Timeout while executing a shell script.
  9901:20170928:153613.119 In process_value() key:'cricketmgr.com:rabbitmq[queues]' value:'Timeout while executing a shell script.'
  9901:20170928:153613.169 JSON before sending [{"request":"agent data","data":[{"host":"cricketmgr.com","key":"rabbitmq[queues]","value":"Timeout while executing a shell script.","state":1,"clock":1506605773,"ns":119491547}],"clock":1506605773,"ns":169130145}]

But the rabbitmq checks are the only ones that have this issue. Still investigating...

jasonmcintosh commented 6 years ago

That basically means the zabbix agent is taking too long to execute the checks. This isn't the aliveness check per se, but the entire set of queue checks: the script that queries rabbitmq via the management API is taking too long. Could be it ran out of memory, could be something else. You SHOULD be able to enable logging in the agent, have it log to a file of your choosing, and see the output.
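One way to check whether the management API is the slow part is to time the same kind of request outside of Zabbix. This is a minimal sketch, not the repo's own script: it assumes the management API listens on port 15672 (as in the URLs logged later in this thread) and accepts basic auth, and the host, user, and password are placeholders.

```python
import base64
import time
import urllib.request


def timed(fetch):
    """Run fetch() and return (elapsed_seconds, result)."""
    start = time.monotonic()
    result = fetch()
    return time.monotonic() - start, result


def fetch_queues(host="localhost", port=15672, user="guest", password="guest"):
    """Fetch /api/queues from the management API (placeholder host and credentials)."""
    url = "http://%s:%d/api/queues" % (host, port)
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": "Basic " + token})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


# Usage (needs a reachable broker):
# elapsed, body = timed(fetch_queues)
# print("fetched %d bytes in %.2fs" % (len(body), elapsed))
```

If the elapsed time is close to or above the agent's configured timeout, the "Timeout while executing a shell script" message is the expected result.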

jasonmcintosh commented 6 years ago

Note, the zabbix agent has a timeout for checks: the Timeout setting in zabbix_agentd.conf, which defaults to 3 seconds and can be raised to at most 30. A slow response from the management API, or something else going on, could hit that timeout, so I can't rule it out.
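To see which timeout an agent is actually running with, the Timeout line in its config file can be read directly. A minimal sketch, assuming the stock agent default of 3 seconds when the setting is absent or commented out; the function name and the config path in the usage comment are illustrative only.

```python
def read_agent_timeout(conf_text, default=3):
    """Return the Timeout value from zabbix_agentd.conf text, or the default."""
    timeout = default
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments such as "# Timeout=3".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "Timeout":
            timeout = int(value.strip())
    return timeout


# Usage (path is an assumption, it varies by distribution):
# with open("/etc/zabbix/zabbix_agentd.conf") as handle:
#     print(read_agent_timeout(handle.read()))
```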

jasonmcintosh commented 6 years ago

Closing this issue as it's been a while. Feel free to re-open as needed :) Would be curious to hear what you found out!

probert-mtv commented 6 years ago

We have a similar issue, same notification message, but not the same message in Zabbix's server logs.

27031:20180329:081434.606 item "rabbitmqxxx-x:rabbitmq[queues]" became not supported: ZBX_NOTSUPPORTED
21069:20180329:082102.026 item "rabbitmqxxx-x:rabbitmq[queues]" became supported

jasonmcintosh commented 6 years ago

You can tweak the scripts to add debug information. There are logging variables in the scripts, and the resulting logs often provide more detailed information on any errors.
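For reference, debug logging of this kind can be set up with Python's standard logging module. This is a generic sketch and not the scripts' actual logging code: the function name, logger name, and file path are hypothetical, and the format string is only chosen to resemble the "timestamp LEVEL: message" lines quoted later in this thread.

```python
import logging


def build_debug_logger(handler, name="rabbitmq_zabbix_debug"):
    """Attach handler to a DEBUG-level logger using a timestamped format."""
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s: %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return logger


# Usage: log to a file of your choosing (placeholder path).
# logger = build_debug_logger(logging.FileHandler("/tmp/rabbitmq_zabbix_debug.log"))
# logger.debug("Issue a rabbit API call to get data on queues")
```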

miksonx commented 5 years ago

Does anyone have an idea what might cause "No data received on queue //aliveness-test in 20 minutes" and how to fix it?

Here is my DEBUG log:

2019-02-05 16:25:39,763 DEBUG: Started trying to process data
2019-02-05 16:25:39,763 DEBUG: Issue a rabbit API call to get data on overview against app01
2019-02-05 16:25:39,763 DEBUG: Full URL:http://app01:15672/api/overview
2019-02-05 16:25:39,811 DEBUG: Started trying to process data
2019-02-05 16:25:39,812 DEBUG: Issue a rabbit API call to get data on overview against app01
2019-02-05 16:25:39,812 DEBUG: Full URL:http://app01:15672/api/overview
2019-02-05 16:25:39,876 DEBUG: Started trying to process data
2019-02-05 16:25:39,877 DEBUG: Issue a rabbit API call to get data on overview against app01
2019-02-05 16:25:39,877 DEBUG: Full URL:http://app01:15672/api/overview
2019-02-05 16:25:39,927 DEBUG: Started trying to process data
2019-02-05 16:25:39,928 DEBUG: Issue a rabbit API call to get data on nodes against app01
2019-02-05 16:25:39,928 DEBUG: Full URL:http://app01:15672/api/nodes
2019-02-05 16:25:39,934 DEBUG: Checking to see if node name app01 is in rabbit@app01 for item partitions found 1 nodes
2019-02-05 16:25:39,935 DEBUG: Got data from node app01 of []
2019-02-05 16:25:39,974 DEBUG: Started trying to process data
2019-02-05 16:25:39,975 DEBUG: Issue a rabbit API call to get data on shovels against app01
2019-02-05 16:25:39,975 DEBUG: Full URL:http://app01:15672/api/shovels
2019-02-05 16:25:47,160 DEBUG: Started trying to process data
2019-02-05 16:25:47,161 DEBUG: Issue a rabbit API call to get data on aliveness-test/%2f against app01
2019-02-05 16:25:47,161 DEBUG: Full URL:http://app01:15672/api/aliveness-test/%2f
2019-02-05 16:26:09,026 DEBUG: Started trying to process data
2019-02-05 16:26:09,027 DEBUG: Issue a rabbit API call to get data on queues against app01
2019-02-05 16:26:09,027 DEBUG: Full URL:http://app01:15672/api/queues
2019-02-05 16:26:09,032 DEBUG: Filtering out by [{}]
2019-02-05 16:26:09,032 DEBUG: SENDER_DATA: - "rabbitmq.queues[/,queue_memory,aliveness-test]" 8960
2019-02-05 16:26:09,033 DEBUG: SENDER_DATA: - "rabbitmq.queues[/,queue_messages,aliveness-test]" 0
2019-02-05 16:26:09,033 DEBUG: SENDER_DATA: - "rabbitmq.queues[/,queue_messages_unacknowledged,aliveness-test]" 0
2019-02-05 16:26:09,033 DEBUG: SENDER_DATA: - "rabbitmq.queues[/,queue_consumers,aliveness-test]" 0
2019-02-05 16:26:09,033 DEBUG: SENDER_DATA: - "rabbitmq.queues[/,queue_message_stats_deliver_get,aliveness-test]" 357
2019-02-05 16:26:09,033 DEBUG: SENDER_DATA: - "rabbitmq.queues[/,queue_message_stats_publish,aliveness-test]" 357
2019-02-05 16:26:09,033 DEBUG: SENDER_DATA: - "rabbitmq.queues[/,queue_message_stats_ack,aliveness-test]" 0
2019-02-05 16:26:09,042 DEBUG: Finished sending data
2019-02-05 16:26:09,042 INFO: Found return code of 2
2019-02-05 16:26:09,042 DEBUG: zabbix_sender [214448]: DEBUG: answer [{"response":"success","info":"processed: 0; failed: 7; total: 7; seconds spent: 0.000102"}]
2019-02-05 16:26:09,043 DEBUG: info from server: "processed: 0; failed: 7; total: 7; seconds spent: 0.000102" sent: 7; skipped: 0; total: 7

Thanks in advance.
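The last lines of that log show the server answering "processed: 0; failed: 7; total: 7", so all seven values were rejected, which would be consistent with the "No data received" trigger firing. A common cause of failed values is that the host name or item keys sent do not match trapper items configured on the server, though the log alone does not confirm that. A minimal sketch for pulling the counts out of the info string so a wrapper can flag this case; the function name is hypothetical.

```python
import re

INFO_PATTERN = re.compile(r"processed: (\d+); failed: (\d+); total: (\d+)")


def parse_sender_info(info):
    """Extract processed/failed/total counts from a zabbix_sender info string."""
    match = INFO_PATTERN.search(info)
    if match is None:
        raise ValueError("unrecognised info string: %r" % info)
    processed, failed, total = (int(group) for group in match.groups())
    return {"processed": processed, "failed": failed, "total": total}


# Usage: warn when the server rejected any value.
# counts = parse_sender_info(info_from_server)
# if counts["failed"]:
#     print("server rejected %d of %d values" % (counts["failed"], counts["total"]))
```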