nytimes / collectd-rabbitmq

A collectd plugin, written in Python, to collect statistics from RabbitMQ.
https://collectd-rabbitmq.readthedocs.org/

Enable monitoring on rabbit? plugin error message #57

Open mmissire opened 7 years ago

mmissire commented 7 years ago

Hi,

I am seeing this error in /var/log/messages:

Apr 17 17:47:11 rabbit-ddev collectd[28571]: ValueError: invalid literal for float(): disk_free_monitoring_disabled
Apr 17 17:47:11 rabbit-ddev collectd[28571]: read-function of plugin `python.collectd_rabbitmq.collectd_plugin' failed. Will suspend it for 20.000 seconds.

Does this mean the plugin was expecting a float, but Rabbit returned the textual error "disk_free_monitoring_disabled" instead?

There is a disk_free entry in types.db:

disk_free value:GAUGE:0:U

It sounds like something needs to be enabled on the Rabbit side. Do you know what it is? Should "collect_statistics" be set to "coarse" in the Rabbit configuration file? The default is "none", but I am not sure whether that is related.

Update: That made no difference. However, I've discovered that the server also returns this string when the management API is queried via curl:

curl localhost:15672/api/nodes/{nodename}/ -u {user} | jq .

A line included in the response:

"disk_free_limit": "disk_free_monitoring_disabled"

jimbydamonk commented 7 years ago

Based on https://github.com/rabbitmq/rabbitmq-management-agent/blob/master/src/rabbit_mgmt_external_stats.erl#L160 it seems that rabbit is sending that string instead of the value.

Since disk_free is a gauge, its value must be a number, not a string. We should probably handle that in the dispatch so the whole read doesn't die.
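A minimal sketch of that kind of defensive handling, in Python. The helper names `to_gauge_value` and `numeric_stats` are hypothetical and are not taken from the plugin's source; the sketch only assumes that node stats arrive as a dict parsed from the management API's JSON.

```python
def to_gauge_value(raw):
    """Return raw as a float, or None when it cannot be converted.

    Hypothetical helper: a string such as "disk_free_monitoring_disabled"
    yields None instead of raising ValueError.
    """
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def numeric_stats(node_stats, names):
    """Keep only the stats listed in `names` that convert cleanly to float."""
    result = {}
    for name in names:
        value = to_gauge_value(node_stats.get(name))
        if value is not None:
            result[name] = value
    return result
```

With something like this in front of the dispatch, a non-numeric field would be skipped for that read cycle rather than causing collectd to suspend the whole plugin.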

It looks like something is not turned on from the Rabbit side. What version of Rabbit are you using? What OS/platform are you running RabbitMQ on? I think the disk monitoring will only report that string if it can't calculate the free disk space. Take a look at this: https://www.rabbitmq.com/disk-alarms.html

mmissire commented 7 years ago

Yes, you can see this with curl:

curl localhost:15672/api/nodes/<rabbit node>/ -u <rabbit user> | jq .

Look for the line: "disk_free_limit": "disk_free_monitoring_disabled",

I am using RabbitMQ 3.6.6 on CentOS 6.8. Following some advice from Stack Overflow saying that Rabbit makes this decision at runtime based on the success or failure of "df -kP <directory>", I tried that command with various guesses as to what "directory" should be. Either I didn't guess the right directory or the cause is something else, because the command worked fine each time.

Not finding a way to enable disk free monitoring on this installation of Rabbit, I commented out the "node stats" collection in the plugin as a workaround. I agree the only change needed on the plugin's part is probably to anticipate and handle a string error response more gracefully (a general concern, not specific to this field).

jimbydamonk commented 7 years ago

Can you try running df on your mnesia dir?

sudo /bin/df -kP /var/lib/rabbitmq/mnesia/

And also through the broker itself:

rabbitmqctl eval 'rabbit_misc:os_cmd("/bin/df -kP /var/lib/rabbitmq/mnesia/").'
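For repeating the df check from a script, here is a rough Python sketch. It is only an approximation for debugging, not RabbitMQ's actual parsing logic. The function names are hypothetical, and it assumes the POSIX `-P` column layout (Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on) and a filesystem name without spaces.

```python
import subprocess


def parse_df_available_kb(output):
    """Extract the Available column (in KB) from `df -kP <path>` output."""
    lines = [line for line in output.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header line and one data line")
    fields = lines[1].split()
    if len(fields) < 6:
        raise ValueError("unexpected df output: %r" % lines[1])
    # Column 4 of the POSIX -P format is the available space.
    return int(fields[3])


def df_available_kb(path, df_binary="/bin/df"):
    """Run df on `path` and return the available space in KB."""
    output = subprocess.check_output([df_binary, "-kP", path])
    return parse_df_available_kb(output.decode("utf-8", "replace"))
```

Calling `df_available_kb("/var/lib/rabbitmq/mnesia/")` as the same user that runs the broker would show whether df succeeds and returns parseable output in that environment.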