Open jsmsng opened 7 years ago
They should not be measurements themselves. I am not super familiar with the collectd -> influxDB setup.
Are you sure they are coming from this plugin ?
I think all the dispatched metrics have rabbitmq_ prefixed to the name.
What should happen is this: when a dispatch event is triggered by collectd, the plugin should get the node information (as well as the other stuff like queues and exchanges). That information should then be dispatched to collectd through its naming schema, which is kind of a pain. I suspect something is not getting set up right and it is just dispatching to sjc-rabbitmq-p3 with no data.
Could you post your collectd config ?
Also, are sjc-rabbitmq-p[1,2,3] hostnames?
I am sure they are coming from the plugin because they were not there before the plugin was installed.
sjc-rabbitmq-p[1,2,3] are hostnames.
Config is below:
Hostname "sjc-rabbitmq-p1"
FQDNLookup true
BaseDir "/var/lib/collectd"
PluginsDir "/usr/lib/collectd"
TypesDB "/usr/share/collectd/types.db"
Interval 60
Include "/etc/collectd/modules/*.conf"
TypesDB "/usr//share/collectd/types.db.custom"
LoadPlugin python
Here is an example: when I select a measurement such as "cpu_value", I get the hostnames inside the measurement:
So, all plugins dispatch values with a set structure. I think in collectd it is value_list_t from here
So all values must have the structure of host plugin plugin_instance type type_instance
So for rabbitmq, I do something like
cluster exchanges
For node stats I replace the cluster part with the actual node name.
Where does the cpu_value come from?
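As a rough illustration of the structure described above, the five fields can be modelled as a small record. This is only a sketch: the `Identifier` class and its `path` method are hypothetical names, the field values are illustrative, and the rendering follows collectd's usual `host/plugin-plugin_instance/type-type_instance` identifier form.

```python
from typing import NamedTuple


class Identifier(NamedTuple):
    """Hypothetical model of the five naming fields a collectd value carries."""
    host: str
    plugin: str
    plugin_instance: str
    type: str
    type_instance: str

    def path(self) -> str:
        # Empty instances are omitted, so "cpu" + "0" becomes "cpu-0"
        # while "ack" + "" stays "ack".
        plugin = self.plugin + ("-" + self.plugin_instance if self.plugin_instance else "")
        type_ = self.type + ("-" + self.type_instance if self.type_instance else "")
        return "/".join([self.host, plugin, type_])


# Illustrative values only: an ordinary host metric and a rabbitmq metric
# where the vhost takes the place of the host.
cpu = Identifier("sjc-rabbitmq-p1", "cpu", "0", "cpu", "idle")
exchange = Identifier("rabbitmq_default", "exchanges", "amq_topic", "ack", "")
print(cpu.path())
print(exchange.path())
```

The point of the sketch is that whatever a plugin places in the first field becomes the top-level grouping in any backend that stores data by host.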
I will continue this thread in place of my colleague jsmsng :) cpu_value comes from other servers and metrics. One more thing: when I choose the "host" value from the rabbitmq-p1 measurement (which shouldn't be rabbitmq-p1_ but e.g. run_queue), I see "rabbitmq_default" (please see the attached pic), which should be rabbitmq-p1.
I am not an influx person, but I could probably set it up. I don't know what that image is trying to point out. You stated that host shouldn't be "rabbitmq-p1_" but should be run_queue. What is "host" in this context?
rabbitmq_default is the default vhost, "/"; you can't have "/" as a name in collectd. Vhosts in rabbitmq appear at the cluster level. I was slightly incorrect in my last post about the way I map rabbit data into the collectd metric path.
it should be
[cluster]vhost exchanges
Let's remove influxdb from the equation if possible.
Collectd sends stuff to carbon which writes to whisper files. The vagrant box stores whisper files in
/media/metrics/whisper
The vagrant box sets up two vhosts (vhost1, vhost2) plus the default vhost "/"
So, in the /media/metrics/whisper/collectd
I see this:
ubuntu@vagrant:/media/metrics/whisper$ ls collectd/
rabbitmq_default rabbitmq_rabbit@vagrant rabbitmq_vhost1 rabbitmq_vhost2
Now normally, with other plugins, each of those might map to an actual host. In this case, they map to vhosts and clusters:
rabbitmq_default == /
rabbitmq_vhost1 == vhost1
rabbitmq_vhost2 == vhost2
rabbit@vagrant == the cluster
so the stats under them are a bit different. The vhosts should have queues and exchanges, the cluster should have the overview stats. Does any of that help ?
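The mapping above could be sketched as follows. This is not the plugin's actual code: both function names are hypothetical, and the only rules assumed are the ones visible in the directory listing (a `rabbitmq_` prefix, and `default` standing in for "/").

```python
def collectd_host_for_vhost(vhost: str) -> str:
    """Hypothetical helper: name used in the 'host' slot for a RabbitMQ vhost."""
    # "/" cannot be used as a name in collectd, so the default vhost
    # is reported as "default".
    name = "default" if vhost == "/" else vhost
    return "rabbitmq_" + name


def collectd_host_for_cluster(cluster_name: str) -> str:
    """Hypothetical helper: name used in the 'host' slot for the cluster."""
    return "rabbitmq_" + cluster_name


for vhost in ["/", "vhost1", "vhost2"]:
    print(vhost, "->", collectd_host_for_vhost(vhost))
print("cluster ->", collectd_host_for_cluster("rabbit@vagrant"))
```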
I've found something similar also and hope I can illustrate it slightly differently via the syslog / debug output.
Raw syslog capture showing a RabbitMQ metric of each type and an entry from the swap (last line) plugin as a reference:
May 8 02:13:36 rabbitmq-085fae55d3d1ccf0c collectd[11434]: plugin_dispatch_values: time = 1494209616.488; interval = 10.000; host = rabbitmq_default; plugin = rabbitmq-085fae55d3d1ccf0c.environment.company.com; plugin_instance = ; type = sockets_used_details; type_instance = sample;
May 8 02:13:36 rabbitmq-085fae55d3d1ccf0c collectd[11434]: plugin_dispatch_values: time = 1494209616.522; interval = 10.000; host = rabbitmq_rabbit@rabbitmq-085fae55d3d1ccf0c.environment.company.com; plugin = overview; plugin_instance = message_stats; type = publish; type_instance = ;
May 8 02:13:37 rabbitmq-085fae55d3d1ccf0c collectd[11434]: plugin_dispatch_values: time = 1494209617.223; interval = 10.000; host = rabbitmq_default; plugin = queues; plugin_instance = gen2.channel.deadLetter; type = consumers; type_instance = ;
May 8 02:19:17 rabbitmq-085fae55d3d1ccf0c collectd[12322]: plugin_dispatch_values: time = 1494209957.359; interval = 10.000; host = rabbitmq-085fae55d3d1ccf0c.environment.company.com; plugin = swap; plugin_instance = ; type = swap_io; type_instance = out;
Broken down into the main plugin fields (irrelevant data stripped and formatted for easier comparison), it's clear what is getting populated into each field:
host = rabbitmq_default; plugin = rabbitmq-085fae55d3d1ccf0c.environment.company.com; plugin_instance = ; type = sockets_used_details; type_instance = sample;
host = rabbitmq_rabbit@rabbitmq-085fae55d3d1ccf0c.environment.company.com; plugin = overview; plugin_instance = message_stats; type = publish; type_instance = ;
host = rabbitmq_default; plugin = queues; plugin_instance = gen2.channel.deadLetter; type = consumers; type_instance = ;
host = rabbitmq-085fae55d3d1ccf0c.environment.company.com; plugin = swap; plugin_instance = ; type = swap_io; type_instance = out;
This has caused metrics to be stored in varying locations on disk, none of which line up with our standard structure (everything stored under environment, then hostname, then down into metrics). It looks like a bug to me, but it's possible I'm missing something about the designed structure!?
This is using RabbitMQ 3.6.9 and collectd 5.7.1
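For anyone wanting to repeat the breakdown above on their own debug output, a minimal sketch of a parser for the `plugin_dispatch_values` lines follows. The function name and regular expression are illustrative and assume the `key = value;` layout shown in the capture.

```python
import re

# Matches "key = value;" pairs; the value may be empty, e.g. "plugin_instance = ;".
FIELD_RE = re.compile(r"(\w+) = ([^;]*);")


def parse_dispatch_line(line: str) -> dict:
    """Extract the key/value fields from one plugin_dispatch_values log line."""
    # Everything before the marker is the syslog prefix (timestamp, host, pid).
    _, _, payload = line.partition("plugin_dispatch_values: ")
    return {key: value.strip() for key, value in FIELD_RE.findall(payload)}


sample = (
    "May 8 02:13:37 rabbitmq-085fae55d3d1ccf0c collectd[11434]: "
    "plugin_dispatch_values: time = 1494209617.223; interval = 10.000; "
    "host = rabbitmq_default; plugin = queues; "
    "plugin_instance = gen2.channel.deadLetter; type = consumers; type_instance = ;"
)
fields = parse_dispatch_line(sample)
for key in ("host", "plugin", "plugin_instance", "type", "type_instance"):
    print(f"{key} = {fields[key]!r}")
```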
I based this plugin on what collectd provides. The plugin does not have anything to do with the way collectd (or wherever collectd sends the data) stores things on disk.
(Also, that last one is not from this plugin. There is no type swap_io. That is most likely from https://collectd.org/wiki/index.php/Plugin:Swap)
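The architectural choices I made are:
- Prefix all metrics with rabbit.
- Vhosts are treated as hosts.
- Cluster are treated as hosts as well
- Node details are under their vhost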
How do you expect the metrics to be dispatched ?
@jimbydamonk I have unfortunately also found that somehow the metrics being sent by the collectd-rabbitmq plugin differ from the other plugins on the same host. Thank you for the explanation on the vhosts though, that totally makes sense to name the '/' vhost _default.
In my scenario, we write the metrics to graphite and write_graphite is configured with a Prefix as follows (environment and source of the data):
Prefix "DEV.collectd."
So for my rabbitmq host, the CPU and memory metric names end up in graphite with the following names:
DEV.collectd.dev-rabbitmq-01.cpu-0.cpu-idle
DEV.collectd.dev-rabbitmq-01.memory.memory-free
...
The write_graphite plugin adds the hostname to the Prefix.
While I was looking for my rabbitmq metrics however, I found that those metrics do not show up with the hostname, but just with the prefix like this:
DEV.collectd.rabbitmq_default.exchanges-amq_topic.ack
DEV.collectd.rabbitmq_default.exchanges-amq_topic.confirm
DEV.collectd.rabbitmq_default.exchanges-amq_topic.deliver
...
DEV.collectd.rabbitmq_dev-rabbitmq.overview-message_stats.ack
DEV.collectd.rabbitmq_dev-rabbitmq.overview-message_stats.confirm
...
Any idea why the hostname is missing in there?
Thanks a lot!
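The two sets of names above can be reproduced with a simplified model of how a Graphite metric name is assembled from the collectd fields. This is a sketch, not write_graphite's real implementation: the function name is hypothetical, and escaping is reduced to replacing dots with an assumed escape character of "_".

```python
def graphite_name(prefix, host, plugin, plugin_instance, type_, type_instance, escape="_"):
    """Simplified model: prefix + host + plugin[-instance] + type[-instance]."""
    def esc(part):
        # Dots inside a field would otherwise create extra Graphite path levels.
        return part.replace(".", escape)

    plugin_part = esc(plugin) + ("-" + esc(plugin_instance) if plugin_instance else "")
    type_part = esc(type_) + ("-" + esc(type_instance) if type_instance else "")
    return f"{prefix}{esc(host)}.{plugin_part}.{type_part}"


# A regular plugin leaves the host as the machine's hostname.
print(graphite_name("DEV.collectd.", "dev-rabbitmq-01", "cpu", "0", "cpu", "idle"))
# Here the host slot carries the vhost name instead, so the hostname disappears.
print(graphite_name("DEV.collectd.", "rabbitmq_default", "exchanges", "amq_topic", "ack", ""))
```

Under this model, the hostname is missing because the host field itself no longer contains it, not because the Prefix is handled differently.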
@jimbydamonk I think I know what the problem is there...looking again at @david-morton's comment it seems you are overwriting the "host" value in the metric with the "vhost" or "cluster". Just like you said "Vhosts are treated as hosts.". Any chance you could fix that please?
I must still be missing something. Many of the metrics are at the vhost level of a cluster. So let's say you have host_a, host_b, and host_c. They are all running the same rabbitmq, collectd-rabbitmq plugin and collectd. They are clustered together.
The number of acks on a topic exchange, for instance, isn't a host-based value. It is at the cluster level for that vhost. If it were changed to the actual host (from the vhost), it could have partial or incorrect data. It also doesn't make sense. CPU is a host metric. Disk usage is a host metric. Acks on an exchange are different. They are not on a host. They are at the vhost level. Node stats are host metrics.
Collectd doesn't give much room to change the general path, as I have stated before.
With the idea of moving those from a vhost to a host, where would you put the vhost name? If I have two exchanges, they could have the same name but be in different vhosts. How could we handle that?
I see what you mean about metrics being part of a cluster. However, I think they should still be stored under the host they were collected from. Then, when they are being used (in grafana, for example), they can be aggregated, averaged, or otherwise brought together from the different hosts within one cluster. I believe the host value in collectd plugins was never supposed to be overwritten, and the plugin value should always be the same when coming from the same plugin: https://collectd.org/wiki/index.php/Naming_schema
I've looked at several other python collectd plugins and none of them even set the host and all of them set the plugin to what their respective name was.
So, having said that, I would expect the metrics collected here to be dispatched something like this:

dispatch_overview:

    self.dispatch_values(values=value,
                         host='',
                         plugin=plugin_name,
                         plugin_instance='overview-' + cluster_name,
                         metric_type=type_name,
                         type_instance=stat_name)

dispatch_queue_stats:

    self.dispatch_values(values=value,
                         host='',
                         plugin=plugin_name,
                         plugin_instance='vhost-' + vhost + '-' + plugin + '-' + plugin_instance,
                         metric_type=name,
                         type_instance=None)

I set plugin_name to rabbitmq at the very beginning of the module.
So in the case above, my queue now shows up in the graphite metrics like this:
collectd/HOSTNAME/rabbitmq-vhost-rabbitmq_default-queues-hello/ack.wsp
And my cluster overview stats look like this:
collectd/HOSTNAME/rabbitmq-overview-rabbit@HOSTNAME/ack.wsp
The default vhost should maybe be renamed to just "default", and the cluster name would probably be better without an @ sign.
I hope I am making sense. I am trying to update all of the code to fit this so I could submit a diff if you like. Then you could test it and see how it works.
I think https://github.com/NYTimes/collectd-rabbitmq/issues/42 is related to this btw.
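To make the proposed layout concrete, here is a small sketch that builds the plugin_instance strings from the proposal and the whisper paths they lead to. The helper names are hypothetical, and the path layout is simply copied from the two example paths in this comment.

```python
PLUGIN_NAME = "rabbitmq"  # set once for the whole module, as in the proposal


def overview_instance(cluster_name: str) -> str:
    """plugin_instance for cluster overview stats."""
    return "overview-" + cluster_name


def queue_instance(vhost: str, plugin: str, plugin_instance: str) -> str:
    """plugin_instance for per-vhost stats such as queues."""
    return "vhost-" + vhost + "-" + plugin + "-" + plugin_instance


def whisper_path(hostname: str, plugin_instance: str, metric_type: str) -> str:
    """Path of the resulting whisper file, relative to the whisper root."""
    return f"collectd/{hostname}/{PLUGIN_NAME}-{plugin_instance}/{metric_type}.wsp"


print(whisper_path("HOSTNAME", queue_instance("rabbitmq_default", "queues", "hello"), "ack"))
print(whisper_path("HOSTNAME", overview_instance("rabbit@HOSTNAME"), "ack"))
```

Because the vhost is part of the plugin_instance, two exchanges or queues with the same name in different vhosts would still end up in different directories.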
That is a pretty significant, breaking change. I am not against it, but the implication that it's a quick code change doesn't account for all of the users that use the plugin today.
It puts the vhost at a different fundamental level than it is today. It also means that users are going to have to deal with duplicate information for each node in a cluster. It also means that overview stats are stored with each node... which kinda doesn't make sense.
I am not seeing any documentation on overriding hostname in a collectd plugin but I am no expert at it.
This change would make the code seem cleaner. If you could, open a PR and I can take a look. Thanks!
It is a significant change, yes. Thank you for being open to look at the code. I have created a PR for you to look at: https://github.com/NYTimes/collectd-rabbitmq/pull/59
Hi
we use Collectd/Graphite the same way as @anitakrueger and @khairullinr and others.
We override Hostname variable in collectd agent to this value :
Hostname "CUSTOMER_NAME.CUSTOMER_APP.CUSTOMER_ENV.INSTANCE_NAME"
For example, a MySQL instance in production for the intranet of the Acme company will be: Hostname "acme.intranet.prod.mysql_1".
I don't know if PR #59 can work for us, but currently we can't use your collectd plugin, and it's quite a bad situation since the other plugins are buggy or unstable.
We would like to tell you that your work is greatly appreciated, and we would prefer to use your version instead of a new fork with #59 merged. Maybe you could add a new config option: if it's set, the plugin uses the #59 method; otherwise, the plugin uses the current method.
What do you think ?
Thanks !
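The opt-in switch suggested above might look roughly like this. Everything here is hypothetical: the function, the `use_collectd_hostname` option name, and the assumption that an empty host is left for collectd to fill in with its configured Hostname, as in the #59 proposal.

```python
def naming_fields(vhost_host, section, name, use_collectd_hostname=False):
    """Return the host/plugin/plugin_instance fields for one vhost-level metric.

    vhost_host: e.g. "rabbitmq_default"; section: e.g. "queues"; name: queue name.
    use_collectd_hostname is a hypothetical option, not an existing setting.
    """
    if use_collectd_hostname:
        # Proposed method: leave host empty and fold the vhost into plugin_instance.
        return {
            "host": "",
            "plugin": "rabbitmq",
            "plugin_instance": f"vhost-{vhost_host}-{section}-{name}",
        }
    # Current method: the vhost takes the place of the host.
    return {"host": vhost_host, "plugin": section, "plugin_instance": name}


print(naming_fields("rabbitmq_default", "queues", "hello"))
print(naming_fields("rabbitmq_default", "queues", "hello", use_collectd_hostname=True))
```

Defaulting the option to off would keep existing dashboards working, which addresses the concern about a breaking change for current users.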
I have also run into the plugin generating metrics in an unexpected place. I read this issue and I think it's very important: everyone usually expects metrics like stats.collectd.SOME_HOST.rabbitmq. for a collectd config like this:
/etc/collectd/plugins/write_graphite.conf
LoadPlugin write_graphite
<Plugin write_graphite>
<Carbon>
...
Prefix "stats.collectd."
The architectural choices I made are:
- Prefix all metrics with rabbit.
- Vhosts are treated as hosts.
- Cluster are treated as hosts as well
- Node details are under their vhost
If the architecture requires that (rabbitmq metrics located directly under stats.collectd. with vhosts treated as hosts), could it be documented in the README? I spent a long time trying to understand why the metrics were in an unusual place, until I found this issue.
Are the individual hostnames in a cluster supposed to be logged as measurements themselves, unlike, say, "processes_value", which records hostnames inside that measurement?
In my InfluxDB database, I see the hostnames logged as measurements. Is this normal?
See attached image for reference.