hawkular / hawkular-openshift-agent

A Hawkular feed that collects metrics from Prometheus and/or Jolokia endpoints deployed in one or more pods within an OpenShift node.

Issue 105 prometheus metrics per label #117

Closed jmazzitelli closed 7 years ago

jmazzitelli commented 7 years ago

This implements issue #105. It still needs more testing, but at first glance it looks to be working.

jmazzitelli commented 7 years ago

this all looks good now to me. I took off the "do not merge" label. Barring objections, this will be merged soon.

jmazzitelli commented 7 years ago

I also think this new feature is a good one and warrants a new hosa release. I'll change the version # in this PR.

jmazzitelli commented 7 years ago

To explain this PR, I am including the text of an email I posted to the hawkular-dev ML:


The past several days I've been working on an enhancement to HOSA that came in from the community (in fact, I would consider it a bug). I'm about ready to merge the PR for this and do a HOSA 1.1.0.Final release. I wanted to post this to announce it and see if there is any feedback, too.

Today, HOSA collects metrics from any Prometheus endpoint which you declare - example:

   metrics:
   - name: go_memstats_sys_bytes
   - name: process_max_fds
   - name: process_open_fds

But if a Prometheus metric has labels, Prometheus itself treats each unique combination of labels as an individual time series. This differs from how Hawkular-Metrics works: each Hawkular-Metrics metric ID (even if its metric definition or its datapoints have tags) is a single time series. We need to account for this difference. For example, if our agent is configured with:

   metrics:
   - name: jvm_memory_pool_bytes_committed

And the Prometheus endpoint emits that metric with a label called "pool" like this:

   jvm_memory_pool_bytes_committed{pool="Code Cache",} 2.7787264E7
   jvm_memory_pool_bytes_committed{pool="PS Eden Space",} 2.3068672E7

then to Prometheus this is actually 2 time series (the number of bytes committed per pool type), not 1. Even though the metric name is the same (what Prometheus calls a "metric family name"), there are two unique combinations of labels - one with "Code Cache" and one with "PS Eden Space" - so they are 2 distinct time series.
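To make the distinction concrete, here is a minimal Python sketch (not the agent's actual code) that groups sample lines from the Prometheus text format by metric name plus label combination. The helper names `parse_sample` and `group_series` are hypothetical, and the regular expressions cover only simple lines like the two above.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# One label pair such as pool="Code Cache" (a trailing comma is simply skipped).
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one sample line into (family name, labels dict, float value)."""
    m = SAMPLE_RE.match(line.strip())
    if m is None:
        raise ValueError("not a sample line: %r" % line)
    labels = dict(LABEL_RE.findall(m.group("labels") or ""))
    return m.group("name"), labels, float(m.group("value"))


def group_series(lines):
    """Group values by (family name, sorted label pairs): one key per time series."""
    series = {}
    for line in lines:
        name, labels, value = parse_sample(line)
        key = (name, tuple(sorted(labels.items())))
        series.setdefault(key, []).append(value)
    return series


lines = [
    'jvm_memory_pool_bytes_committed{pool="Code Cache",} 2.7787264E7',
    'jvm_memory_pool_bytes_committed{pool="PS Eden Space",} 2.3068672E7',
]
series = group_series(lines)
print(len(series))  # prints 2: one series per unique label combination
```

The metric family name alone would give a single key; adding the sorted label pairs to the key is what separates the two pools.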

Today, the agent creates only a single Hawkular-Metrics metric in this case, tagging each datapoint with the Prometheus labels that apply to it. But we don't want to aggregate them like that, since we lose the granularity that the Prometheus endpoint gives us (that is, the number of bytes committed in each pool type). I think we might be able to get that granularity back through datapoint tag queries in Hawkular-Metrics, but I don't know how well (if at all) that is supported or how efficient such queries would be. I also don't know how efficiently these metrics would be stored if we tag every datapoint with these labels (I'm not sure that is the general purpose of tags in H-Metrics). Regardless, because these really are different time series, they should (IMO) be represented as different time series (via metric definitions/metric IDs) in Hawkular-Metrics.

To support labeled Prometheus endpoint data like this, the agent needs to split this one named metric into N Hawkular-Metrics metrics (where N is the number of unique label combinations for that named metric). So even though the agent is configured with the one metric "jvm_memory_pool_bytes_committed", we actually need to create two Hawkular-Metrics metric definitions (each with its own unique metric ID).

The PR that is ready to go does this. By default it creates multiple metric definitions/metric IDs in the form "metric-family-name{labelName1=labelValue1,labelName2=labelValue2,...}". If you want a different form, you can define an "id" and put "${labelName}" tokens in the ID you declare (such as "${oneLabelName}_my_own_metricname${theOtherLabelName}" or whatever). But I suspect the default format will be what most people want, in which case nothing needs to be done. In the above example, two metric definitions with the following IDs are created:

  1. jvm_memory_pool_bytes_committed{pool=Code Cache}
  2. jvm_memory_pool_bytes_committed{pool=PS Eden Space}
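The ID scheme described above can be sketched in a few lines of Python. This is only an illustration of the two ID forms, not the agent's implementation: the function names are hypothetical, and sorting labels by name, returning the bare family name when there are no labels, and leaving unknown "${...}" tokens untouched are all assumptions.

```python
from string import Template


def default_metric_id(family, labels):
    """Build an ID of the form family{name1=value1,name2=value2,...}."""
    if not labels:
        # Assumption: an unlabeled metric keeps its family name as the ID.
        return family
    # Assumption: labels are ordered by name so the ID is deterministic.
    pairs = ",".join("%s=%s" % (k, v) for k, v in sorted(labels.items()))
    return "%s{%s}" % (family, pairs)


def custom_metric_id(id_template, labels):
    """Expand ${labelName} tokens in a user-declared ID."""
    # Assumption: tokens naming a label that is absent are left as-is.
    return Template(id_template).safe_substitute(labels)


family = "jvm_memory_pool_bytes_committed"
for labels in ({"pool": "Code Cache"}, {"pool": "PS Eden Space"}):
    print(default_metric_id(family, labels))

# A hypothetical custom ID using the "pool" label from the example above.
print(custom_metric_id("${pool}_bytes_committed", {"pool": "Code Cache"}))
```

The first two printed IDs match the list above. The custom template and its resulting ID are made up for illustration.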