hawkular / hawkular-openshift-agent

A Hawkular feed that collects metrics from Prometheus and/or Jolokia endpoints deployed in one or more pods within an OpenShift node.

Jolokia Metric Name #57

Open mwringe opened 7 years ago

mwringe commented 7 years ago

Right now we are setting the name of the Jolokia metric to the MBean and the path where the metric can be found (https://github.com/hawkular/hawkular-openshift-agent/blob/master/hack/jolokia-wildfly-example/jolokia-wildfly-configmap.yaml#L14). I don't believe "name" is an accurate label for that.

We may want to rename this to make it clearer what it is.

Reserving name for a human-readable value may also make sense, but we are not doing that for Prometheus endpoints either. We may want to come up with a consistent naming scheme across the different endpoint types.

jmazzitelli commented 7 years ago

This is the "Name" field of the generic "MonitoredMetric" structure (https://github.com/hawkular/hawkular-openshift-agent/blob/master/collector/metrics_endpoint.go#L36-L48), which is used in the generic "Endpoint" structure (https://github.com/hawkular/hawkular-openshift-agent/blob/master/collector/metrics_endpoint.go#L50-L64). The Endpoint structure is shared across the different endpoint types (today there are two, Jolokia and Prometheus, with presumably more types possible in the future). Sharing a single MonitoredMetric YAML structure inside a single shared Endpoint structure keeps the YAML consistent across the different endpoint types: the metric name is always stored in the name field, but the format/syntax of that metric name depends on the type of endpoint; e.g. it's an MBean object name for Jolokia endpoints and a metric name of the form "some_name" for Prometheus.

So the idea is this is the "metric name" as known to the endpoint.

The metric ID (the value stored in H-Metrics to identify this metric's data) defaults to this "name" unless you tell the agent what you want the ID to be - that is what the optional ID field is for: https://github.com/hawkular/hawkular-openshift-agent/blob/master/collector/metrics_endpoint.go#L44
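For reference, a simplified sketch (in Go) of those two structures - field set and YAML tags are approximated from this discussion, not a verbatim copy of the linked metrics_endpoint.go:

// Sketch only - approximates the linked structures; see metrics_endpoint.go for the real definitions.
type MonitoredMetric struct {
    // ID is the optional Hawkular-Metrics ID; if empty, Name is used instead.
    ID string `yaml:"id,omitempty"`
    // Name is the metric name as the endpoint itself knows it: an MBean object
    // name plus path for Jolokia, a plain name like "some_name" for Prometheus.
    Name string `yaml:"name"`
}

type Endpoint struct {
    Type     string            `yaml:"type"`     // e.g. "jolokia" or "prometheus"
    Protocol string            `yaml:"protocol"` // e.g. "http" or "https"
    Metrics  []MonitoredMetric `yaml:"metrics"`  // shared metric list, whatever the endpoint type
}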

So the problem to solve, if we are to change "name" to something else (say "object-name" for Jolokia endpoints and "metricName" for Prometheus endpoints), is how to share this MonitoredMetric structure across all endpoint types yet have the field names differ depending on the endpoint type. I don't know how to do this in Go, unless we take out the generic nature of these YAML structures and just create different YAML structures per endpoint type - so instead of:

endpoints:
 - type: prometheus
   protocol: "http"
   metrics:
   - name: process_virtual_memory_bytes
   ...
 - type: jolokia
   protocol: "http"
   metrics:
   - name: java.system:type=Runtime
   ...

We would do something like:

endpoints:
 - type: prometheus
   protocol: "http"
   prometheus-metrics:
   - metric-name: process_virtual_memory_bytes
   ...
 - type: jolokia
   protocol: "http"
   jolokia-metrics:
   - object-name: java.system:type=Runtime
   ...

I don't know. It seems kind of odd to have "type: jolokia" but then also a hard-coded "jolokia-metrics" field name.

Maybe split endpoints to be:

prometheus-endpoints:
   - protocol: http
     metrics:
     - metric-name: process_virtual_memory_bytes
...
jolokia-endpoints:
   - protocol: http
     metrics:
     - object-name: java.system:type=Runtime
...
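For illustration, the split layout above would map to per-endpoint-type Go structures along these lines (a hypothetical sketch - every struct and field name here is invented for the example and is not the agent's actual code):

package main

import (
    "fmt"

    "gopkg.in/yaml.v2"
)

// Hypothetical per-endpoint-type config structures for the split layout above.
type PrometheusMetric struct {
    MetricName string `yaml:"metric-name"`
}

type PrometheusEndpoint struct {
    Protocol string             `yaml:"protocol"`
    Metrics  []PrometheusMetric `yaml:"metrics"`
}

type JolokiaMetric struct {
    ObjectName string `yaml:"object-name"`
}

type JolokiaEndpoint struct {
    Protocol string          `yaml:"protocol"`
    Metrics  []JolokiaMetric `yaml:"metrics"`
}

type Config struct {
    PrometheusEndpoints []PrometheusEndpoint `yaml:"prometheus-endpoints"`
    JolokiaEndpoints    []JolokiaEndpoint    `yaml:"jolokia-endpoints"`
}

func main() {
    data := `
prometheus-endpoints:
- protocol: http
  metrics:
  - metric-name: process_virtual_memory_bytes
jolokia-endpoints:
- protocol: http
  metrics:
  - object-name: java.system:type=Runtime
`
    var cfg Config
    if err := yaml.Unmarshal([]byte(data), &cfg); err != nil {
        panic(err)
    }
    fmt.Println(cfg.PrometheusEndpoints[0].Metrics[0].MetricName)
    fmt.Println(cfg.JolokiaEndpoints[0].Metrics[0].ObjectName)
}

The catch is exactly what is noted next: the collection code could no longer iterate over one generic list of endpoints and metrics, and would need type-specific handling everywhere.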

Either way this is a lot of work (and will make the code more complicated) for little value.

Another alternative is to figure out a better name than "name" that would be applicable across all endpoint types, so all we would have to do is rename the "Name" field in MonitoredMetric to something more appropriate. Perhaps "metric-name" is good enough and meaningful for both Prometheus and Jolokia. Or perhaps "metric-identifier" (but then that sort of confuses it with the "id" field).

mwringe commented 7 years ago

The main use case for this is that we want a human-readable name for the metric (and most likely also things like a description, and potentially others). name is the more logical field to use for that, while using name for the path to the MBean isn't intuitive. I believe it does make sense for Prometheus, as that follows the conventions used there.

This is why I think we need to discuss this.

We may also want to rethink how we are collecting metrics. I could see the use case where we would want to monitor all the metrics for a particular MBean, as well as the use case where we would want to monitor all the metrics at a specific Prometheus endpoint. Both of these would require changes to how the metric definition is defined.

pilhuhn commented 7 years ago

In RHQ we had 'displayName' for the human-readable part in case 'name' was not human-readable.

If we have an explicit type: jolokia and type: prometheus, then why not just use an attribute of metric-name for both, pointing to an object name for Jolokia and the Prometheus metric name otherwise? I guess that, especially for the Jolokia case, we need the possibility to supply units explicitly.

jmazzitelli commented 7 years ago

If we have an explicit type: jolokia and type: prometheus, then why not just use an attribute of metric-name for both

That's what we have today - except in the YAML it is "name", not "metric-name". I think to avoid confusion we just rename that to "metric-name" - that's the least amount of change required.

As for a human-readable name for the metric - that is what "id" is for. In H-Metrics there isn't a "human readable name" field AFAIK - H-Metrics only has a "metric-id" to identify a metric. This is why the agent's YAML has "id" - it is the metric ID. If you do not specify a metric ID ("id") in the YAML, the agent will fall back to using the metric "name". So for Jolokia, as an example:

- name: java.system:type=Runtime#HeapMemoryUsage#used
  id: Used Memory

The agent will store the metric definition with a metric id of "Used Memory" - that is what H-Metrics will call this metric. If you do not specify that "id: Used Memory", the metric definition stored in H-Metrics will have a metric id of "java.system:type=Runtime#HeapMemoryUsage#used".
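In other words, the ID selection is a simple fallback, roughly like this (a minimal sketch built on the MonitoredMetric sketch earlier in this thread; the helper name is made up, and the real agent also appears to prepend a prefix - note the "pod/<uid>/custom/" form of the IDs quoted later in this thread):

// Hypothetical helper, not agent code: use the explicit "id" from the YAML
// if given, otherwise fall back to the metric "name".
func metricID(m MonitoredMetric) string {
    if m.ID != "" {
        return m.ID
    }
    return m.Name
}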

ghost commented 7 years ago

I am trying to set up Jolokia custom metrics to be displayed in the metrics tab for the pod in the OCP console. I see that the console uses tags to get the list of metrics, but the data is looked up based on the metrics_name tag, so if we use the default (that is, the whole MBean name), the console does not display any data.

The workaround I had to put in place is to delete the tag from the agent and add a custom tag called "metrics_name" (set to the id) to each metric.

Is there any more elegant solution?

jmazzitelli commented 7 years ago

@gordillo-ramon-redhat - @mwringe would know more details on the console. I recommend we take this discussion to the hawkular-dev mailing list rather than take this git issue off-topic.

But to respond:

The way I thought it worked was that the console uses the "description" tag to label the graph in the console (it's the big text you see above each metric graph). The console should be using the metric ID to query H-Metrics, because that is the unique thing that identifies the specific time series (the name is not unique, because I could have multiple pods each with an MBean of the same name being monitored).

That said, I'm seeing some issues with the console myself.

Running with the Prometheus example, I see empty graphs for metrics with curly braces in the IDs (e.g. "pod/5afce753-085b-11e7-b5e1-54ee7549ae45/custom/test_meals_eaten_total{drink=Water,food=Banana}"), but for other Prometheus metrics without those special characters the graphs show up fine.

Running with the Jolokia example, at first I don't see any graph at all for its custom metric (whose ID is "pod/b8919778-085e-11e7-b5e1-54ee7549ae45/custom/java.lang:type=Memory#HeapMemoryUsage#used"). But if I let the system run for 5 minutes after I deploy the pod, I see the graph with data.

So either: a) you didn't let the system run long enough for the metrics to show up (I have no idea why it takes 5 minutes for the graph to appear and show data - I confirmed there is data in H-Metrics within the first minute of the pod being deployed)

or

b) there is a special character in your metric ID that is causing a problem (perhaps a curly brace??).

I think there is something about special characters that foobars the graphs - like I say, my Prometheus metrics with curly braces in the IDs cause no data to show (you can see this by deploying the Prometheus example). In fact, I'm going to write a git issue for this problem.

But other than that, the console is working for me.

jmazzitelli commented 7 years ago

(FYI: I see the problem with the Prometheus stuff - and it looks like I sensed this problem was coming, because I already wrote an issue that, when implemented, would fix it. See issue #130 - it has nothing to do with "special characters", but instead with the value of the "metric_name" tag that the agent creates when dealing with Prometheus time series metrics that have labels.)

jmazzitelli commented 7 years ago

@gordillo-ramon-redhat I think I understand what your issue is now. And I think I fixed it here: https://github.com/openshift/origin-web-console/pull/1344 (note: as you can see, this fix is not in HOSA itself but in the OpenShift web console).

This also fixes the problem I saw with Prometheus metrics.

ghost commented 7 years ago

Seems it is the same issue. Hope it will be included in the next origin release (1.5.0).