radanalyticsio / openshift-spark


Adding prometheus endpoints on port 7777 for worker and master #35

Closed: zak-hassan closed this 6 years ago

zak-hassan commented 6 years ago

Added an agent that exposes a Prometheus endpoint for the Spark master and workers when the environment variable "SPARK_METRICS_ON" is enabled.
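To illustrate the conditional behavior, here is a minimal Python sketch of how a container entrypoint might append a Java agent flag to the Spark daemon JVM options only when SPARK_METRICS_ON is set. It is not the code from this PR: the jar and config paths, the set of "truthy" values, and the choice of SPARK_DAEMON_JAVA_OPTS as the injection point are assumptions. The "-javaagent:<jar>=<port>:<config>" argument shape follows the jmx_exporter convention.

```python
import os
import shlex

# Hypothetical paths; the real image may place these files elsewhere.
AGENT_JAR = "/opt/metrics/agent.jar"
AGENT_CONFIG = "/opt/metrics/config.yaml"
METRICS_PORT = 7777


def metrics_enabled(env):
    """Treat common truthy strings as 'enabled' (an assumption, not the PR's rule)."""
    return env.get("SPARK_METRICS_ON", "").strip().lower() in {"1", "true", "yes", "on"}


def daemon_java_opts(env):
    """Return SPARK_DAEMON_JAVA_OPTS, appending a -javaagent flag when metrics are on."""
    opts = shlex.split(env.get("SPARK_DAEMON_JAVA_OPTS", ""))
    if metrics_enabled(env):
        # jmx_exporter-style agent argument: <jar>=<port>:<config>
        opts.append(f"-javaagent:{AGENT_JAR}={METRICS_PORT}:{AGENT_CONFIG}")
    return " ".join(shlex.quote(o) for o in opts)


if __name__ == "__main__":
    print(daemon_java_opts(dict(os.environ)))
```

Leaving the variable unset keeps the daemon options unchanged, so the agent (and its port) only appears when explicitly requested.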

elmiko commented 6 years ago

i'm curious whether there is an advantage to using agent-bond over the official Prometheus JMX exporter?

it seems like the prometheus version has more recent activity: https://github.com/fabric8io/agent-bond https://github.com/prometheus/jmx_exporter

is there a detail i'm missing here?

zak-hassan commented 6 years ago

It's essentially the same thing: it exposes all JMX beans in the text exposition format that Prometheus can scrape.
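For a concrete picture of what "a format Prometheus can scrape" means: the endpoint serves plain-text lines of the form metric_name{label="value"} number, plus "# HELP" and "# TYPE" comment lines. The Python sketch below parses such text into tuples. It is a simplified illustration, not a full implementation of the exposition format: escaped quotes or commas inside label values are not handled, and lines carrying timestamps are skipped. The sample metric names are illustrative only.

```python
import re

# name, optional {labels}, then a single value token at end of line.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Parse simple Prometheus text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP and TYPE lines
            continue
        m = SAMPLE_RE.match(line)
        if m is None:  # e.g. lines with timestamps, which this sketch ignores
            continue
        name, raw_labels, value = m.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    payload = (
        "# TYPE jvm_threads_current gauge\n"
        'jvm_memory_bytes_used{area="heap"} 1.2e+08\n'
        "jvm_threads_current 42\n"
    )
    for sample in parse_exposition(payload):
        print(sample)
```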

elmiko commented 6 years ago

should we be using the official prometheus exporter then, since it appears to have more recent work? (it seems like agent-bond hasn't been touched in a while)

zak-hassan commented 6 years ago

I could swap out the agent. Let me see.

elmiko commented 6 years ago

i'm just curious if we should be using one or the other, i don't know much about the upstream development. thanks for checking!

zak-hassan commented 6 years ago

I was thinking of going with agent-bond because the maintainer is someone we can ping internally. Would it be a good idea to send him an email and ask?

elmiko commented 6 years ago

i think it would be worthwhile to get in touch, if only to understand the state of the projects.

edit: also, i agree, having someone to ping is a great feature =)

zak-hassan commented 6 years ago

@rhuss What is the state of the agent-bond project? Can we use it in our project to expose a Prometheus endpoint in an OpenShift pod?

rhuss commented 6 years ago

@zmhassan agent-bond is pretty stable and is used in our community base images at https://github.com/fabric8io-images/java, where it works quite nicely. It's community-supported, but as far as I remember agent-bond did not make it into the product, where only Jolokia is used.

rhuss commented 6 years ago

It could be that the agents included in agent-bond need an update, but that shouldn't be a big deal.

elmiko commented 6 years ago

@rhuss thanks for the responses, do you know if there is any difference between the agent-bond exporter and the prometheus exporter?

and is there any advantage to one over the other?

rhuss commented 6 years ago

Agent-bond includes jmx_exporter. In fact, agent-bond's sole purpose is to let you use a single agent instead of two when you want both Jolokia and jmx_exporter, which simplifies the configuration. It's a simple delegate to each agent's entry point, with a configuration that dispatches to both agents.

Agent-bond could be easily extended to more agents.
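The delegation described above can be pictured with a small Python model: one entry point receives a combined configuration and forwards each agent's slice to that agent's own entry point, and supporting another agent is just one more registration. This is not agent-bond's code (agent-bond is a Java agent whose real entry point is a premain method); the class name, the dict-based configuration, and the agent keys are hypothetical simplifications.

```python
from typing import Callable, Dict, List

# Each "agent" is modeled as an entry point taking its own argument string,
# loosely mirroring a Java agent's premain(String args, Instrumentation inst).
AgentEntry = Callable[[str], None]


class AgentBond:
    """Single entry point that delegates to several registered agents."""

    def __init__(self) -> None:
        self._agents: Dict[str, AgentEntry] = {}

    def register(self, name: str, entry: AgentEntry) -> None:
        # Extending the bond to another agent is just another registration.
        self._agents[name] = entry

    def premain(self, config: Dict[str, str]) -> List[str]:
        """Dispatch each agent's part of the shared config; return the agents started."""
        started = []
        for name, args in config.items():
            if name not in self._agents:
                raise ValueError(f"unknown agent: {name}")
            self._agents[name](args)
            started.append(name)
        return started
```

The trade-off rhuss mentions shows up directly in this shape: the bond bundles its delegates, so upgrading either one means rebuilding the bond, whereas two separate agents can be versioned independently.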

So if you want more flexibility, you should add both agents separately (so you can update them independently and cut down on dependencies). If you want a simplified setup, you can use agent-bond (but you would need to update agent-bond whenever you want to update either Jolokia or jmx_exporter).

elmiko commented 6 years ago

ok, makes perfect sense to me now. thanks for the in-depth explanation @rhuss!

elmiko commented 6 years ago

this PR lgtm, i have not tested against a prometheus setup but it seems reasonable. thanks @zmhassan

zak-hassan commented 6 years ago

@elmiko I've tested this against Prometheus and it works. If there is nothing else left, please merge. If you'd like to give it a test run, you can use this template:

https://raw.githubusercontent.com/zmhassan/openshift-spark/0e06f634c62fc08443892dd378a67d8abab33628/spark-metrics-template.yaml

zak-hassan commented 6 years ago

@elmiko if there is nothing else left, let's merge this PR

tmckayus commented 6 years ago

lgtm too. I think the only issue here is figuring out how to tag and merge it. If folks are already using jolokia-based metrics with the openshift-spark image, merging this would break their applications, no? We switch from jolokia to prometheus and change ports, but by default the image would still be tagged openshift-spark:2.2-latest.

Our images have been identified primarily by spark version (2.2), we need to figure out how to label this.

zak-hassan commented 6 years ago

@tmckayus @elmiko I've changed the port to 7777 so that we don't need to change the template when deploying metrics.

tmckayus commented 6 years ago

@elmiko @willb @zmhassan ptal

willb and I discussed it and decided the best course is to keep jolokia as the default but deprecate it for some time, then ultimately carry only prometheus going forward. This should accomplish that (and the jar size is small).
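One hypothetical way to express "default to jolokia, but deprecate it" in an entrypoint is sketched below in Python. The accepted values of SPARK_METRICS_ON ("jolokia", "prometheus", and falsy strings meaning off) are assumptions for illustration only; the thread does not specify how the image selects between the two agents.

```python
import warnings

# Hypothetical metrics modes; the image's real variable values may differ.
SUPPORTED = {"jolokia", "prometheus"}
DEFAULT_MODE = "jolokia"  # kept as the default for now, but deprecated
DISABLED = {"", "false", "0", "off"}


def select_metrics_agent(env):
    """Return the metrics agent to load, or None if metrics are disabled."""
    value = env.get("SPARK_METRICS_ON", "").strip().lower()
    if value in DISABLED:
        return None
    # Any other "on" value (e.g. "true") falls back to the deprecated default.
    mode = value if value in SUPPORTED else DEFAULT_MODE
    if mode == "jolokia":
        warnings.warn(
            "jolokia metrics are deprecated; set SPARK_METRICS_ON=prometheus",
            DeprecationWarning,
            stacklevel=2,
        )
    return mode
```

Emitting a warning on the default path lets existing jolokia users keep working while nudging them toward prometheus before jolokia is dropped.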

I tested this with a locally built image passed to the template with -p SPARK_IMAGE, and it worked in all cases. The only outstanding question is the default image values in the template and in the build/push script.