kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0

Counters initialization for metrics exposure #909

Open Jimmy-Newtron opened 4 years ago

Jimmy-Newtron commented 4 years ago

As a DevOps engineer installing this operator, I have set up Prometheus metrics collection.

While looking into the metrics, I discovered that counters only become available once a SparkApplication has reached the corresponding state (RUNNING, FAILED, SUBMITTED, ...).

I wanted to compute the failure ratio as FAILURES / (SUCCESS + FAILURES). Unfortunately, this ratio evaluates to NaN because the FAILURES metric is missing (I expected it to be initialized to 0).

As per the Prometheus client documentation (https://github.com/prometheus/client_golang/blob/master/prometheus/counter.go#L194), you can initialize counters to 0 by invoking GetMetricWithLabelValues(labels).

I have created a PR that you can review and use for inspiration: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/pull/903/files

Another thing I noticed is that the metric sparkAppFailedSubmissionCount is never registered, so it does not appear in the exposed metrics.

Hope this ticket is clear enough. Have a nice day.

Cheers, Jimmy

liyinan926 commented 4 years ago

Please sign the Google CLA for your PR. I will take a look after you sign it. Thanks!

Jimmy-Newtron commented 4 years ago

> Please sign the Google CLA for your PR. I will take a look after you sign it. Thanks!

CLA signed

github-actions[bot] commented 1 day ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.