palantir / k8s-spark-scheduler

A Kubernetes Scheduler Extender to provide gang scheduling support for Spark on Kubernetes
Apache License 2.0

Report # executors / # nodes used to schedule this application #249

Closed Alexis-D closed 1 year ago

Alexis-D commented 1 year ago

Before this PR

We don't have a good way to quantify fragmentation. With this new metric we should be able to observe whether https://github.com/palantir/k8s-spark-scheduler-lib/pull/100 actually reduces fragmentation (if it does, this metric should go up).

After this PR

==COMMIT_MSG== Report # executors / # nodes used to schedule this application ==COMMIT_MSG==

Possible downsides?

The histogram works with int64, not float64. Either we accept some imprecision (e.g. 5 executors on 3 nodes = 5 / 3 = 1), or we scale the metric (100 * 5 / 3 = 166), which is a bit confusing because the metric no longer reports the true ratio. Given that we mostly care about fragmentation of larger applications, where this metric should ideally be well above 1, I went with the former approach.
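To make the trade-off concrete, here is a minimal sketch of the two options using Go's truncating integer division. The function names `executorsPerNode` and `scaledExecutorsPerNode` are hypothetical and only illustrate the arithmetic; they are not taken from this PR's diff.

```go
package main

import "fmt"

// executorsPerNode is a hypothetical helper showing the truncating
// approach: int64 division discards the fractional part.
func executorsPerNode(executors, nodes int64) int64 {
	if nodes <= 0 {
		return 0 // avoid dividing by zero when nothing was scheduled
	}
	return executors / nodes
}

// scaledExecutorsPerNode is a hypothetical helper showing the scaled
// alternative: multiplying by 100 first keeps two decimal digits, but
// the reported value is no longer the true ratio.
func scaledExecutorsPerNode(executors, nodes int64) int64 {
	if nodes <= 0 {
		return 0
	}
	return 100 * executors / nodes
}

func main() {
	fmt.Println(executorsPerNode(5, 3))       // 1
	fmt.Println(scaledExecutorsPerNode(5, 3)) // 166
	fmt.Println(executorsPerNode(40, 4))      // 10
}
```

For small applications the truncated value loses most of the signal (5 / 3 and 3 / 3 both report 1), but for larger applications packed onto few nodes the integer ratio is still informative.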

svc-autorelease commented 1 year ago

Released 0.57.0