twosigma / Cook

Fair job scheduler on Kubernetes and Mesos for batch workloads and Spark
Apache License 2.0

Provide fine grained preemption and dru tuning #218

Open wyegelwel opened 7 years ago

wyegelwel commented 7 years ago

Currently Cook has a concept of a share (per user) which encapsulates two things:

  1. The limit under which resources will not be preempted
  2. The weight of users when calculating dru (effectively usage/share)

This creates a problem when an operator would like to raise the non-preemptable share but keep the weight the same.

Therefore, to separate these two concepts we should add two parameters:

  1. non-preemptable share
  2. dru weight

It is probably ok to repurpose share to be only non-preemptable share and then add just dru weight.
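To make the coupling concrete, here is a minimal sketch in Python (not Cook's actual code). The function names, the dict-based resource maps, the `divisor` argument, and the use of `<=` for "under the limit" are all assumptions made for illustration.

```python
def dru_coupled(usage, share):
    """Current model (as described above): share is the divisor for the
    score and, at the same time, the limit below which nothing is preempted."""
    return max(usage[r] / share[r] for r in usage)


def dru_split(usage, divisor, dru_weight):
    """Hypothetical split model: the score depends only on a divisor and a
    per-user dru weight, not on the non-preemptable share."""
    return max(usage[r] / divisor[r] for r in usage) / dru_weight


def within_non_preemptable_share(usage, non_preemptable_share):
    """Hypothetical check: True when usage is within the protected limit
    for every resource."""
    return all(usage[r] <= non_preemptable_share[r] for r in usage)


usage = {"cpus": 20.0, "mem": 40.0}
share = {"cpus": 10.0, "mem": 40.0}
raised = {"cpus": 20.0, "mem": 80.0}

# Coupled: raising the share to protect more resources also halves the score.
coupled_before = dru_coupled(usage, share)
coupled_after = dru_coupled(usage, raised)

# Split: the score is unchanged while the protected limit is raised.
split_score = dru_split(usage, divisor=share, dru_weight=1.0)
protected_before = within_non_preemptable_share(usage, share)
protected_after = within_non_preemptable_share(usage, raised)
```

In the coupled version, doubling the share changes the user's score as a side effect; in the split version, the score stays put and only the protection check changes.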

icexelloss commented 7 years ago

Wil,

We already have :rebalancer.config/safe-dru-threshold. How about allowing it to be overridden for individual users?

wyegelwel commented 7 years ago

You make a good point; this is missing a final component. We would need a (global?) divisor for usage that is not share. I was going to open another issue about it, but this divisor should be dynamic with the dominant resource of the cluster, so that if a user wants to use a lot of a non-dominant resource, they can get a large absolute share of the cluster.

Given the divisor, your suggestion and this would be equivalent. In your proposal, share would effectively be weight and safe-dru-threshold would effectively be non-preemptable share.

The problem with this is that it is unintuitive: to know what your non-preemptable share is, you need to multiply safe-dru-threshold by share. Further, your weight has multiple parameters, one for each resource. This is quite confusing, especially given that the dominant resource of the cluster (or how dominant it is) can change.
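As a rough illustration of that multiplication, assuming dru is usage/share per resource and that usage below safe-dru-threshold is not preempted (the helper name and the numbers are made up for the example):

```python
def effective_non_preemptable_share(share, safe_dru_threshold):
    """Hypothetical helper: the per-resource usage a user can hold before
    usage/share reaches safe_dru_threshold."""
    return {resource: safe_dru_threshold * amount
            for resource, amount in share.items()}


share = {"cpus": 10.0, "mem": 40.0}
limits = effective_non_preemptable_share(share, safe_dru_threshold=1.5)
```

With these numbers the user would have to work out that their protected amount is 15 cpus and 60 units of mem, rather than reading it directly from a single setting.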

icexelloss commented 7 years ago

"global divisor should be dynamic with the dominant resource of the cluster"

If you want to do that, keep in mind that we likely want to keep the score "stable". By that I mean that when the dominant resource in the cluster changes, a user's score shouldn't change dramatically. In the old system there was an issue where, when the dominant resource of the cluster changed, the score changed too much, resulting in chaotic preemption behavior.

wyegelwel commented 7 years ago

That is a great point. This would have to be some sort of moving average that moved slowly.
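One possible shape for this, sketched in Python as an exponential moving average over the per-resource divisor. The function name, the `alpha` value, and the dict layout are assumptions for illustration, not a worked-out design.

```python
def smooth_divisor(previous, observed, alpha=0.05):
    """Blend the newly observed divisor into the previous one.

    A small alpha means the divisor, and therefore user scores, drift
    slowly even if the cluster's dominant resource flips suddenly.
    """
    return {r: (1.0 - alpha) * previous[r] + alpha * observed[r]
            for r in previous}


divisor = {"cpus": 100.0, "mem": 400.0}
# The observed cpu divisor suddenly doubles; one update moves it only slightly.
divisor = smooth_divisor(divisor, {"cpus": 200.0, "mem": 400.0})
```

After one update the cpu divisor has moved only about 5% of the way toward the new observation, so scores would shift gradually rather than jump.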
