kubeflow / common

Common APIs and libraries shared by other Kubeflow operator repositories.
Apache License 2.0
51 stars 73 forks source link

Use fully-qualified labels #148

Closed alculquicondor closed 3 years ago

alculquicondor commented 3 years ago

Pods created with kubeflow controllers get some labels https://github.com/kubeflow/common/blob/master/pkg/apis/common/v1/constants.go

It is common practice in k8s to add a domain to the label name. This should reduce the chances of collision with user-defined labels. Then, the labels should go from:

This should enhance visibility and make it easier for cluster administrators to track the usage of operators in the cluster.

The way we can implement this is to start adding both existent and new labels to newly created pods, and change the have something like this in the constants.go file:

const ReplicaIndexLabelDeprecated = "replica-index"
const ReplicaIndexLabel           = "kubeflow.org/replica-index"

Then, we would remove the Deprecated variables after a few releases

alculquicondor commented 3 years ago

cc @Jeffwan @gaocegege @terrytangyuan @andreyvelich @johnugeorge

Jeffwan commented 3 years ago

I think that's a good idea. The good thing here is label should be transparent to end users. The challenge is to have a smooth migration plan for ongoing legacy jobs during the upgrade.

terrytangyuan commented 3 years ago

Sounds good as long as we have a deprecation plan for the following releases.

alculquicondor commented 3 years ago

Is there any cadence for the releases?

Let me expand on the idea for the implementation:

  1. We change the common repository as suggested above
  2. When we upgrade the common dependency of an operator, we make sure the controller adds both the new and deprecated labels. We also make sure that any logic that relies on the labels supports both.
  3. In a later release, we stop adding the deprecated labels and clean up the logic to only rely on the new labels.

(3) could happen 1 year after (2).

andreyvelich commented 3 years ago

It's a great idea @alculquicondor!

Do we need to add training at the domain also ? I saw some Kubernetes projects are doing that. For example: training.kubeflow.org/job-role.

alculquicondor commented 3 years ago

Thanks. Yes, we do that to be more specific, as k8s is a big project. We could play safe and add training too. Are you aware of other Kubeflow WGs having labels that might collide?

andreyvelich commented 3 years ago

@alculquicondor I was thinking that we should also modify labels for Katib components: https://github.com/kubeflow/katib/blob/master/manifests/v1beta1/components/controller/controller.yaml#L6-L8. For example, katib.kubeflow.org/app: controller

alculquicondor commented 3 years ago

would it make sense for that to be kubeflow.org/operator-name: katib?

andreyvelich commented 3 years ago

would it make sense for that to be kubeflow.org/operator-name: katib?

Since Katib has several components, we are using these labels to show cluster administrators what is this component for. For example, this label for Controller Deployment, this label for UI Deployment

alculquicondor commented 3 years ago

Gotcha. Having training.kubeflow.org makes a lot of sense then.

gaocegege commented 3 years ago

SGTM, and we also need to update the doc.