Closed alculquicondor closed 3 years ago
cc @Jeffwan @gaocegege @terrytangyuan @andreyvelich @johnugeorge
I think that's a good idea. The good thing here is that the labels should be transparent to end users. The challenge is to have a smooth migration plan for ongoing legacy jobs during the upgrade.
Sounds good as long as we have a deprecation plan for the following releases.
Is there any cadence for the releases?
Let me expand on the idea for the implementation:
(3) could happen 1 year after (2).
It's a great idea @alculquicondor!
Do we need to add training to the domain as well? I've seen some Kubernetes projects doing that.
For example: training.kubeflow.org/job-role.
Thanks.
Yes, we do that to be more specific, as k8s is a big project. We could play it safe and add training too. Are you aware of other Kubeflow WGs having labels that might collide?
@alculquicondor I was thinking that we should also modify labels for Katib components: https://github.com/kubeflow/katib/blob/master/manifests/v1beta1/components/controller/controller.yaml#L6-L8.
For example, katib.kubeflow.org/app: controller
Would it make sense for that to be kubeflow.org/operator-name: katib?
> Would it make sense for that to be kubeflow.org/operator-name: katib?
Since Katib has several components, we are using these labels to show cluster administrators what each component is for. For example, this label for the Controller Deployment and this label for the UI Deployment.
Gotcha. Having training.kubeflow.org makes a lot of sense then.
SGTM, and we also need to update the doc.
Pods created by Kubeflow controllers get some labels: https://github.com/kubeflow/common/blob/master/pkg/apis/common/v1/constants.go
It is common practice in k8s to add a domain to the label name. This should reduce the chances of collision with user-defined labels. The labels should then go from:
- replica-index to kubeflow.org/replica-index
- replica-type to kubeflow.org/replica-type
- replica-group to kubeflow.org/replica-group (although we could take the chance to rename this to kubeflow.org/operator-name, with possible values tf-operator, mpi-operator and so on)
- job-name to kubeflow.org/job-name
- job-role to kubeflow.org/job-role (how does this differ from replica-type?)
This should enhance visibility and make it easier for cluster administrators to track the usage of operators in the cluster.
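To make the proposed renaming concrete, here is a minimal sketch of how the legacy keys could be mapped to their domain-qualified form. The function names (qualify, qualifyLabels) and the legacyKeys list are hypothetical and not part of kubeflow/common; the domain is a parameter so that either kubeflow.org or training.kubeflow.org could be used.

```go
package main

import "fmt"

// legacyKeys lists the un-prefixed label keys named in the proposal.
// Hypothetical helper data, not taken from kubeflow/common.
var legacyKeys = []string{
	"replica-index",
	"replica-type",
	"replica-group",
	"job-name",
	"job-role",
}

// qualify returns key prefixed with domain, e.g. "kubeflow.org/job-name".
func qualify(domain, key string) string {
	return domain + "/" + key
}

// qualifyLabels returns a copy of labels in which every legacy key is
// replaced by its domain-qualified form. Labels that are not in
// legacyKeys (for example user-defined ones) are copied unchanged.
func qualifyLabels(domain string, labels map[string]string) map[string]string {
	legacy := make(map[string]bool, len(legacyKeys))
	for _, k := range legacyKeys {
		legacy[k] = true
	}
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		if legacy[k] {
			k = qualify(domain, k)
		}
		out[k] = v
	}
	return out
}

func main() {
	labels := map[string]string{
		"job-name":     "mnist",
		"replica-type": "worker",
		"team":         "ml", // user-defined label, left untouched
	}
	fmt.Println(qualifyLabels("kubeflow.org", labels))
}
```

Because user-defined labels pass through unchanged, a user label that happens to be called job-name would no longer collide with the controller's kubeflow.org/job-name.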
The way we can implement this is to start adding both the existing and the new labels to newly created pods, and to keep both sets of constants in the constants.go file, with the old ones marked as Deprecated. Then, we would remove the Deprecated variables after a few releases.
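As a rough illustration of that migration (not the actual contents of constants.go), the sketch below defines new and deprecated keys side by side, writes both onto new pods, and reads the job name with a fallback to the old key so that pods from legacy jobs still resolve. All identifiers here (JobNameLabel, JobNameLabelDeprecated, genLabels, jobNameOf, and so on) are assumptions made for the example.

```go
package main

import "fmt"

const (
	// New, domain-qualified label keys (hypothetical names).
	ReplicaIndexLabel = "kubeflow.org/replica-index"
	ReplicaTypeLabel  = "kubeflow.org/replica-type"
	JobNameLabel      = "kubeflow.org/job-name"

	// Deprecated: use ReplicaIndexLabel.
	ReplicaIndexLabelDeprecated = "replica-index"
	// Deprecated: use ReplicaTypeLabel.
	ReplicaTypeLabelDeprecated = "replica-type"
	// Deprecated: use JobNameLabel.
	JobNameLabelDeprecated = "job-name"
)

// genLabels builds the label set for a newly created pod. During the
// transition it sets both the new and the deprecated keys, so selectors
// written against either form keep matching.
func genLabels(jobName, replicaType, replicaIndex string) map[string]string {
	return map[string]string{
		JobNameLabel:      jobName,
		ReplicaTypeLabel:  replicaType,
		ReplicaIndexLabel: replicaIndex,

		JobNameLabelDeprecated:      jobName,
		ReplicaTypeLabelDeprecated:  replicaType,
		ReplicaIndexLabelDeprecated: replicaIndex,
	}
}

// jobNameOf reads the job name from a pod's labels, preferring the new
// key and falling back to the deprecated one for pods created before
// the upgrade.
func jobNameOf(labels map[string]string) (string, bool) {
	if v, ok := labels[JobNameLabel]; ok {
		return v, true
	}
	v, ok := labels[JobNameLabelDeprecated]
	return v, ok
}

func main() {
	newPod := genLabels("mnist", "worker", "0")
	legacyPod := map[string]string{"job-name": "old-job"}

	name, _ := jobNameOf(newPod)
	fmt.Println("new pod job:", name)
	name, _ = jobNameOf(legacyPod)
	fmt.Println("legacy pod job:", name)
}
```

Once the deprecated constants are removed, genLabels would stop writing the old keys, and the fallback in jobNameOf could be dropped after no legacy pods remain.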