docs: proposal for unified labeling strategy

bartoszmajsak commented 8 months ago

This document is intended to kick off an initiative to standardize the labeling strategy in Open Data Hub.

This is an RFC (request for comments) at this point, as there are a lot of unknown unknowns for me here :)

This is more of a work-in-progress doc than an actual decision record, so I am keeping it in DRAFT.

[!NOTE] I see a need for similar activity around annotations, but I wanted to keep it separate.

strangiato commented 8 months ago

To begin, it is important to establish why labels matter and when they are useful:

Labels can be used to search for all objects (including ones of different kinds) related to a specific function (e.g. oc get all -l "app.kubernetes.io/name=my-app")
Labels can be useful for telling users specific metadata about an application, such as the version (e.g. app.kubernetes.io/version: 1.2.3)
Labels can be helpful for informing users of the lineage of a specific object (e.g. I need to know that pod-123 is managed by the notebook object my-notebook). This one may be a bit less useful, especially if you are standardizing on ownerReferences, which should be part of our recommendations/standards IMO.
Labels can be helpful for application components to filter and select specific things (e.g. I need a network policy that can automatically select all notebook pods)
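
As a rough illustration of the first and last points, equality-based selection comes down to a subset check on each object's labels. The sketch below is plain Python over hypothetical in-memory objects (the object names and the `matches` and `select` helpers are made up for illustration); it does not talk to a cluster.

```python
from typing import Dict, List

Labels = Dict[str, str]


def matches(selector: Labels, labels: Labels) -> bool:
    """Equality-based match: every selector pair must be present on the object."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(objects: List[dict], selector: Labels) -> List[str]:
    """Return 'Kind/name' for each object whose labels satisfy the selector."""
    return [
        f"{obj['kind']}/{obj['metadata']['name']}"
        for obj in objects
        if matches(selector, obj["metadata"].get("labels", {}))
    ]


# Hypothetical objects of different kinds; two of them share one label.
objects = [
    {"kind": "Deployment", "metadata": {"name": "my-app", "labels": {"app.kubernetes.io/name": "my-app"}}},
    {"kind": "Service", "metadata": {"name": "my-app-svc", "labels": {"app.kubernetes.io/name": "my-app"}}},
    {"kind": "Pod", "metadata": {"name": "other", "labels": {"app.kubernetes.io/name": "other-app"}}},
    {"kind": "ConfigMap", "metadata": {"name": "unlabeled"}},
]

selected = select(objects, {"app.kubernetes.io/name": "my-app"})
```

The same subset check is what lets a single selector pick up objects of different kinds, as long as they carry the same label.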

The next consideration after why you might label something is what kinds of things you want to label. Within the ODH ecosystem there are a few different types of objects that jump out to me:

Objects installed and managed by the ODH Operator/DSC
Objects automatically created by a specific component when that component is installed that users can modify (e.g. dashboardConfigs)
Objects that end users create/manage through either the dashboard or directly via YAML
Child objects created by individual components' custom resources

This group has the ability to dictate what labels are added to all of the objects mentioned above, except for the objects that users create.

Any labels on objects that users create (including those that are created through the dashboard) should be a recommendation and not a requirement. If a label is required on a user-managed object, that label should be automatically added to the object by a webhook.
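
One way such a webhook could default a label is by returning a JSON Patch that adds it only when it is missing. The sketch below covers just the patch-building step in plain Python; the `default_label_patch` and `escape_pointer` helpers are hypothetical, and the surrounding admission request/response handling is left out. It assumes the incoming object always has a `metadata` field.

```python
from typing import List

REQUIRED_KEY = "opendatahub.io/created-by"  # label key taken from this discussion


def escape_pointer(token: str) -> str:
    """Escape a key for a JSON Pointer (RFC 6901): '~' becomes '~0', '/' becomes '~1'."""
    return token.replace("~", "~0").replace("/", "~1")


def default_label_patch(obj: dict, key: str, value: str) -> List[dict]:
    """Build JSON Patch operations that add `key: value` only if the key is absent."""
    labels = obj.get("metadata", {}).get("labels")
    if labels is None:
        # No labels map yet: create it with the single default label.
        return [{"op": "add", "path": "/metadata/labels", "value": {key: value}}]
    if key in labels:
        # Respect whatever the user (or another tool) already set.
        return []
    path = f"/metadata/labels/{escape_pointer(key)}"
    return [{"op": "add", "path": path, "value": value}]


# Hypothetical Notebook created by hand, without any labels.
patch = default_label_patch({"metadata": {"name": "my-notebook"}}, REQUIRED_KEY, "dashboard")
```

Note the escaping: label keys with a prefix contain a `/`, which has to be written as `~1` inside a JSON Patch path.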

Labels such as the following:

opendatahub.io/created-by: dashboard

make sense for objects such as a dashboardConfig CR that the dashboard automatically created when it was installed.

However, opendatahub.io/created-by: dashboard should not be required on a user-managed object such as a Project, PVC, Notebook, DSPA, etc. We can default to setting this label on objects created by the Dashboard, but that label should not be required to interact with that object on the dashboard. I know that the Dashboard does use labels to filter what objects appear in the Dashboard today, but they have mentioned they want to re-evaluate some of those requirements.

I think that the manifests deployed by the DSC are one of the easiest areas to start with. I see a couple of labels that I think would be useful for every component. Below is an example of some labels that I would expect to see on those objects:

opendatahub.io/ds-component: dataSciencePipelines <1>
opendatahub.io/version: 1.6.1 <2>
opendatahub.io/name: default <3>
opendatahub.io/managed-by: odh-operator <4>

<1> The name of the component the object is related to in the DSC. I think it is important here to not use "component" as that has a commonly understood meaning in labels in k8s that we will likely want to use in other places.
<2> The version of the component that was deployed. This can be helpful for an admin to understand what version of the component is currently deployed, which is not obvious from an image SHA.
<3> This name corresponds to the name of the object that owns this object. In this case, a DSC named "default".
<4> The operator that owns the custom resource that deploys that component. In this example, we see that this deployment is managed by the odh-operator.
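
To make the proposal a bit more concrete, here is a minimal sketch of a helper that assembles these four labels and checks each value against the Kubernetes label-value syntax (at most 63 characters, alphanumeric at both ends, with `-`, `_` and `.` allowed in between). The function names are hypothetical and the label keys are the ones proposed above, not an agreed standard.

```python
import re
from typing import Dict

# Label values may be empty; otherwise they start and end with an alphanumeric
# character and may contain '-', '_' and '.' in between.
_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")


def is_valid_label_value(value: str) -> bool:
    """Check the 63-character limit and the allowed character set."""
    return len(value) <= 63 and _VALUE_RE.fullmatch(value) is not None


def dsc_component_labels(ds_component: str, version: str, owner_name: str, managed_by: str) -> Dict[str, str]:
    """Assemble the proposed label set for an object deployed via the DSC."""
    labels = {
        "opendatahub.io/ds-component": ds_component,
        "opendatahub.io/version": version,
        "opendatahub.io/name": owner_name,
        "opendatahub.io/managed-by": managed_by,
    }
    for key, value in labels.items():
        if not is_valid_label_value(value):
            raise ValueError(f"invalid value for label {key!r}: {value!r}")
    return labels


labels = dsc_component_labels("dataSciencePipelines", "1.6.1", "default", "odh-operator")
```

One practical consequence of the value syntax: a raw image digest such as `sha256:...` contains a `:` and so cannot be used as a label value, which is another reason to carry a plain version string here.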

Another example use case that is fairly easy to standardize would be the objects owned by individual components' CRs. For example, on a database created by a DSPA, I might expect to see the following:

opendatahub.io/component: database <1>
opendatahub.io/version: 1.6.1 <2>
opendatahub.io/name: my-ds-pipeline <3>
opendatahub.io/managed-by: data-science-pipelines-operator <4>

<1> In this case we are using the commonly understood meaning of "component", which indicates the function of a specific piece, in this case a database.
<2> Like before, we want to know what version of the component is managing this resource, which may be important when troubleshooting an issue with that component.
<3> The name of the CR that created this. In this case, it is created from a DSPA called "my-ds-pipeline".
<4> Finally, we have the resource that manages that object, just like before.
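
With labels like these in place, finding the database that belongs to a given DSPA becomes a matter of combining two of them into one selector. A small sketch, assuming the proposed keys above (the `to_selector` helper is hypothetical):

```python
from typing import Dict


def to_selector(labels: Dict[str, str]) -> str:
    """Render labels as a comma-separated equality selector, sorted for stable output."""
    return ",".join(f"{key}={value}" for key, value in sorted(labels.items()))


# Proposed labels that identify the database owned by the DSPA "my-ds-pipeline".
dspa_database = {
    "opendatahub.io/name": "my-ds-pipeline",
    "opendatahub.io/component": "database",
}

selector = to_selector(dspa_database)
```

The resulting string can be passed to the same kind of query mentioned earlier, e.g. `oc get all -l "<selector>"`.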

Some additional resources to take a look at:

Upstream k8s has a list of common labels they recommend on applications here: https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/

Additionally, there is a list of "well known" labels here: https://kubernetes.io/docs/reference/labels-annotations-taints/#app-kubernetes-io-managed-by

A great blog on some label best practices and more info on how users use labels: https://blog.kubecost.com/blog/kubernetes-labels/

bartoszmajsak commented 8 months ago

Thanks for all the thoughts you shared @strangiato. I couldn't have put it better, and it's exactly what I have in mind with this ADR. I will expand on the ideas based on your feedback and provide more concrete examples in the doc.

One additional aspect to consider here is that with standardized labeling we can define the topology of RHOAI, and so make it easier for users to define e.g. traffic control rules.
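
As a very rough sketch of what "topology from labels" could mean, the snippet below groups hypothetical objects by the `opendatahub.io/ds-component` label proposed earlier in this thread. The object names, the component values other than `dataSciencePipelines`, and the `topology` helper are all made up for illustration; a real implementation would read objects from the cluster.

```python
from collections import defaultdict
from typing import Dict, List


def topology(objects: List[dict], key: str = "opendatahub.io/ds-component") -> Dict[str, List[str]]:
    """Group 'Kind/name' identifiers by the value of one label; unlabeled objects are skipped."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for obj in objects:
        value = obj["metadata"].get("labels", {}).get(key)
        if value is not None:
            groups[value].append(f"{obj['kind']}/{obj['metadata']['name']}")
    return dict(groups)


# Hypothetical objects; only the label key comes from the proposal.
objects = [
    {"kind": "Deployment", "metadata": {"name": "dsp-operator", "labels": {"opendatahub.io/ds-component": "dataSciencePipelines"}}},
    {"kind": "Service", "metadata": {"name": "dsp-api", "labels": {"opendatahub.io/ds-component": "dataSciencePipelines"}}},
    {"kind": "Deployment", "metadata": {"name": "odh-dashboard", "labels": {"opendatahub.io/ds-component": "dashboard"}}},
    {"kind": "Secret", "metadata": {"name": "user-secret"}},
]

groups = topology(objects)
```

Each group could then serve as the source or destination of a rule, since every member is selectable by the same label.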

zdtsw commented 3 months ago

Created https://issues.redhat.com/browse/RHOAIENG-8386 to track this PR.

github-actions[bot] commented 2 months ago

This PR is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 2 months ago

This PR was closed because it has been stale for 21+7 days with no activity.

opendatahub-io / architecture-decision-records

docs: proposal for unified labeling strategy #26