pachyderm / pachyderm

Data-Centric Pipelines and Data Versioning
https://www.pachyderm.com/
Apache License 2.0
6.19k stars 566 forks

Enable adding object metadata to pipeline pods #4598

Open kevindelgado opened 4 years ago

kevindelgado commented 4 years ago

Goal / Desired Outcome

Allow setting annotations and labels on pipeline pods via a top-level metadata field on the pipeline spec.

Background

Currently a Pachyderm user can set the pod_patch field on a pipeline spec, and this will add arbitrary valid JSON to the user container of the pipeline pod. This is limited to the user container, however, and users should be able to edit pod metadata that lies outside the user container (such as setting annotations and labels on the object metadata of the pod).

Additionally, we have an annotations field on the service of the pipeline spec that can be set with arbitrary key-value pairs to be set as annotations in the user container of the pipeline pod. We can move this annotations field to the top level of the pipeline spec to add annotations to the pod metadata, or even add a general object metadata field to the pipeline spec (but constrain edits to the pod metadata to only labels and annotations).

Proposal

Acceptance Criteria

By adding a PodMetadata field with annotations and labels to the pipeline spec and creating a pipeline with this spec in a Pachyderm cluster, one should be able to inspect the pipeline in the running cluster and confirm that the metadata from the pipeline spec is present on the pipeline's Kubernetes pod.
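
To make the acceptance criteria concrete, here is a minimal Python sketch of the intended behavior, not the actual Pachyderm implementation. The spec key `pod_metadata`, the helper `apply_pod_metadata`, and the example label and annotation values are all hypothetical, and the pod is modeled as a plain dictionary rather than a real Kubernetes object.

```python
import copy

# Hypothetical pipeline spec: the key "pod_metadata" is an assumption made
# for illustration, not a confirmed field of the Pachyderm pipeline spec.
pipeline_spec = {
    "pipeline": {"name": "edges"},
    "pod_metadata": {
        "labels": {"team": "vision"},
        "annotations": {"example.com/owner": "data-eng"},
    },
}


def apply_pod_metadata(pod_manifest, spec):
    """Return a copy of pod_manifest with the spec's labels and annotations merged in."""
    pod = copy.deepcopy(pod_manifest)
    requested = spec.get("pod_metadata", {})
    metadata = pod.setdefault("metadata", {})
    for key in ("labels", "annotations"):
        # Only these two keys are copied; anything else in pod_metadata is ignored.
        metadata.setdefault(key, {}).update(requested.get(key, {}))
    return pod


# Simplified stand-in for the pod that a pipeline worker would run in.
worker_pod = {
    "metadata": {
        "name": "pipeline-edges-v1",
        "labels": {"app": "pipeline-edges-v1"},
    }
}

patched = apply_pod_metadata(worker_pod, pipeline_spec)
print(patched["metadata"]["labels"])
print(patched["metadata"]["annotations"])
```

Against a real cluster, the equivalent check would be to read the pipeline pod's object metadata and compare its labels and annotations with the values in the spec.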

kevindelgado commented 4 years ago

ptal @jdoliner

jdoliner commented 4 years ago

This looks good to me, with one small tweak: if we only support setting labels and annotations through metadata, then I think it should be a protobuf message that has only those two fields. That way the limitations are reflected in the protobuf types rather than enforced at runtime.
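
As a rough illustration of that idea, the sketch below uses a Python dataclass as a stand-in for such a protobuf message. The class name `PodMetadata` comes from the acceptance criteria above; the helper `try_build` and the example values are hypothetical. Python only rejects the extra field at construction time, whereas a generated protobuf type would simply have no such field to set.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class PodMetadata:
    """Stand-in for a message that exposes only labels and annotations."""

    labels: Dict[str, str] = field(default_factory=dict)
    annotations: Dict[str, str] = field(default_factory=dict)


def try_build(**kwargs):
    """Build a PodMetadata, returning None if an unsupported field is passed."""
    try:
        return PodMetadata(**kwargs)
    except TypeError:
        # The dataclass constructor rejects keyword arguments it does not declare.
        return None


ok = try_build(labels={"team": "vision"})
rejected = try_build(namespace="other")  # namespace is not a declared field
```

With this shape there is no runtime allow-list to maintain: anything other than labels and annotations cannot be expressed in the type at all.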

kevindelgado commented 4 years ago

Sure thing. Is there any other part of the ObjectMetadata that you feel would be relevant besides labels and annotations?

jdoliner commented 4 years ago

The only other metadata I think we'd like to make configurable is the Namespace, but I think that's probably a bigger undertaking. We'd need to figure out how to get the namespaced worker talking to etcd, which I think isn't too hard, but I don't want to expand the scope of this.