kubernetes-sigs / jobset

JobSet: An API for managing a group of Jobs as a unit
https://jobset.sigs.k8s.io/
Apache License 2.0

feat: add JobIndex to container env #592

Open googs1025 opened 3 weeks ago

googs1025 commented 3 weeks ago

/kind feature

What would you like to be added: Currently, JobSet sets multiple labels on the Pods it manages, and I think some of these labels should be injected into the containers' environment variables. For example, a TensorFlow job may have one parameter server (PS) and two workers, where each worker processes a specific slice of the input data. To know which slice to process, each worker reads its index from an environment variable. This is only an example or idea, and I'm not sure whether the feature is needed.

root@VM-0-9-ubuntu:/home/ubuntu# kubectl describe pods paralleljobs-workers-0-1-gppcd
Name:             paralleljobs-workers-0-1-gppcd
Namespace:        default
Priority:         0
Service Account:  default
Node:             cluster1-worker2/172.18.0.3
Start Time:       Tue, 04 Jun 2024 13:41:20 +0800
Labels:           batch.kubernetes.io/controller-uid=2f709f71-5df9-4362-af7e-1c457d403397
                  batch.kubernetes.io/job-completion-index=1
                  batch.kubernetes.io/job-name=paralleljobs-workers-0
                  controller-uid=2f709f71-5df9-4362-af7e-1c457d403397
                  job-name=paralleljobs-workers-0
                  jobset.sigs.k8s.io/job-index=0
                  jobset.sigs.k8s.io/job-key=32d672edc82776df1fbf3120a8fc54a9192afa69
                  jobset.sigs.k8s.io/jobset-name=paralleljobs
                  jobset.sigs.k8s.io/replicatedjob-name=workers
                  jobset.sigs.k8s.io/replicatedjob-replicas=1
                  jobset.sigs.k8s.io/restart-attempt=0
Annotations:      batch.kubernetes.io/job-completion-index: 1
                  jobset.sigs.k8s.io/job-index: 0
                  jobset.sigs.k8s.io/job-key: 32d672edc82776df1fbf3120a8fc54a9192afa69
                  jobset.sigs.k8s.io/jobset-name: paralleljobs
                  jobset.sigs.k8s.io/replicatedjob-name: workers
                  jobset.sigs.k8s.io/replicatedjob-replicas: 1
                  jobset.sigs.k8s.io/restart-attempt: 0

I would like the container environment to look like this:

Containers:
    ...
    Environment:
      JOBSET_INDEX:          0
      JOB_COMPLETION_INDEX:   (v1:metadata.labels['batch.kubernetes.io/job-completion-index'])
  ...
... 

Why is this needed: Enable the containers managed by the jobset to be aware of essential information.

googs1025 commented 3 weeks ago

@danielvegamyhre @kannon92 @ahg-g Is this a required feature?

googs1025 commented 3 weeks ago

/assign

kannon92 commented 3 weeks ago

This is really a K8s request and not for Jobset.

IndexedJob should expose an environment variable for you to use. JOB_COMPLETION_INDEX

googs1025 commented 3 weeks ago

> This is really a K8s request and not for Jobset.
>
> IndexedJob should expose an environment variable for you to use. JOB_COMPLETION_INDEX

@kannon92 thanks for the review! Yes, I know k8s already has the JOB_COMPLETION_INDEX environment variable, but JOB_COMPLETION_INDEX and JOBSET_INDEX seem to mean different things.

root@VM-0-9-ubuntu:~/jobset/examples/simple# kubectl describe pods paralleljobs-workers-1-2-fd7d6
Name:             paralleljobs-workers-1-2-fd7d6
Namespace:        default
Priority:         0
Service Account:  default
Node:             cluster1-worker/172.18.0.2
Start Time:       Tue, 04 Jun 2024 20:51:28 +0800
Labels:           batch.kubernetes.io/controller-uid=62411fea-0401-42c1-b0d7-2b15d1abdc8f
                  batch.kubernetes.io/job-completion-index=2
                  batch.kubernetes.io/job-name=paralleljobs-workers-1
                  controller-uid=62411fea-0401-42c1-b0d7-2b15d1abdc8f
                  job-name=paralleljobs-workers-1
                  jobset.sigs.k8s.io/job-index=1
                  jobset.sigs.k8s.io/job-key=4e1c31554543f8219df068ce823cff3c77b9ec8c
                  jobset.sigs.k8s.io/jobset-name=paralleljobs
                  jobset.sigs.k8s.io/replicatedjob-name=workers
                  jobset.sigs.k8s.io/replicatedjob-replicas=3
                  jobset.sigs.k8s.io/restart-attempt=0
Annotations:      batch.kubernetes.io/job-completion-index: 2
                  jobset.sigs.k8s.io/job-index: 1
                  jobset.sigs.k8s.io/job-key: 4e1c31554543f8219df068ce823cff3c77b9ec8c
                  jobset.sigs.k8s.io/jobset-name: paralleljobs
                  jobset.sigs.k8s.io/replicatedjob-name: workers
                  jobset.sigs.k8s.io/replicatedjob-replicas: 3
                  jobset.sigs.k8s.io/restart-attempt: 0

I'm not sure I understand it correctly, so please forgive me if I am wrong.

JOB_COMPLETION_INDEX: the Pods of a Job get an associated completion index from 0 to (.spec.completions - 1). It is the index in the Job dimension.
JOBSET_INDEX: the index of a Job among the Jobs created from a ReplicatedJob. It is the index in the ReplicatedJob dimension.
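
To make the two dimensions concrete, here is a small spec excerpt with hypothetical values (2 replicas, 3 completions). The comments list the index pairs that would result, assuming the Pod naming pattern visible in the `kubectl describe` output above.

```yaml
# Excerpt of a JobSet spec; replicas and completions are hypothetical values.
spec:
  replicatedJobs:
    - name: workers
      replicas: 2            # one Job per replica: jobset.sigs.k8s.io/job-index = 0, 1
      template:
        spec:
          completionMode: Indexed
          completions: 3     # Pods of each Job: completion index = 0, 1, 2
          parallelism: 3
# For a JobSet named "paralleljobs" this would give six Pods, named
# <jobset>-<replicatedJob>-<job index>-<completion index>-<random suffix>:
#   paralleljobs-workers-0-0-..., paralleljobs-workers-0-1-..., paralleljobs-workers-0-2-...
#   paralleljobs-workers-1-0-..., paralleljobs-workers-1-1-..., paralleljobs-workers-1-2-...
```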

danielvegamyhre commented 3 weeks ago

@googs1025 you can use the downward API to set an environment variable to the value of a label or annotation: https://kubernetes.io/docs/concepts/workloads/pods/downward-api/
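
As an illustration of that suggestion, the following is a minimal sketch of a JobSet whose containers receive the job index through the downward API. The JobSet name, the container image, and the variable name `JOBSET_INDEX` are assumptions made for the example; the label key is the one shown in the `kubectl describe` output earlier in the thread.

```yaml
# Sketch only: the names, image, and JOBSET_INDEX variable are illustrative assumptions.
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: paralleljobs
spec:
  replicatedJobs:
    - name: workers
      replicas: 2
      template:              # Job template
        spec:
          completionMode: Indexed
          completions: 3
          parallelism: 3
          template:          # Pod template
            spec:
              restartPolicy: Never
              containers:
                - name: worker
                  image: busybox:1.36
                  # The shell expands both variables at container start.
                  command: ["sh", "-c", "echo job=$JOBSET_INDEX pod=$JOB_COMPLETION_INDEX"]
                  env:
                    # Copy the Pod label set by JobSet into an environment variable.
                    - name: JOBSET_INDEX
                      valueFrom:
                        fieldRef:
                          fieldPath: metadata.labels['jobset.sigs.k8s.io/job-index']
```

`JOB_COMPLETION_INDEX` is not declared here because Kubernetes injects it into the Pods of an Indexed Job on its own; only the JobSet-level index needs the explicit `fieldRef`.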

googs1025 commented 3 weeks ago

> downward API

@danielvegamyhre Yes, I know I can use the downward API. I am wondering whether JobSet itself should provide a way to inject this kind of information into the containers.

googs1025 commented 3 weeks ago

In other words, since a JobSet contains many Jobs and Pods, should we provide a global configuration capability? For example:

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: jobset-example
spec:
  # can configure global parameters,
  # which will take effect in every job and pod after setting.
  globalParams:
    # can add the parameters required by the container
    env:
      - name: "FOO"
        value: "bar"
      - name: "QUE"
        value: "pasa"
    # job pod annotations
    annotations:
      key1: value1
      key2: value2
    # job pod labels
    labels:
      key1: value1
      key2: value2

kannon92 commented 3 weeks ago

If you add labels/annotations to the job template I think they are sent to all downstream objects (job, pods). I don’t know if we do that for the service we create.

googs1025 commented 3 weeks ago

> If you add labels/annotations to the job template I think they are sent to all downstream objects (job, pods). I don’t know if we do that for the service we create.

Yes, if we only look at the API field I have given, it is not a very good design, because many labels will be passed to each downstream workload. But I think it is necessary to have global configuration capabilities in jobset.

danielvegamyhre commented 3 weeks ago

Can you describe a specific use case for this?

googs1025 commented 3 weeks ago

> Can you describe a specific use case for this?

Currently, I haven't encountered a specific use case. I just think that in the collaboration between multiple jobs, there might be a need to share some information (using environment variables for transmission) or receive some information from higher-level components. That's why I raised the question of whether this feature is needed to support such scenarios.

danielvegamyhre commented 3 weeks ago

> > Can you describe a specific use case for this?
>
> Currently, I haven't encountered a specific use case. I just think that in the collaboration between multiple jobs, there might be a need to share some information (using environment variables for transmission) or receive some information from higher-level components. That's why I raised the question of whether this feature is needed to support such scenarios.

It is an interesting idea, but making an API change and maintaining it indefinitely is a big commitment, and I only want to do that if there are specific use cases that require this.