Open vanHavel opened 4 months ago
The problem seems to be in backend/src/v2/compiler/argocompiler/container.go:addContainerExecutorTemplate function (see: https://github.com/kubeflow/pipelines/blob/2.2.0/backend/src/v2/compiler/argocompiler/container.go#L197).
This function is supposed to add task's pod metadata to the template which it creates (see lines 279-285). However, this logic is actually called only during the first execution of the function. This is due to caching that takes place in lines 199-204 and 286:
nameContainerExecutor := "system-container-executor"
nameContainerImpl := "system-container-impl"
_, ok := c.templates[nameContainerExecutor]
if ok {
return nameContainerExecutor
}
...
c.templates[nameContainerImpl] = executor
For the very first task (which due to SDK logic happens to be determined by the alphabetical order of components names), a template is created, task's metadata are added to it, and then the template is cached in c.templates
map. However as the same key in this map is used for every task (i.e. "system-container-executor"), all other calls to the function returns the template created in the first execution. As a result, pod metadata of tasks other than the first one are ignored, while all tasks' pods receive the metadata defined for the first task.
I wonder if the minimal solution for this problem would be as easy as including the task/comp name (i.e. refName
) in the template name, like so:
nameContainerExecutor := "system-container-executor-" + refName
nameContainerImpl := "system-container-impl-" + refName
I understand that such solution (if it worked) would have the following two consequences:
Any workaround for this? We are facing the same issue!
I could not find any. And frankly, the current code doesn't offer any hope that a workaround exists.
Unless you are ok with all tasks in your pipeline having the same label(s). Then simply use add_pod_label on the very first task in alphabetical order based on components names.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Commenting as this issue is still relevant
Environment
Steps to reproduce
Here is a minimal example pipeline.
This is the generated pipeline and platform spec, which looks correct.
Click to expand
``` pipeline_spec: components: comp-a-op: executorLabel: exec-a-op outputDefinitions: parameters: Output: parameterType: NUMBER_INTEGER comp-b-op: executorLabel: exec-b-op outputDefinitions: parameters: Output: parameterType: NUMBER_INTEGER deploymentSpec: executors: exec-a-op: container: args: - '--executor_input' - '{{$}}' - '--function_to_execute' - a_op command: - sh - '-c' - > if ! [ -x "$(command -v pip)" ]; then python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip fi PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'kfp==2.8.0' '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<"3.9"' && "$0" "$@" - sh - '-ec' - > program_path=$(mktemp -d) printf "%s" "$0" > "$program_path/ephemeral_component.py" _KFP_RUNTIME=true python3 -m kfp.dsl.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@" - |+ import kfp from kfp import dsl from kfp.dsl import * from typing import * def a_op() -> int: return 1 image: 'python:3.8' exec-b-op: container: args: - '--executor_input' - '{{$}}' - '--function_to_execute' - b_op command: - sh - '-c' - > if ! [ -x "$(command -v pip)" ]; then python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip fi PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'kfp==2.8.0' '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<"3.9"' && "$0" "$@" - sh - '-ec' - > program_path=$(mktemp -d) printf "%s" "$0" > "$program_path/ephemeral_component.py" _KFP_RUNTIME=true python3 -m kfp.dsl.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@" - |+ import kfp from kfp import dsl from kfp.dsl import * from typing import * def b_op() -> int: return 2 image: 'python:3.8' pipelineInfo: name: pipe root: dag: tasks: a-op: cachingOptions: enableCache: true componentRef: name: comp-a-op taskInfo: name: a-op b-op: cachingOptions: enableCache: true componentRef: name: comp-b-op taskInfo: name: b-op schemaVersion: 2.1.0 sdkVersion: kfp-2.8.0 platform_spec: platforms: kubernetes: deploymentSpec: executors: exec-a-op: podMetadata: annotations: task: a labels: task: a exec-b-op: podMetadata: annotations: task: b labels: task: b ```However, both pods for the two pipeline tasks get the label and annotation
a
.I did some more testing, and it seems that for all tasks, the platform spec of the one that is alphabetically first is taken. If you rename the first task to
c_op
, both pods get the label / annotationb
. This might be because the first platform spec in the YAML is always taken.Expected result
The pod metadata is set correctly for all tasks. In the above example, task
a
should have label / annotationa
, taskb
should have label / annotationb
.Materials and Reference
The pull request that introduced the feature to set labels / annotations: https://github.com/kubeflow/pipelines/pull/10393/
I am not too familiar with the codebase, but it might have something to do with not picking the correct
KubernetesExecutorConfig
.Impacted by this bug? Give it a 👍.