kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0
3.6k stars 1.62k forks source link

[sdk] $$ is transformed into $ #10370

Closed martina-zxy closed 3 months ago

martina-zxy commented 9 months ago

Environment

Steps to reproduce

from kfp import dsl

@dsl.component
def print_string(input_string: str):
    import logging
    logger = logging.getLogger(__name__)
    internal_string = "internal string with $$"
    logger.info("Input string: %s", input_string)
    logger.info("Internal string: %s", internal_string)

@dsl.pipeline(
    name="test",
    description="test",
)
def pipeline_dollar_string():
    print_string(
        input_string="input string with $$"
    )

def compile_pipeline(func: Callable, file_name: str) -> None:
    """compiles the pipeline and writes it to json.

    Args:
        func (Callable): the pipeline function, must be a valid v2 pipeline
        file_name (str): out file name
    """
    compiler.Compiler().compile(func, file_name)

compile_pipeline(
    pipeline_dollar_string,
    "pipeline_dollar_string.json",
)

Logged:

Input string: input string with $
Internal string: internal string with $

Expected result

Input string: input string with $$
Internal string: internal string with $$

Materials and Reference


Impacted by this bug? Give it a 👍.

mikeedjones commented 9 months ago

Result of the lightweight python component being passed to the pipeline in shell commands.

Current workaround is to use unicode.

Also affects the content of input parameters which are also written into the pipeline proto.

zijianjoy commented 9 months ago

Hello @martina-zxy , can you share the pipeline template generated from this pipeline definition? It is likely that python code is passed to shell command that was interpreted differently in kubernetes manifest.

martina-zxy commented 9 months ago

Hi @zijianjoy, sure, I think that's right.

{
  "components": {
    "comp-print-string": {
      "executorLabel": "exec-print-string",
      "inputDefinitions": {
        "parameters": {
          "input_string": {
            "parameterType": "STRING"
          }
        }
      }
    }
  },
  "deploymentSpec": {
    "executors": {
      "exec-print-string": {
        "container": {
          "args": [
            "--executor_input",
            "{{$}}",
            "--function_to_execute",
            "print_string"
          ],
          "command": [
            "sh",
            "-c",
            "\nif ! [ -x \"$(command -v pip)\" ]; then\n    python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location 'kfp==2.4.0' '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"$0\" \"$@\"\n",
            "sh",
            "-ec",
            "program_path=$(mktemp -d)\n\nprintf \"%s\" \"$0\" > \"$program_path/ephemeral_component.py\"\n_KFP_RUNTIME=true python3 -m kfp.dsl.executor_main                         --component_module_path                         \"$program_path/ephemeral_component.py\"                         \"$@\"\n",
            "\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import *\n\ndef print_string(input_string: str):\n    import logging\n    logger = logging.getLogger(__name__)\n    internal_string = \"internal string with $$\"\n    logger.info(\"Input string: %s\", input_string)\n    logger.info(\"Internal string: %s\", internal_string)\n\n"
          ],
          "image": "python:3.7"
        }
      }
    }
  },
  "pipelineInfo": {
    "name": "print-string"
  },
  "root": {
    "dag": {
      "tasks": {
        "print-string": {
          "cachingOptions": {
            "enableCache": true
          },
          "componentRef": {
            "name": "comp-print-string"
          },
          "inputs": {
            "parameters": {
              "input_string": {
                "componentInputParameter": "input_string"
              }
            }
          },
          "taskInfo": {
            "name": "print-string"
          }
        }
      }
    },
    "inputDefinitions": {
      "parameters": {
        "input_string": {
          "parameterType": "STRING"
        }
      }
    }
  },
  "schemaVersion": "2.1.0",
  "sdkVersion": "kfp-2.4.0"
}
zijianjoy commented 9 months ago

Thanks @martina-zxy , I inspected the generated argo-workflow template, it still contains $$. However, when a pod is generated by argo, it has the following value in the command field:

    - --executor_input
    - '{"inputs":{"parameterValues":{"input_string":"i am $"}},"outputs":{"outputFile":"/tmp/kfp_outputs/output_metadata.json"}}'

I am suspecting it is an argo-workflow issue, @chensun how do you think?

mikeedjones commented 9 months ago

Any update on this?

rimolive commented 7 months ago

cc @terrytangyuan

mikeedjones commented 6 months ago

Looks resolved in https://github.com/kubeflow/pipelines/releases/tag/sdk-2.5.0 ?

DanElias commented 6 months ago

This seems to be resolved in kfp == 2.5.0, please try upgrading

The other dependency versions remain the same: kfp-pipeline-spec==0.2.2 kfp-server-api ==2.0.5

mikeedjones commented 6 months ago

Any clue why?

DanElias commented 6 months ago

When adding a description to the pipeline, the $$ are again changed to $ However, updating to kfp == 2.7.0 resolved the issue

mikeedjones commented 6 months ago

that is... werid. Think an RCA is in order if a description is impacting the parsing code? Does it matter what the description is?

Lightweight python components are not fully supported in all use cases and best practice is to use containerized - does this behaviour appear in containerized components?

rimolive commented 5 months ago

Do we have a confirmation that this issue is fixed in KFP SDK >= 2.5.0?

mikeedjones commented 5 months ago

it is, unless you add a description to the pipeline. It seems fixed for KFP SKD >= 2.7.0 @martina-zxy ?

martina-zxy commented 5 months ago

it is, unless you add a description to the pipeline. It seems fixed for KFP SKD >= 2.7.0 @martina-zxy ?

@rimolive @mikeedjones yes this is correct. With a description, KFP SDK>=2.7.0 keeps $$. It would be good to understand why though

rimolive commented 5 months ago

This looks to me that there's a special case that makes this work, which is more a workaround than the actual fix.

cc @connor-mccarthy What do you think? Can we consider this fixed or we should investigate further to ensure this is fixed?

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 3 months ago

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.