kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

[feature] Set artifact name when using kfp dsl.importer #7541

Open JeremyKeusters opened 2 years ago

JeremyKeusters commented 2 years ago

Feature Area

/area components

What feature would you like to see?

I opened a question on Stack Overflow about this, but since no one answered there, I figured that this is probably not possible yet and is therefore better raised as a feature request.

When importing an artifact using the kfp dsl.importer() function, the imported artifact gets the default (display) name "artifact". I would like to give it a custom name to make the pipeline and lineage tracking clearer.

Example code using dsl.importer():

    load_dataset_step = dsl.importer(
        artifact_uri=input_data_uri,
        artifact_class=dsl.Dataset,
        reimport=False
    ).set_display_name("Load Dataset")

Visualisation of the dsl.importer() step: [screenshot]

What is the use case or pain point?

Giving a custom name to the imported artifact would make the pipeline and lineage tracking clearer.

Is there a workaround currently?

I guess the workaround would be to use a custom component to load the artifact, which would allow you to edit the output artifact name.
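
For reference, a rough sketch of what that workaround could look like with a KFP v2 lightweight Python component is below (untested; load_dataset and my_dataset are just example names, and it assumes the URI points to a file the component can read directly). The output parameter name is what then shows up as the artifact name:

    from kfp import dsl
    from kfp.dsl import Dataset, Output

    @dsl.component
    def load_dataset(input_data_uri: str, my_dataset: Output[Dataset]):
        # The output parameter name ("my_dataset") becomes the artifact name,
        # instead of the importer's default name "artifact".
        import shutil
        # NOTE: shutil only handles locally accessible paths; a remote URI
        # (e.g. gs://...) would need gsutil/fsspec or similar instead.
        shutil.copy(input_data_uri, my_dataset.path)
        my_dataset.metadata['source_uri'] = input_data_uri

One drawback versus dsl.importer is that this runs an extra container step just to copy the data.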


Love this idea? Give it a 👍. We prioritize fulfilling features with the most 👍.

connor-mccarthy commented 2 years ago

/assign @chensun

This will touch at least the SDK and perhaps the backend as well. More info is needed.

adhaene-noimos commented 1 year ago

@connor-mccarthy Any updates on this?

Might this be the reason that artifacts are duplicated when imported, as mentioned in #7861?

Is there a workaround?

GfxKai commented 11 months ago

+1 on this!

Not only is the artifact name set to "artifact", but the importer component itself is just given the name "importer[-N]".

This makes it hard for people to understand what's going on in the pipeline at a glance.

Component name

Looking at importer_node.py, it seems trivial to set a dynamic component name by adding a task_name: str = 'importer' function arg:

component_spec = structures.ComponentSpec(
-    name='importer',
+    name=task_name,

Artifact name

In importer_node.py, the artifact output key is hard-coded outside of the function declaration:

OUTPUT_KEY = 'artifact'

Instead, it could also be passed as a function argument, e.g. artifact_key: str = 'artifact'

    component_spec = structures.ComponentSpec(
        ...
        outputs={
-            OUTPUT_KEY:
+            artifact_key:
                structures.OutputSpec(
                    type=type_utils.create_bundled_artifact_type(
                        artifact_class.schema_title,
                        artifact_class.schema_version))
        },
    )
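
If the output key became configurable like this, downstream tasks would presumably reference the artifact by the chosen key instead of the hard-coded default, e.g. (consume_op and the key name below are only illustrative):

    import_task = dsl.importer(..., artifact_key='raw_dataset')  # proposed, not in current kfp
    consume_op(dataset=import_task.outputs['raw_dataset'])       # instead of .outputs['artifact']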

Suggested solution

Update the dsl.importer function signature in a backwards-compatible manner by adding two new optional parameters, with default values set to mirror the existing implementation:


def importer(
    artifact_uri: Union[pipeline_channel.PipelineParameterChannel, str],
    artifact_class: Type[artifact_types.Artifact],
+   artifact_key: str = 'artifact',
+   task_name: str = 'importer',
    reimport: bool = False,
    metadata: Optional[Mapping[str, Any]] = None,
) -> pipeline_task.PipelineTask:
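
For illustration, a call using the proposed signature might then look like this (both new parameters, and the values shown, are hypothetical until something like this is implemented):

    load_dataset_step = dsl.importer(
        artifact_uri=input_data_uri,
        artifact_class=dsl.Dataset,
        artifact_key='raw_dataset',       # proposed: custom output artifact name
        task_name='load-raw-dataset',     # proposed: custom importer task name
        reimport=False,
    )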

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

JeremyKeusters commented 2 months ago

Bumping to avoid this issue being closed before a fix has been implemented.

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

JeremyKeusters commented 3 weeks ago

Bump

JeremyKeusters commented 3 weeks ago

Can someone else bump this as my bump doesn't seem to be working? 😅