kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

[backend] dsl.importer can't be used with dsl.ParallelFor #10860

Open hexiaoliangRick opened 5 months ago

hexiaoliangRick commented 5 months ago

Environment

Steps to reproduce

1. In my pipeline, I first try to import a list of artifacts and then use that list in a downstream component, so I call dsl.importer inside a dsl.ParallelFor loop and gather the results with dsl.Collected.

from typing import List

from kfp import compiler
from kfp import dsl
from kfp.dsl import Input, Artifact, Output

#
#
# @dsl.container_component
# def mosaic_satellite_image_gdal(raster_files: str, out_raster_file: Output[Artifact]):
#     container = dsl.ContainerSpec(
#         image='harbor.host.com/bdh/remote-sensing-data-preprocessing:v1.10',
#         command=['python', 'main.py', 'MOSAIC'],
#         args=[raster_files]
#     )
#     return container

@dsl.component()
def get_artifact_local_path(local_raster_artifacts: List[Artifact]) -> str:
    values = []
    for artifact in local_raster_artifacts:
        values.append(artifact.path)
    return ".".join(values)

@dsl.component()
def fake_op(s: str):
    print(s)

@dsl.pipeline
def raster_mosaic_pipeline() -> str:
    rasters = ["minio://wh-gis-dev/thenorth_files/2011/2024/4/1794971570411155457/lhztestShape.geojson",
               "minio://wh-gis-dev/remote-sense-image/S2MSI2A/2024/5/15/S2A_MSIL2A_20240515T024551_N0510_R132_T51UXQ_20240515T055751.SAFE.zip"]
    with dsl.ParallelFor(
            items=rasters, parallelism=10
    ) as raster_file:
        importer_file_task = dsl.importer(artifact_uri=raster_file, artifact_class=dsl.Artifact, reimport=False)
    artifacts = dsl.Collected(importer_file_task.output)
    get_op_task = get_artifact_local_path(local_raster_artifacts=artifacts)
    return get_op_task.output

compiler.Compiler().compile(raster_mosaic_pipeline, 'RasterMosaic.yaml')

Expected result

The pipeline runs successfully.
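For reference, the following sketch models in plain Python what a successful run of the reproducer is expected to compute. It does not use KFP. Each ParallelFor iteration imports one URI, dsl.Collected gathers the per-iteration artifacts into a list, and get_artifact_local_path joins their local paths. `FakeArtifact`, `to_local_path` and `raster_mosaic_model` are hypothetical names introduced only for this model. The `minio://bucket/key` to `/minio/bucket/key` mapping is an assumption modeled on the SDK's local-mount convention, not something confirmed in this issue.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeArtifact:
    """Hypothetical stand-in for kfp.dsl.Artifact with only the fields used here."""
    uri: str
    path: str


def to_local_path(uri: str) -> str:
    # Assumed mapping: minio://bucket/key is mounted at /minio/bucket/key
    # inside the container (modeled on the KFP SDK convention).
    prefix = "minio://"
    if uri.startswith(prefix):
        return "/minio/" + uri[len(prefix):]
    raise ValueError(f"unsupported URI scheme: {uri}")


def import_artifact(uri: str) -> FakeArtifact:
    # Models one dsl.importer call: wrap an existing URI as an artifact.
    return FakeArtifact(uri=uri, path=to_local_path(uri))


def get_artifact_local_path(local_raster_artifacts: List[FakeArtifact]) -> str:
    # Same logic as the component in the reproducer, including the "." separator.
    return ".".join(a.path for a in local_raster_artifacts)


def raster_mosaic_model(rasters: List[str]) -> str:
    # Fan-out: one import per item, as dsl.ParallelFor would schedule.
    imported = [import_artifact(uri) for uri in rasters]
    # Fan-in: dsl.Collected hands the whole list to the downstream step.
    return get_artifact_local_path(imported)


if __name__ == "__main__":
    print(raster_mosaic_model(["minio://b/a.tif", "minio://b/c.tif"]))
```

Note that the paths themselves contain dots (for example `.geojson` and `.SAFE.zip`), so a string joined with "." cannot be split back unambiguously. That does not affect the reported failure, only what a downstream consumer could do with the output.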

Materials and Reference



github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

gregsheremeta commented 3 months ago

@gmfrasca, this looks just like the error I found in https://github.com/kubeflow/pipelines/pull/10798#issuecomment-2263670451

Hmm 🤔

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

HumairAK commented 1 month ago

/remove-lifecycle stale