kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

Vertex AI Pipeline - Container OP `set_cpu_limit` does not work with parameter_values nor at runtime #6681

Closed SaschaHeyer closed 2 days ago

SaschaHeyer commented 2 years ago

Hello Kubeflow Team, Hello Google Team,

The container op `.set_cpu_limit` only works when the value is set explicitly, not via `parameter_values` or at runtime: https://github.com/kubeflow/pipelines/blob/4906ab2f1142043517249a62b9f22bc122971fdf/sdk/python/kfp/dsl/_container_op.py#L378
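
For context on why the parameterized form fails: the limit has to be a concrete Kubernetes CPU string at compile time, so a placeholder injected via `parameter_values` never passes the string check. A rough, self-contained illustration of that kind of compile-time validation (the actual pattern and behavior in `_container_op.py` may differ):

```python
import re

# Approximate Kubernetes-style CPU quantity check -- similar in spirit to
# the validation set_cpu_limit applies to its argument at compile time.
CPU_PATTERN = re.compile(r"^[0-9]+(\.[0-9]+)?m?$")

def looks_like_cpu_string(value: str) -> bool:
    return bool(CPU_PATTERN.match(value))

print(looks_like_cpu_string("16"))    # literal value: accepted
print(looks_like_cpu_string("500m"))  # millicores literal: accepted
print(looks_like_cpu_string("{{pipelineparam:op=;name=cpu_limit}}"))  # placeholder: rejected
```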

Reproduce

  1. parameter_values: see steps to reproduce
  2. runtime: see https://github.com/kubeflow/pipelines/blob/master/samples/core/resource_spec/runtime_resource_request.py

Steps to reproduce

Not working

from kfp.v2 import compiler
from kfp.v2.dsl import pipeline
from kfp.v2.google.client import AIPlatformClient

# `train` is a component defined elsewhere (omitted here).

@pipeline(name="reproduction",
          pipeline_root="ADD PIPELINE ROOT")
def reproduction_pipeline(cpu_limit: str):
    train_op = train().set_cpu_limit(cpu_limit)

compiler.Compiler().compile(pipeline_func=reproduction_pipeline,
                            package_path='pipeline.json')

api_client = AIPlatformClient(
    project_id="ADD PROJECT",
    region="us-central1",
)

response = api_client.create_run_from_job_spec(
    'pipeline.json',
    parameter_values={
        'cpu_limit': '16',
    },
)

Working

from kfp.v2 import compiler
from kfp.v2.dsl import pipeline
from kfp.v2.google.client import AIPlatformClient

# `train` is a component defined elsewhere (omitted here).

@pipeline(name="reproduction",
          pipeline_root="ADD PIPELINE ROOT")
def reproduction_pipeline():
    train_op = train().set_cpu_limit("16")

compiler.Compiler().compile(pipeline_func=reproduction_pipeline,
                            package_path='pipeline.json')

api_client = AIPlatformClient(
    project_id="ADD PROJECT",
    region="us-central1",
)

response = api_client.create_run_from_job_spec('pipeline.json')

Expected result

The CPU limit can be set via `parameter_values`.

Looking forward to your feedback.

zijianjoy commented 2 years ago

cc @chensun

SaschaHeyer commented 2 years ago

Morning, any updates?

chensun commented 2 years ago

Hi @SaschaHeyer , this is indeed a known limitation and we plan to discuss the best solution for this in Q1/Q2 2022.

Can you help us understand what's your use case to set a dynamic value for cpu limit, and how critical is this feature to you? Thanks!

SaschaHeyer commented 2 years ago

Hi @chensun Thanks a lot for your feedback.

I work for one of the biggest Google Cloud partners, and we get this request regularly from our customers, at least once every two weeks. Parameterizing the machine type (CPU and memory) can be really useful if you use the same pipeline for different datasets and/or hyperparameters (this way there is no need to re-compile).

Changing those hyperparameters also might require bigger machines. For example, if you increase the batch size.

Currently, a re-compile of the pipeline is required. It would be useful if we could do this via a parameter as well.
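
Until dynamic resource parameters are supported, the re-compile can at least be scripted: bake the desired limit in at compile time with a pipeline factory. A minimal sketch of that pattern, where `make_pipeline` and `compile_and_submit` are hypothetical stand-ins (in real KFP code the factory body would be the `@pipeline`-decorated function calling `train().set_cpu_limit(cpu_limit)`, and the submit step would be `compiler.Compiler().compile` plus `create_run_from_job_spec`):

```python
def make_pipeline(cpu_limit: str):
    """Return a pipeline definition with the CPU limit baked in at compile time.

    A plain dict stands in for the compiled job spec so the pattern
    stays self-contained.
    """
    def pipeline_spec():
        return {"name": "reproduction", "cpu_limit": cpu_limit}
    return pipeline_spec

def compile_and_submit(pipeline_func):
    # Stand-in for compiling the pipeline and submitting the run:
    # here we just materialize the spec.
    return pipeline_func()

# One factory, many limits -- no hand-editing of the pipeline between runs.
small = compile_and_submit(make_pipeline("4"))
large = compile_and_submit(make_pipeline("16"))
print(small["cpu_limit"], large["cpu_limit"])
```

This is still a re-compile per configuration, so it doesn't remove the limitation, but it does remove the manual editing step.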

iuiu34 commented 2 years ago

Along the same line, it would also be nice that when a task throws a KFP out-of-memory error, a) you can adjust the memory limit as a parameter, as @SaschaHeyer requested, and re-run just the task rather than the whole pipeline (though if caching is enabled this may already be solved), or b) the task upscales automatically and re-runs.

chensun commented 2 years ago

@SaschaHeyer Thanks for the context!

chensun commented 2 years ago

Along the same line, it would also be nice that when a task throws a KFP out-of-memory error, a) you can adjust the memory limit as a parameter, as @SaschaHeyer requested, and re-run just the task rather than the whole pipeline (though if caching is enabled this may already be solved)

Yes, caching would help here if the upstream doesn't have any changes on their inputs.

b) the task upscales automatically and re-runs

This might create some surprise billing issue :)

iuiu34 commented 2 years ago

This might create some surprise billing issue :)

Yep, in case this gets implemented, there should be an `autoscale: bool = False` argument in `kfp.v2.compiler.Compiler().compile`. But I agree that option b), auto-scaling, could have some dramatic cost implications for the user that option a) doesn't have.
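
Option b) could be made opt-in exactly as suggested, to avoid surprise bills. A minimal pure-Python sketch of an escalating retry, with a hypothetical `run_task` standing in for the actual task execution and `OutOfMemoryError` standing in for the KFP OOM failure:

```python
class OutOfMemoryError(Exception):
    """Stand-in for the task failing with an out-of-memory error."""

def run_task(memory_limit_gb: int) -> str:
    # Hypothetical task: fails unless it gets at least 16 GiB.
    if memory_limit_gb < 16:
        raise OutOfMemoryError(f"{memory_limit_gb}G was not enough")
    return f"trained with {memory_limit_gb}G"

def run_with_autoscale(limits_gb, autoscale: bool = False):
    """Try each memory limit in order; only escalate if autoscale=True."""
    for i, limit in enumerate(limits_gb):
        try:
            return run_task(limit)
        except OutOfMemoryError:
            if not autoscale or i == len(limits_gb) - 1:
                raise  # opted out (or ladder exhausted): surface the failure

print(run_with_autoscale([4, 8, 16], autoscale=True))  # escalates to 16G
```

The explicit limit ladder also caps the cost exposure: the task can never escalate beyond the largest size the user listed.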

ashrafgt commented 2 years ago

Huge +1 on this!

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

iuiu34 commented 2 years ago

Are there plans to support this? Or is it explored in another ticket?

SaschaHeyer commented 1 year ago

Hi are there any updates? This would be a huge benefit for re-using pipelines without the need to re-compile them.

saigirishgilly98 commented 1 year ago

Hi are there any updates? This would be a huge benefit for re-using pipelines without the need to re-compile them.

+++

I agree with @SaschaHeyer. We are building reusable pipeline templates where only the data changes, and depending on the data size we want to be able to configure the CPU and memory for each of the components through pipeline params or some other way.

acarvalho2-wiq commented 2 months ago

Hi guys, do we have any updates on this? I am also looking for exactly the same dynamic parameterisation of my pipeline.

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 2 days ago

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

google-oss-prow[bot] commented 1 day ago

@entsarangi: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/kubeflow/pipelines/issues/6681#issuecomment-2238102025):

> /reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

entsarangi commented 1 day ago

This would be a useful feature to have: `cpu_limit` available via pipeline params. Any update or workaround that doesn't involve hard-coded values?