strangiato closed this issue 1 week ago.
Thanks Trevor. This is now resolved, and we'll do a patch release on 2.10 to pull in these changes.
Could it be that the same issue also applies to CPU requests and limits? After updating to KFP SDK 2.10, we saw pods running on KFP server 2.3 without any resource requests or limits. It seems that the keys for CPU/RAM requests and limits in the generated pipeline spec have changed as well (probably in https://github.com/kubeflow/pipelines/pull/11097).
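One way to confirm which resource keys a given SDK version actually emits is to load the compiled pipeline.yaml (for example with PyYAML's yaml.safe_load) and list the keys under each executor's resources. The sketch below is a minimal, stdlib-only helper that works on the already-parsed dict. The deploymentSpec/executors/container/resources layout and the example key names (resourceCpuLimit, resourceMemoryLimit) are assumptions about the compiled spec, not something confirmed in this thread.

```python
from typing import Dict, List


def resource_keys_by_executor(spec: Dict) -> Dict[str, List[str]]:
    """Return the sorted resource keys set on each executor's container.

    Assumed compiled-spec layout (not confirmed in this thread):
    deploymentSpec -> executors -> <name> -> container -> resources.
    """
    executors = spec.get("deploymentSpec", {}).get("executors", {})
    result: Dict[str, List[str]] = {}
    for name, executor in executors.items():
        resources = executor.get("container", {}).get("resources", {})
        result[name] = sorted(resources)
    return result


if __name__ == "__main__":
    # Hypothetical excerpt of a compiled spec; the key names are illustrative.
    example = {
        "deploymentSpec": {
            "executors": {
                "exec-train": {
                    "container": {
                        "image": "python:3.9",
                        "resources": {
                            "resourceCpuLimit": "2",
                            "resourceMemoryLimit": "4G",
                        },
                    }
                }
            }
        }
    }
    print(resource_keys_by_executor(example))
```

Running it prints {'exec-train': ['resourceCpuLimit', 'resourceMemoryLimit']}. Comparing this output for the same pipeline compiled with the previous and the current SDK version would show whether the key names differ.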
Yep, good catch. https://github.com/kubeflow/pipelines/commit/83dcf1a60919f5bcc0c644c8fdff94ad686cad07
Any interest in submitting a fix? You can use https://github.com/kubeflow/pipelines/pull/11373/files as an example.
I took a brief look at this. Unfortunately, it's not as simple as bringing back the old fields: the data type has also changed from numbers to strings, so I believe the old validation/conversion logic would need to be brought back as well. I'm afraid I won't have enough time to work on this. I also lack some of the context on why these things were changed and don't want to break something else downstream.
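The conversion mentioned above could look roughly like the following: parse a Kubernetes-style quantity string back into a number (CPU in cores, memory in gigabytes) before writing it to a legacy numeric field. This is a hypothetical sketch: the function names are illustrative, and the choice of cores and decimal gigabytes as target units is an assumption about what the old numeric fields expected.

```python
import re

# Kubernetes quantity suffixes mapped to their size in bytes
# (decimal: K, M, G, ...; binary: Ki, Mi, Gi, ...).
_MEMORY_SUFFIXES = {
    "E": 10**18, "P": 10**15, "T": 10**12, "G": 10**9, "M": 10**6, "K": 10**3,
    "Ei": 2**60, "Pi": 2**50, "Ti": 2**40, "Gi": 2**30, "Mi": 2**20, "Ki": 2**10,
}


def cpu_to_cores(cpu: str) -> float:
    """Convert a CPU string such as "500m" or "1.5" to a number of cores."""
    if re.fullmatch(r"([0-9]*[.])?[0-9]+m?", cpu) is None:
        raise ValueError(f"Invalid CPU value: {cpu!r}")
    # A trailing "m" means millicores.
    return float(cpu[:-1]) / 1000 if cpu.endswith("m") else float(cpu)


def memory_to_gb(memory: str) -> float:
    """Convert a memory string such as "512Mi" or "4G" to decimal gigabytes."""
    match = re.fullmatch(r"([0-9]+)(Ei|Pi|Ti|Gi|Mi|Ki|E|P|T|G|M|K)?", memory)
    if match is None:
        raise ValueError(f"Invalid memory value: {memory!r}")
    number, suffix = match.groups()
    # A value without a suffix is interpreted as plain bytes.
    multiplier = _MEMORY_SUFFIXES[suffix] if suffix else 1
    return int(number) * multiplier / 10**9
```

For example, cpu_to_cores("500m") returns 0.5 and memory_to_gb("4G") returns 4.0. Rejecting malformed strings with a ValueError keeps invalid values from being written silently as zero or a wrong number.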
I opened https://github.com/kubeflow/pipelines/issues/11390 as a follow-up for the CPU/memory requests and limits. I'll see if I can work on a fix soon.
@vanHavel thanks for trying! I appreciate it :smile:
When compiling or executing a pipeline with KFP SDK 2.10 in which a task requests a GPU via set_accelerator_type() and set_accelerator_limit(), the pipeline server ignores the GPU option, and the pod is scheduled without the GPU in its resource configuration.
This appears to be a breaking change introduced in 2.10.
Environment
How do you deploy Kubeflow Pipelines (KFP)? Red Hat OpenShift AI
KFP version:
KFP SDK version: 2.10
Steps to reproduce
Create a Python virtual environment
Install kfp 2.10
Create the following pipeline in a file named acc-test.py:
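```python
from kfp import compiler, dsl


@dsl.component()
def empty_component():
    pass


@dsl.pipeline(name='pipeline-accel')
def pipeline_accel():
    task = empty_component()
    # Request one NVIDIA GPU for this task.
    task.set_accelerator_type("nvidia.com/gpu").set_accelerator_limit("1")


if __name__ == "__main__":
    # Compile the pipeline definition to pipeline.yaml.
    compiler.Compiler().compile(pipeline_accel, 'pipeline.yaml')
```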
Run python acc-test.py to compile the pipeline to pipeline.yaml.
Expected resources on the task's pod:

```yaml
resources:
  limits:
    nvidia.com/gpu: 1
  requests:
    nvidia.com/gpu: 1
```
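To check whether the compiled pipeline.yaml still carries the GPU setting in a form an older server can read, one could classify the accelerator keys of each executor. This sketch assumes the spec has already been loaded into a dict (for example with PyYAML's yaml.safe_load), that the accelerator sits under container.resources.accelerator, and that the older key names are type/count while the newer ones are resourceType/resourceCount. The path and all four key names are assumptions, not taken from this issue.

```python
from typing import Dict

LEGACY_KEYS = {"type", "count"}               # assumed older field names
NEW_KEYS = {"resourceType", "resourceCount"}  # assumed newer field names


def accelerator_key_style(spec: Dict) -> Dict[str, str]:
    """Classify each executor's accelerator keys as legacy, new, both or none."""
    styles: Dict[str, str] = {}
    executors = spec.get("deploymentSpec", {}).get("executors", {})
    for name, executor in executors.items():
        resources = executor.get("container", {}).get("resources", {})
        # "or {}" also covers an explicit null accelerator entry.
        keys = set(resources.get("accelerator") or {})
        has_legacy = bool(keys & LEGACY_KEYS)
        has_new = bool(keys & NEW_KEYS)
        if has_legacy and has_new:
            styles[name] = "both"
        elif has_legacy:
            styles[name] = "legacy"
        elif has_new:
            styles[name] = "new"
        else:
            styles[name] = "none"
    return styles
```

If the server version in use only reads the older keys (an assumption), a result of "new" for the executor of empty_component would be consistent with the GPU being dropped, while "both" would indicate a spec that older and newer servers can each interpret.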