kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

[bug] Idempotency in kubeflow pipeline sagemaker component. #7040

Closed goswamig closed 5 months ago

goswamig commented 2 years ago

What steps did you take:

If a node scales up or down, the SageMaker component tries to create the same job again, which fails because SageMaker does not allow creating a job with a name that already exists. The component controller should be able to detect this and resume the job from its existing state.
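
To illustrate the requested behaviour, here is a minimal sketch of a "describe before create" check. It is not the component's actual code. `ensure_training_job` and `_error_code` are hypothetical helper names, `client` is assumed to behave like `boto3.client("sagemaker")`, and the sketch assumes SageMaker reports a missing job as a `ValidationException` from `describe_training_job`.

```python
def _error_code(exc):
    """Return the AWS error code from a botocore ClientError-like exception, if any."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def ensure_training_job(client, job_name, create_args):
    """Create the training job only if no job with this name exists yet.

    Returns a tuple (action, status), where action is "attached" when an
    existing job was found and "created" when a new job was submitted.
    """
    try:
        existing = client.describe_training_job(TrainingJobName=job_name)
        # The job already exists (e.g. the pod was rescheduled after a node
        # scale-down), so attach to it instead of submitting it again.
        return "attached", existing["TrainingJobStatus"]
    except Exception as exc:
        # Assumption: a missing job surfaces as a ValidationException.
        # Anything else (throttling, auth errors, ...) is re-raised.
        if _error_code(exc) != "ValidationException":
            raise

    # No job with this name yet: submit it, then read back its status.
    client.create_training_job(TrainingJobName=job_name, **create_args)
    created = client.describe_training_job(TrainingJobName=job_name)
    return "created", created["TrainingJobStatus"]
```

A retried pod that calls `ensure_training_job` with the same `job_name` would then get `"attached"` and could keep polling the existing job rather than failing on the duplicate name. This only works if the job name is deterministic across retries, which is an assumption about how the component generates names.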

What happened:

The job hangs or fails.

What did you expect to happen:

I expect the job to resume from its previous state.

Environment:

kfp-1.6

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

goswamig commented 2 years ago

@akartsky @surajkota @mbaijal @ryansteakley FYI.

goswamig commented 2 years ago

https://github.com/kubeflow/pipelines/issues/6465

akartsky commented 2 years ago

/area components/aws/sagemaker

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 5 months ago

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.