/cc @zijianjoy @chensun @connor-mccarthy @james-jwu
Feature Area
/area sdk
/area components
What feature would you like to see?
Support for more recent apache-beam versions in the Google Cloud Dataflow component (https://cloud.google.com/vertex-ai/docs/pipelines/dataflow-component).
What is the use case or pain point?
Currently, the apache-beam version used by the Google Cloud Pipeline Components is 2.50.0, which Google Cloud Dataflow will deprecate on August 30, 2024, and which has known issues (https://cloud.google.com/dataflow/docs/support/sdk-version-support-status).
The Dockerfile for the image `gcr.io/ml-pipeline/google-cloud-pipeline-components:2.15.0` appears to be https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/Dockerfile#L38.
Is there a workaround currently?
`DataflowPythonJobOp` does not seem to have a field for supplying a custom image. There is a field for passing a `requirements.txt` file, which would probably work if the container running it has network access. However, in secure/isolated environments, where the Docker images must be built in advance, the container has no access to the PyPI repository and therefore cannot download the packages specified in that file. In that case, the user has no choice but to use version 2.50.0.
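For illustration, a minimal sketch of the two partial workarounds discussed above, assuming the GCPC v1 `DataflowPythonJobOp` signature (`python_module_path`, `temp_location`, `requirements_file_path`, `args`). All bucket paths, the job module, and the image URI are hypothetical placeholders; `--sdk_container_image` is Beam's standard Dataflow Runner v2 option for custom worker containers, not a feature of this component:

```python
# Hypothetical sketch: bucket paths, project IDs, and image URIs are
# placeholders; verify the DataflowPythonJobOp signature against the
# installed google-cloud-pipeline-components release.
from kfp import dsl
from google_cloud_pipeline_components.v1.dataflow import DataflowPythonJobOp


@dsl.pipeline(name="dataflow-beam-version-sketch")
def pipeline(project: str, location: str = "us-central1"):
    DataflowPythonJobOp(
        project=project,
        location=location,
        python_module_path="gs://my-bucket/src/my_beam_job.py",
        temp_location="gs://my-bucket/tmp",
        # Option 1: pin a newer Beam, e.g. "apache-beam[gcp]==2.56.0", in a
        # requirements file. This only helps if the launcher container can
        # reach PyPI, which is exactly what fails in isolated environments.
        requirements_file_path="gs://my-bucket/requirements.txt",
        args=[
            # Option 2 (Dataflow Runner v2): have the workers run a prebuilt
            # custom SDK container, avoiding PyPI at worker startup. The
            # launcher itself still submits the job with the Beam 2.50.0
            # bundled in the GCPC image.
            "--sdk_container_image="
            "us-docker.pkg.dev/my-project/my-repo/beam-custom:2.56.0",
        ],
    )
```

Neither option replaces the launcher image itself, which is why a component-level field for overriding the container would still be needed.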