kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0
3.61k stars, 1.63k forks

Kubeflow Pipelines without Google Cloud Storage #3453

Closed tech348712013870132 closed 4 years ago

tech348712013870132 commented 4 years ago

Is it possible to replace Google Cloud Storage buckets with an alternative on-premise solution, so that e.g. Kubeflow Pipelines can run completely independently of the Google Cloud Platform?

Ark-kun commented 4 years ago

Are you sure Kubeflow Pipelines uses GCS buckets by default? As far as I know, KFP is platform-independent. Can you please provide specific examples?

tech348712013870132 commented 4 years ago

I mean, for example, the form shown when you create a new run of a pipeline (see screenshot). The default value of the pipeline-root input field is gs://your-bucket/...

Is it possible to use another storage backend at this point? I would prefer something open-source that can be hosted on-premise, independent of GCS.

[Screenshot 2020-04-07 at 02 48 43]

numerology commented 4 years ago

> Is it possible to use another storage backend at this point?

The storage backend is not part of the KFP system. It depends on how the business logic is written for the components/samples.

The pipeline you mentioned here, for example, is the TFX taxi pipeline. Under the hood it uses the tf.io.gfile module, so IMO it should be able to run on other storage backends (S3, etc.), but I have not validated this myself.

In short, the I/O logic inside each component determines which storage backends are supported, and that logic is opaque to KFP itself.
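To make that concrete, here is a minimal sketch of what a storage-agnostic component body could look like. The function names (`storage_scheme`, `join_uri`, `write_marker`) and the `markers/status.txt` layout are hypothetical, made up for illustration; this is not code from the TFX taxi pipeline. It assumes the component does all of its I/O through tf.io.gfile, so the backend is picked by the URI scheme of the pipeline root rather than by KFP. Whether a scheme such as s3:// actually works depends on the TensorFlow build and installed filesystem plugins, which is not validated here.

```python
from urllib.parse import urlparse


def storage_scheme(uri: str) -> str:
    """Return the URI scheme that selects the storage backend.

    A plain path without a scheme is treated as the local filesystem.
    """
    return urlparse(uri).scheme or "file"


def join_uri(root: str, *parts: str) -> str:
    """Join path segments onto a root URI without touching its scheme."""
    return "/".join([root.rstrip("/"), *(p.strip("/") for p in parts)])


def write_marker(pipeline_root: str, text: str = "done") -> str:
    """Hypothetical component body: write a small file under pipeline_root."""
    # Imported lazily so the helpers above work without TensorFlow installed.
    import tensorflow as tf

    marker_dir = join_uri(pipeline_root, "markers")
    target = join_uri(marker_dir, "status.txt")
    # tf.io.gfile dispatches on the path's scheme; the component code
    # itself does not change between local, gs:// or other backends.
    tf.io.gfile.makedirs(marker_dir)
    with tf.io.gfile.GFile(target, "w") as f:
        f.write(text)
    return target
```

With this shape, pointing a run at on-premise storage would only mean passing a different pipeline-root value (for example a local or mounted path instead of gs://your-bucket/...), provided the chosen scheme is supported by the I/O library the component uses.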

tech348712013870132 commented 4 years ago

@numerology Thank you for the great answer.

Now I understand this better :)

Bobgy commented 4 years ago

/close

Looks like the question has been answered.

Do we need to add more documentation about this? If yes, please reopen.

k8s-ci-robot commented 4 years ago

@Bobgy: Closing this issue.
