Closed tech348712013870132 closed 4 years ago
Are you sure Kubeflow Pipelines uses GCS buckets by default? As far as I know, KFP is platform-independent. Can you please provide specific examples?
I mean, for example, this point (see screenshot), when you create a new run of a pipeline: the default value of the pipeline-root input field is gs://your-bucket/...
Is it possible to use another storage backend at this point? I would prefer something open source that can be hosted on-premise, independent of GCS.
Screenshot:
> Is it possible to use another storage backend at this point?
The storage backend is not part of the KFP system. It depends on how the business logic is written for the components/samples.
The pipeline you mentioned here, for example, is the TFX taxi pipeline. Under the hood it uses the tf.io.gfile module, so IMO it should be able to run on other storage backends (S3 etc.), but I have not validated this myself.
In short, the IO logic within each component determines which storage backends are supported, and KFP itself is not involved in it.
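To illustrate the point above, here is a minimal sketch of how a component's own IO logic, rather than KFP, could decide which backend a pipeline root refers to. This is not KFP or TFX code: the function name `backend_for` and the scheme-to-backend mapping are hypothetical and exist only for illustration.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to a storage backend label.
# Illustrative only; this is not part of KFP or TFX.
SCHEME_TO_BACKEND = {
    "gs": "Google Cloud Storage",
    "s3": "S3-compatible object store",
    "file": "local filesystem",
    "": "local filesystem",  # plain paths such as /tmp/root have no scheme
}


def backend_for(pipeline_root: str) -> str:
    """Return the backend label implied by the URI scheme of pipeline_root."""
    scheme = urlparse(pipeline_root).scheme
    try:
        return SCHEME_TO_BACKEND[scheme]
    except KeyError:
        # The component, not the orchestrator, rejects unsupported schemes.
        raise ValueError(f"unsupported storage scheme: {scheme!r}")


if __name__ == "__main__":
    for root in ("gs://your-bucket/root", "s3://my-bucket/root", "/tmp/root"):
        print(root, "->", backend_for(root))
```

Under this assumption, changing the pipeline-root value only works if the component's IO library understands the new scheme, which is why support has to be checked per component.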
@numerology Thank you for the great answer.
Now I understand this better :)
/close Looks like the question has been answered.
Do we need to add more documentation about this? If yes, please reopen.
@Bobgy: Closing this issue.
Is it possible to replace the use of Google Cloud Storage buckets with an alternative on-premise solution, so that Kubeflow Pipelines, for example, can run completely independently of the Google Cloud Platform?