kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

S3 support in Kubeflow Pipelines #3405

Closed Jeffwan closed 8 months ago

Jeffwan commented 4 years ago

I'm creating this parent issue to track all known S3 problems in KFP, and I'd also like to discuss further feature requests in this ticket.

Use cases

Replace Minio with S3 for Argo artifact store and KFP Pipeline store.

- [ ] Manifest changes in kubeflow/manifests and the standalone kubeflow/pipelines manifests to easily switch from MinIO to S3
- [ ] IRSA - bump the minio-sdk-go version to support IRSA
- [ ] IRSA - bump the Argo Workflows version to make sure the runner can persist artifacts to S3
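As a rough sketch of what the first item could look like once done, here is an Argo workflow-controller ConfigMap pointing the artifact repository directly at S3 (the bucket name and region are placeholders; `useSDKCreds: true` lets the Argo executor resolve IRSA credentials from the pod's service account instead of a static key secret):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: kubeflow
data:
  artifactRepository: |
    archiveLogs: true
    s3:
      endpoint: s3.amazonaws.com
      bucket: my-kfp-artifacts        # placeholder bucket name
      region: us-west-2               # placeholder region
      keyFormat: "artifacts/{{workflow.name}}/{{pod.name}}"
      # With IRSA there is no accessKeySecret/secretKeySecret; the AWS SDK
      # resolves credentials from the service account's web identity token.
      useSDKCreds: true
```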

Related issues and PRs

UI

- [x] KFP UI should be able to read the source file in S3, for example:

mlpipeline-ui-metadata -> minio://mlpipeline/artifacts/pipeline-A/pipeline-A-run-id/mlpipeline-ui-metadata.tgz

{
  "outputs": [
    {
      "source": "s3://your_bucket/README.md",
      "type": "markdown"
    }
  ]
}

- [ ] KFP UI should be able to read artifact files in S3

mlpipeline-ui-metadata -> s3://mlpipeline/artifacts/pipeline-A/pipeline-A-run-id/mlpipeline-ui-metadata.tgz

{
  "outputs": [
    {
      "source": "s3://your_bucket/README.md",
      "type": "markdown"
    }
  ]
}

- [ ] IRSA - bump the minio-js version to support IRSA
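For component authors, the `mlpipeline-ui-metadata` files shown above are plain JSON written by the pipeline step; a minimal sketch in Python (the S3 path is a placeholder):

```python
import json

def write_ui_metadata(path: str, source: str) -> dict:
    """Write an mlpipeline-ui-metadata file declaring a markdown artifact.

    `source` is the URI the KFP UI will fetch, e.g. an s3:// path.
    """
    metadata = {
        "outputs": [
            {
                "source": source,
                "type": "markdown",
            }
        ]
    }
    with open(path, "w") as f:
        json.dump(metadata, f)
    return metadata

# Placeholder bucket path, matching the example above.
meta = write_ui_metadata("/tmp/mlpipeline-ui-metadata.json",
                         "s3://your_bucket/README.md")
```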

SDK

- [x] User can declare a custom S3 artifact location inside the pipeline.
- [x] User can apply AWS credentials to pipeline pods to get access to S3.
- [x] User can specify a service account for the pipeline. (IRSA)
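The IRSA item above typically comes down to annotating the pipeline's service account with an IAM role ARN and then running pipeline pods under that service account; a sketch (the account ID and role name are placeholders):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pipeline-runner
  namespace: kubeflow
  annotations:
    # Placeholder account ID and role name. With IRSA, EKS injects web
    # identity credentials for this role into pods using this service account.
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/kfp-s3-access
```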

More examples

/kind improvement /area frontend /area backend

/cc @eterna2 @discordianfish

karlschriek commented 3 years ago

Is there already an issue that tracks this? I guess the current issue is probably not the right place to continue this discussion.

Bobgy commented 3 years ago

@karlschriek WDUT about https://github.com/kubeflow/pipelines/issues/5656?

RakeshRaj97 commented 3 years ago

I'm struggling to access my object storage bucket created using rook-ceph (S3) and would like to use it as my default artifact and metadata store instead of MinIO. Can anyone please point me to all the files that need to be modified to use MinIO as an S3 interface?

I've tried updating the workflow-controller-configmap to use my object store:

apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
data:
  artifactRepository: |
    archiveLogs: true
    s3:
      endpoint: "rook-ceph-rgw-my-store.rook-ceph.svc.cluster.local:80"
      bucket: "ceph-bkt-xxxx-xxxxx"
      keyFormat: "artifacts/{{workflow.name}}/{{pod.name}}"
      # insecure will disable TLS. Primarily used for minio installs not configured with TLS
      insecure: true
      accessKeySecret:
        name: mlpipeline-minio-artifact
        key: accesskey
      secretKeySecret:
        name: mlpipeline-minio-artifact
        key: secretkey
  containerRuntimeExecutor: docker

I also replaced the accesskey and secretkey values in the mlpipeline-minio-artifact secret.

Any clue on this will be really helpful.

vinayan3 commented 3 years ago

@Jeffwan Did you have a PR or an example for using minio as a gateway to s3? I've been standing up Kubeflow on AWS and I'm a bit concerned about the number of places the patches that need to go in for S3. Instead of patching all these deployments I'd prefer to just patch the minio components and then be done with it.

Jeffwan commented 3 years ago

> @Jeffwan Did you have a PR or an example for using minio as a gateway to s3? I've been standing up Kubeflow on AWS and I'm a bit concerned about the number of places the patches that need to go in for S3. Instead of patching all these deployments I'd prefer to just patch the minio components and then be done with it.

@vinayan3 Have you checked https://docs.min.io/docs/minio-gateway-for-s3.html? You can make changes in the minio manifests to achieve that. The minio gateway works with S3 out of the box.
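For context, MinIO's (since-deprecated) S3 gateway mode was enabled by changing the minio container's arguments, roughly like this fragment of the minio Deployment spec (the image tag and credential values are placeholders; env var names follow the minio gateway docs of that era):

```yaml
# Fragment of the minio Deployment: run minio as an S3 gateway
# instead of a standalone server, so existing minio:// consumers
# transparently read from and write to real S3.
containers:
  - name: minio
    image: minio/minio
    args: ["gateway", "s3", "https://s3.amazonaws.com"]
    env:
      - name: MINIO_ACCESS_KEY   # AWS access key (placeholder)
        value: "<aws-access-key>"
      - name: MINIO_SECRET_KEY   # AWS secret key (placeholder)
        value: "<aws-secret-key>"
```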

MaximumQuasar commented 2 years ago

Hi,

Is STS assume-role identity (temporary access) currently supported for workflows uploading to S3, and are there any reference docs for it?

By temporary access I mean using an access key, secret key, and security token.
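For what it's worth, temporary STS credentials are usually surfaced to SDK-based uploaders through the standard AWS environment variables, with the session token alongside the key pair; a small sketch of that mapping (the credential values are placeholders):

```python
import os

def export_temporary_credentials(access_key: str,
                                 secret_key: str,
                                 session_token: str) -> dict:
    """Map temporary STS credentials onto the standard AWS environment
    variables that AWS SDKs resolve automatically."""
    env = {
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        # The session token is what distinguishes temporary STS
        # credentials from long-lived IAM user keys.
        "AWS_SESSION_TOKEN": session_token,
    }
    os.environ.update(env)
    return env

# Placeholder values, not real credentials.
creds = export_temporary_credentials("ASIA-placeholder",
                                     "secret-placeholder",
                                     "token-placeholder")
```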

xwt-ml commented 2 years ago

Is there a timeline for when, and in which release, the following feature will be supported? @Jeffwan

> UI: KFP UI should be able to read artifact files in S3
> mlpipeline-ui-metadata -> s3://mlpipeline/artifacts/pipeline-A/pipeline-A-run-id/mlpipeline-ui-metadata.tgz

surajkota commented 1 year ago

Hi everyone, please see the proposal/design for using IRSA/IAM roles to access S3 on #8502 and let us know your thoughts.

For anyone looking to use S3 with Kubeflow Pipelines (using IAM user credentials), please refer to the latest AWS distribution of Kubeflow docs - https://awslabs.github.io/kubeflow-manifests/

thesuperzapper commented 1 year ago

Hey all, I just wanted to share that deployKF uses S3 directly (no minio gateway), which also means that it comes with support for IRSA!

It's going to be documented better soon, but these are the values you need to configure to use S3, happy to help anyone who has trouble getting it working.

If you want, you can also read a bit more about deployKF in this comment: https://github.com/kubeflow/manifests/issues/2451#issuecomment-1629851394

rimolive commented 8 months ago

Closing this issue. Looks like this was already implemented.

/close

google-oss-prow[bot] commented 8 months ago

@rimolive: Closing this issue.

In response to [this](https://github.com/kubeflow/pipelines/issues/3405#issuecomment-2016820884):

> Closing this issue. Looks like this was already implemented.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.