kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0
3.51k stars 1.58k forks

[backend] Unable to create directory in Minio when using Artifacts: Permission denied #10397

Open jmaunon opened 5 months ago

jmaunon commented 5 months ago

Hi Developers

I have tried to create a simple pipeline that transfers data using the "built-in" artifacts approach, without success. It is difficult to say exactly what is happening, but I have found similar issues in other threads.

If you know of a manual patch, please let us know. I see artifacts as a core solution/approach.

cc: @juliusvonkohout , @chensun

I am aware that there are some related issues, but I do not see a final solution or alternative patch in them. See: #6530, https://github.com/kubeflow/manifests/issues/2573, https://github.com/kubeflow/pipelines/issues/7629

Environment

Steps to reproduce

I get a permission denied error when using Artifacts.

Snippet of code:

from kfp import dsl
from kfp.dsl import Dataset, Output

@dsl.component(base_image="kubeflownotebookswg/jupyter-pytorch-full:v1.8.0-rc.0")
def download_data(test_path: Output[Dataset]):

    import torch

    from torchvision.transforms import ToTensor
    from torchvision.datasets import MNIST

    # Download the MNIST test split and persist it to the output artifact path.
    mnist_test = MNIST(".", download=True, train=False, transform=ToTensor())

    with open(test_path.path, "wb") as f:
        torch.save(mnist_test, f)

@dsl.pipeline(
    name='mnist',
    description='Detect digits',
)
def run():
    step_1 = download_data()

# `client` is a kfp.Client() connected to the KFP endpoint.
client.create_run_from_pipeline_func(run)

Associated logs:

failed to execute component: unable to create directory "/minio/mlpipeline/v2/artifacts/mnist/43f760f9-b638-4129-87fe-602e24076beb/download-data" for output artifact "test_path": mkdir /minio: permission denied

Expected result

Works without issues.

Materials and Reference


Impacted by this bug? Give it a 👍.

juliusvonkohout commented 5 months ago

Please use the final 1.8 image, not jupyter-pytorch-full:v1.8.0-rc.0 and join the biweekly KFP meeting to discuss this.

juliusvonkohout commented 5 months ago

You should also try to update from KFP 2.0.3 to 2.0.5 first.

jmaunon commented 5 months ago

Thanks for the reply @juliusvonkohout. Here are my findings:

For any readers: I did not understand the explanation in #6530, but:

juliusvonkohout commented 5 months ago

@rimolive this might be something to track for 1.9

zijianjoy commented 5 months ago

/assign @juliusvonkohout

rimolive commented 4 months ago

We have an open PR for that #10538.

rimolive commented 4 months ago

/assign @gregsheremeta

google-oss-prow[bot] commented 4 months ago

@rimolive: GitHub didn't allow me to assign the following users: gregsheremeta.

Note that only kubeflow members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide

In response to [this](https://github.com/kubeflow/pipelines/issues/10397#issuecomment-1981558536):

> /assign @gregsheremeta

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 1 month ago

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

majuss commented 1 month ago

/reopen

This issue still persists.

google-oss-prow[bot] commented 1 month ago

@majuss: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/kubeflow/pipelines/issues/10397#issuecomment-2134629544):

> /reopen
> This issue still persists

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

rimolive commented 1 month ago

/reopen

google-oss-prow[bot] commented 1 month ago

@rimolive: Reopened this issue.

In response to [this](https://github.com/kubeflow/pipelines/issues/10397#issuecomment-2135113522):

> /reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

thesuperzapper commented 1 month ago

This issue is actually because Kubeflow Pipelines requires component containers to run as root, while the container you have chosen, kubeflownotebookswg/jupyter-pytorch-full:v1.8.0-rc.0, runs as non-root.

There is a PR to fix this issue by mounting emptyDir volumes at /minio and the other paths, but it will need to be reviewed:

@chensun @HumairAK @Tomcli we definitely need to prioritize fixing this issue, because it's pretty bad to have a hard requirement on root container images.
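The approach described above can be sketched as a pod-spec fragment (illustrative only; the volume and container names here are assumptions, and the actual PR may mount additional paths):

```yaml
# Illustrative sketch: mount a writable emptyDir at /minio so a non-root
# container can create the output-artifact directory there.
spec:
  volumes:
    - name: minio-scratch
      emptyDir: {}
  containers:
    - name: main
      volumeMounts:
        - name: minio-scratch
          mountPath: /minio
```

With such a mount in place, the launcher's `mkdir /minio/...` call succeeds regardless of the container's UID, because the emptyDir is writable by the pod's user.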

thesuperzapper commented 1 month ago

I also want to say that the lack of securityContext support is related to this, because if we had it, it would provide a possible workaround:

That is, if users could set the Pod securityContext, they could set runAsUser: 0 to override the UID of images which don't run as root by default.
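For example, if KFP exposed the pod securityContext, a pipeline pod could carry a fragment like the following (hypothetical; KFP did not support setting this at the time of writing):

```yaml
# Hypothetical workaround: a pod-level securityContext forcing UID 0,
# overriding an image whose default user is non-root.
spec:
  securityContext:
    runAsUser: 0
```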

droctothorpe commented 1 month ago

We're running into this now. All our end-user containers run as non-root to improve security, which is a fairly universal expectation at any security-sensitive company.

droctothorpe commented 1 month ago

For anyone else running into this, we found a short-term workaround using Kyverno that is not contingent on this PR being merged. Huge shout-out to @moorthy156 for implementing it lightning fast. Just update the mountPath to /minio, /gcs, or whatever else you need it to be.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-volume-mount-pipelineroot
spec:
  background: true
  failurePolicy: Ignore
  rules:
  - match:
      any:
      - resources:
          kinds:
          - Pod
          namespaceSelector:
            matchLabels:
              app.kubernetes.io/part-of: "kubeflow-profile"
          selector:
            matchExpressions:
            - key: pipelines.kubeflow.org/v2_component
              operator: In
              values:
              - "true"
    mutate:
      patchStrategicMerge:
        spec:
          volumes:
          - name: pipelineroot
            emptyDir: {}  # volume source assumed; the original snippet omitted it
          containers:
          - (name): main | wait
            volumeMounts:
            - mountPath: /s3
              name: pipelineroot
            env:
            - name: AWS_REGION
              value: us-east-1
    name: add-volume-mount-pipelineroot
    preconditions:
      all:
      - key: '{{ request.operation }}'
        operator: Equals
        value: CREATE
thesuperzapper commented 1 month ago

Just wanted to update everyone that there is a new PR being worked on that will fix this issue: