skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0
6.47k stars, 460 forks

[Storage][k8s] Support mounting data sources on Kubernetes cluster #2497

Open romilbhardwaj opened 1 year ago

romilbhardwaj commented 1 year ago

K8s users often have data stored locally on the cluster, typically as a persistent volume + PVC, or as an NFS share mounted on all nodes and accessed via a hostPath.

From a user:

We have to use the hostpath instead because we cannot transfer the whole datasets directory to the pvc. I want the option to add the pvc or hostpath to the task.yaml.

Our k8s support needs to cover attaching and accessing data on persistent volumes, hostPaths, and/or any other mechanism users rely on to access their local data.

github-actions[bot] commented 8 months ago

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] commented 8 months ago

This issue was closed because it has been stalled for 10 days with no activity.

romilbhardwaj commented 6 months ago

We now have a temporary workaround for this.

Users can edit ~/.sky/config.yaml to specify custom volumes and volume mounts to attach to their pods (docs):

  kubernetes:
    pod_config:
      spec:
        containers:
          - volumeMounts:       # Custom volume mounts for the pod
              - mountPath: /foo
                name: example-volume
                readOnly: true
        volumes:
          - name: example-volume
            hostPath:
              path: /tmp
              type: Directory

Note that this is not an ideal solution: config.yaml applies globally to all pods, so selecting volume mounts on a per-pod basis is not possible through this mechanism.
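
For the PVC case raised in the original request, the same mechanism should also accept a `persistentVolumeClaim` volume source, since the `spec` section follows the regular Kubernetes pod spec. A minimal sketch, assuming a PVC already exists in the namespace where the pods are created; the claim name `my-datasets-pvc`, the volume name `dataset-pvc`, and the mount path `/data` are hypothetical placeholders:

```yaml
# ~/.sky/config.yaml (sketch; names and paths below are placeholders)
kubernetes:
  pod_config:
    spec:
      containers:
        - volumeMounts:
            - mountPath: /data            # where the data appears inside the pod
              name: dataset-pvc
      volumes:
        - name: dataset-pvc
          persistentVolumeClaim:
            claimName: my-datasets-pvc    # hypothetical; must match an existing PVC
```

As with the hostPath example, this applies to every pod SkyPilot launches, so the PVC's access mode has to allow mounting from all of those pods.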

Michaelvll commented 2 months ago

With #3689, volume mounts can now be set per task in the task YAML.
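
A rough sketch of what a per-task mount might look like. It assumes that the task YAML accepts the same `kubernetes.pod_config` structure as config.yaml under an `experimental.config_overrides` section; that nesting is an assumption here and should be checked against the SkyPilot docs for the installed version. The volume name, host path, and mount path are hypothetical:

```yaml
# task.yaml (sketch; the override section layout is an assumption)
experimental:
  config_overrides:
    kubernetes:
      pod_config:
        spec:
          containers:
            - volumeMounts:
                - mountPath: /data          # path inside the task's pod
                  name: dataset-volume
                  readOnly: true
          volumes:
            - name: dataset-volume
              hostPath:
                path: /mnt/datasets         # hypothetical path on the node
                type: Directory

resources:
  cloud: kubernetes

run: |
  ls /data
```

Because the override lives in the task YAML, only pods launched for this task would get the mount, which addresses the global-scope limitation of the config.yaml workaround.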