intel / vck

Volume Controller for Kubernetes
https://ai.intel.com/kubernetes-volume-controller-kvc-data-management-tailored-for-machine-learning-workloads-in-kubernetes/
Apache License 2.0

Erroneously empty directory in test pod when trying to mount IBM Cloud Object Storage #63

Closed: fplk closed this issue 6 years ago

fplk commented 6 years ago

I have just tried to mount an S3 bucket from IBM Cloud Object Storage like this:

kubectl create namespace vckns
kubectl config set-context $(kubectl config current-context) --namespace=vckns
git clone https://github.com/IntelAI/vck.git && cd vck
helm init
# Wait until kubectl get pod -n kube-system | grep tiller shows Running state
# Modify helm-charts/kube-volume-controller/values.yaml to use a valid tag from https://hub.docker.com/r/volumecontroller/kube-volume-controller/tags/
# I use tag: "df90277"
helm install helm-charts/kube-volume-controller/ -n vck --wait --set namespace=vckns
kubectl get crd
export AWS_ACCESS_KEY_ID=<aws_access_key>
export AWS_SECRET_ACCESS_KEY=<aws_secret_access_key>
kubectl create secret generic aws-creds --from-literal=awsAccessKeyID=${AWS_ACCESS_KEY_ID} --from-literal=awsSecretAccessKey=${AWS_SECRET_ACCESS_KEY}
# After creating the resource below, check whether "state: Pending" changes with kubectl get volumemanager vck-example1 -o yaml, or more precisely: kubectl get volumemanager vck-example1 -o jsonpath='{.status.state}'
kubectl create -f resources/customresources/s3/one-vc.yaml
# File content:
apiVersion: vck.intelai.org/v1
kind: VolumeManager
metadata:
  name: vck-example1
  namespace: vckns
spec:
  volumeConfigs:
    - id: "vol1"
      replicas: 1
      sourceType: "S3"
      accessMode: "ReadWriteOnce"
      #nodeAffinity:
      #  - <insert-node-affinity-here>
      #tolerations:
      #  - <insert-tolerations-here>
      capacity: 5Gi
      labels:
        key1: val1
        key2: val2
      options:
        endpointURL: https://s3-api.us-geo.objectstorage.softlayer.net
        awsCredentialsSecretName: aws-creds
        sourceURL: "s3://<bucket_name>/"
        # dataPath: <insert-data-path-here-optional>
        # distributionStrategy: <insert-distributed-strategy-here-optional>

kubectl create -f resources/pods/vck-pod.yaml
# File content:
apiVersion: v1
kind: Pod
metadata:
  name: vck-claim-pod
spec:
  #affinity:
  #  <insert-node-affinity-from-cr-status>
  volumes:
    - name: dataset-claim
      hostPath:
        path: /var/datasets/vck-resource-a2140d72-11c2-11e8-8397-0a580a440340
  containers:
  - image: busybox
    command: ["/bin/sh"]
    args: ["-c", "sleep 1d"]
    name: vck-sleep
    volumeMounts:
    - mountPath: /var/data
      name: dataset-claim

When I display the bucket via

AWS_ACCESS_KEY_ID=<aws_access_key> AWS_SECRET_ACCESS_KEY=<aws_secret_access_key> aws s3 ls --endpoint-url https://s3-api.us-geo.objectstorage.softlayer.net s3://<bucket_name>/

I correctly see the bucket content. However, when I exec into the pod via kubectl exec -it vck-claim-pod /bin/sh and list the mount path with ls /var/data, it is empty.

Most artifacts seem to be available, except that the resource pod is in state "Completed":

kubectl get pod,crd,pvc,pv,secret
NAME                                                    READY     STATUS      RESTARTS   AGE
pod/vck-64c4945885-2wcnm                                1/1       Running     0          1h
pod/vck-claim-pod                                       1/1       Running     0          17m
pod/vck-resource-32284724-8171-11e8-8d61-0e39608b01ad   0/1       Completed   0          39m

NAME                                                                           AGE
customresourcedefinition.apiextensions.k8s.io/volumemanagers.vck.intelai.org   1h

NAME                         TYPE                                  DATA      AGE
secret/aws-creds             Opaque                                2         1h
secret/default-token-m588d   kubernetes.io/service-account-token   3         2h
secret/vck-token-tlj9t       kubernetes.io/service-account-token   3         1h

I'm currently quite busy and thus only had 10 minutes to play with vck, so there is a good chance this is a user error rather than a bug. Still, do you have any idea why I cannot see my data?

Thank you in advance.

balajismaniam commented 6 years ago

@fplk Did you add the node affinity from the CR status while creating this pod? Which version of VCK are you deploying?
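For reference, a minimal sketch of how the CR status could be inspected, reusing the two kubectl commands from the original report. The helper names vck_show and vck_state are made up for illustration, and the default namespace vckns is simply the one used in this issue. Where exactly the node affinity and host path appear inside the status section is not shown in this thread, so read the full YAML output rather than relying on a particular field path.

```shell
# Hypothetical helpers; the names are illustrative and not part of VCK.

# Print the full VolumeManager resource, including its status section.
# Arguments: <name> [namespace]; the namespace defaults to "vckns" as in this issue.
vck_show() {
  kubectl get volumemanager "$1" --namespace "${2:-vckns}" -o yaml
}

# Print only the state field of the status (for example "Pending").
vck_state() {
  kubectl get volumemanager "$1" --namespace "${2:-vckns}" -o jsonpath='{.status.state}'
}
```

Usage: run vck_state vck-example1 to see the state, then vck_show vck-example1 and copy the node affinity and host path reported in the status into the pod spec.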

balajismaniam commented 6 years ago

@fplk do you have any updates on this?

fplk commented 6 years ago

Sorry - so little time. I think I need to add something like

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: vck.intelai.org/default-vck-example1-vol1
            operator: Exists
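
Merged into the test pod from the original report, that fragment would look roughly like the sketch below. The file name vck-pod-affinity.yaml is made up for illustration. The label key and the hostPath are copied verbatim from this thread and are assumptions: both should be replaced with the values reported in the VolumeManager status for the actual deployment.

```shell
# Write a test pod manifest that includes the node affinity.
# Assumption: the label key and hostPath below are placeholders taken from
# this thread; replace them with the values from the VolumeManager status.
cat > vck-pod-affinity.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: vck-claim-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: vck.intelai.org/default-vck-example1-vol1
            operator: Exists
  volumes:
    - name: dataset-claim
      hostPath:
        path: /var/datasets/vck-resource-a2140d72-11c2-11e8-8397-0a580a440340
  containers:
  - image: busybox
    command: ["/bin/sh"]
    args: ["-c", "sleep 1d"]
    name: vck-sleep
    volumeMounts:
    - mountPath: /var/data
      name: dataset-claim
EOF
```

The pod can then be created with kubectl create -f vck-pod-affinity.yaml. An existing vck-claim-pod has to be deleted first, since a pod's affinity cannot be changed after creation.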

I'll try to revisit this this weekend, but I also have to finish two projects, so no guarantees unfortunately. Thanks for your responses.

fplk commented 6 years ago

You were absolutely right. VCK works great with IBM Cloud Object Storage once the node affinity is properly defined. Thanks so much for the support.