
Persistent Disks: There is something wrong. #125

Open tactical-drone opened 7 years ago

tactical-drone commented 7 years ago

It does not work properly. It is hard to describe the issue, but here we go:

A standard persistent volume claim like this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    io.kompose.service: cassandra-data
  name: cassandra-data
  annotations:
    volume.beta.kubernetes.io/storage-class: local-vmfs
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi

Basically nothing happens. But if you deploy two of these with a script in quick succession, the last one sometimes works. Both never work, and deploying just one never works.
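
For reference, the "two in quick succession" trick is roughly this (a hypothetical repro script; pvc-1.yaml and pvc-2.yaml are copies of the claim above, named cassandra-data and cassandra-data-2):

# Apply two copies of the claim back to back.
kubectl apply -f pvc-1.yaml
kubectl apply -f pvc-2.yaml
# Then watch which of them, if any, gets past Pending.
kubectl get pvc -n dev -w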

I trawled all the logs and there is absolutely no logging around provisioning persistent disks in Photon. All I get from Kubernetes is this:

Name:       cassandra-data
Namespace:  dev
StorageClass:   local-vmfs
Status:     Pending
Volume:     
Labels:     io.kompose.service=cassandra-data
Annotations:    kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{"volume.beta.kubernetes.io/storage-class":"local-vmfs"},"labels":{"io.komp...
        volume.beta.kubernetes.io/storage-class=local-vmfs
        volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/photon-pd
Capacity:   
Access Modes:   
Events:
  FirstSeen LastSeen    Count   From                SubObjectPath   Type        Reason          Message
  --------- --------    -----   ----                -------------   --------    ------          -------
  5m        7s      23  persistentvolume-controller         Warning     ProvisioningFailed  Failed to provision volume with StorageClass "local-vmfs": invalid_request: missing refresh_token parameter

Name:       cassandra-data-2
Namespace:  dev
StorageClass:   local-vmfs
Status:     Bound
Volume:     pvc-37582142-3f98-11e7-a16c-000c2905587e
Labels:     io.kompose.service=cassandra-data-2
Annotations:    kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{"volume.beta.kubernetes.io/storage-class":"local-vmfs"},"labels":{"io.komp...
        pv.kubernetes.io/bind-completed=yes
        pv.kubernetes.io/bound-by-controller=yes
        volume.beta.kubernetes.io/storage-class=local-vmfs
        volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/photon-pd
Capacity:   1Gi
Access Modes:   RWO
Events:
  FirstSeen LastSeen    Count   From                SubObjectPath   Type        Reason          Message
  --------- --------    -----   ----                -------------   --------    ------          -------
  5m        2m      11  persistentvolume-controller         Warning     ProvisioningFailed  Failed to provision volume with StorageClass "local-vmfs": invalid_request: missing refresh_token parameter
  2m        2m      1   persistentvolume-controller         Normal      ProvisioningSucceeded   Successfully provisioned volume pvc-37582142-3f98-11e7-a16c-000c2905587e using kubernetes.io/photon-pd

Notice how the second one (-2) of the two worked, but not both. So everything is set up correctly, yet all I get is this, all the time:

ProvisioningFailed Failed to provision volume with StorageClass "local-vmfs": invalid_request: missing refresh_token parameter

Kubernetes tells me the status is Pending; it never completes.

tactical-drone commented 7 years ago

I can't deploy Cassandra with that double-deploy trick though, because the StatefulSet contains a claim template that it applies for each replica (see the sketch below). Since Kubernetes creates these claims itself, I cannot use my double-script trick to make them work.
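
For context, the relevant part of the StatefulSet looks roughly like this (a minimal sketch, not my full manifest; names and image are illustrative):

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra
  replicas: 3
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
      - name: cassandra
        image: cassandra:3.10
        volumeMounts:
        - name: cassandra-data
          mountPath: /var/lib/cassandra
  # Kubernetes stamps out one PVC per replica from this template,
  # so there is no way to "apply it twice" by hand.
  volumeClaimTemplates:
  - metadata:
      name: cassandra-data
      annotations:
        volume.beta.kubernetes.io/storage-class: local-vmfs
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100Mi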

This is a highly frustrating bug, because it's sort of working. I am so far behind with this already, and now I am totally dead in the water.

tactical-drone commented 7 years ago

I kind of got it working by just giving my claims new names whenever the old ones fail. Eventually one binds.
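
Roughly like this (a hypothetical sketch of the workaround; pvc.yaml is the claim from my first comment):

# Re-submit the same claim under a fresh name until one binds.
for i in 1 2 3 4 5; do
  sed "s/name: cassandra-data/name: cassandra-data-$i/" pvc.yaml | kubectl apply -f -
  sleep 30
done
kubectl get pvc -n dev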

But now I have a different problem. The pod cannot mount the volume. Looking at the logs, it seems that the pod is floating all over the place, but the persistent disk was created on one host only.

Why won't the pod go to the host that has the disk? Can this be overcome using selectors? StatefulSets and selectors are not well documented.
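
Something like this is what I have in mind (purely hypothetical; the node name and label key are made up):

# Label the worker that actually holds the disk by hand...
kubectl label node worker-1 disk=cassandra-data

# ...then pin the pod template in the StatefulSet to that label:
spec:
  nodeSelector:
    disk: cassandra-data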

It could be some provisioning issue, but I set all my quotas sky high. Unless there is some mechanism that checks actual usage? Even then, I have 4 servers with 10 TB of storage and 600 GB of memory combined!

tactical-drone commented 7 years ago

Or maybe it is because I cannot control where my workers land on ESXi. Photon places them all on one host but then provisions the disks on a completely different ESXi host.

This thing needs a serious rethink. I don't think I can go to production in one week's time with these issues. I'm so dead.

schadr commented 7 years ago

Currently you would need to place all your workers and persistent disks on shared storage for this to work.

Meaning that, at this point, photon-controller does not move disks around, so you cannot easily attach a disk that was created on a different host than the one the VM lives on.

AlainRoy commented 7 years ago

The storage class for the persistent disk is local-vmfs. That maps to a flavor on Photon Controller that targets local datastores. You want to pick a storage class that maps to a flavor targeting shared storage.

My notes list the following storage classes with associated flavors:

kubectl get storageclasses -o=custom-columns=name:metadata.name,flavor:parameters.flavor
name          flavor
default       service-generic-persistent-disk
local-vmfs    service-local-vmfs-persistent-disk
nfs           service-nfs-persistent-disk
shared-vmfs   service-shared-vmfs-persistent-disk
vsan          service-vsan-persistent-disk
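
For example, pointing the claim above at shared storage is just a change to the annotation (shared-vmfs here; nfs or vsan would work the same way, assuming the matching datastores exist in your deployment):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cassandra-data
  annotations:
    volume.beta.kubernetes.io/storage-class: shared-vmfs
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
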
tactical-drone commented 7 years ago

@schadr Is it not supposed to just drop the VM next to where it dropped the disk? Surely it has all the information it needs to make this happen. Is this a Kubernetes shortcoming, or are persistent disks just totally bleeding edge in photon-controller (and Kubernetes?), with the resource-management part not yet thought out?

@AlainRoy This photon-controller Kubernetes solution is completely useless if you can't do locally persisted managed storage. You don't run Cassandra storage on anything other than local hardware, and not even on traditional drives if you can help it; SSDs are recommended. What am I missing here? Shared storage would be dog slow.

In any case, Kubernetes proposes three ways to run a Cassandra cluster. The one I chose was through a StatefulSet, but it seems that is way too bleeding edge to make work. I will try the other methods then.

Thanks for the tips. I would really love to know how far away the next release is: I want to determine how much time and effort I should invest in my workarounds.

AlainRoy commented 7 years ago

We don't have a date for the next release yet; planning is in progress.