kubernetes-retired / external-storage

[EOL] External storage plugins, provisioners, and helper libraries

CEPH RBD Provisioner Creates PV that fails to attach #1256

Closed davesargrad closed 4 years ago

davesargrad commented 4 years ago

I have set up a Ceph cluster. Independently of that, I have followed processes found online to set up RBD provisioning.

The process I've followed is found here: https://medium.com/velotio-perspectives/an-innovators-guide-to-kubernetes-storage-using-ceph-a4b919f4e469

This has largely worked. I have a "fast-rbd" storage class as follows (screenshot omitted).
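For reference, an RBD StorageClass along the lines of that guide looks roughly like the sketch below; the monitor address, namespaces, and secret names are placeholders, not the reporter's actual values:

```yaml
# Hypothetical sketch of a "fast-rbd" StorageClass; all parameter values are placeholders.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast-rbd
provisioner: ceph.com/rbd              # external RBD provisioner from this repo
parameters:
  monitors: 192.168.1.10:6789          # Ceph monitor address (placeholder)
  adminId: admin
  adminSecretName: ceph-secret-admin
  adminSecretNamespace: kube-system
  pool: kube                           # the reporter mentions "pool: kube" later in the thread
  userId: kube
  userSecretName: ceph-secret-user
  imageFormat: "2"
  imageFeatures: layering
```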

I've created the various resources required for provisioning (screenshot omitted).

I've created the secrets needed to access Ceph (screenshot omitted).
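As a point of reference, the admin secret consumed by the RBD provisioner is typically shaped like the sketch below (namespace and key are placeholders; the user secret is analogous):

```yaml
# Hypothetical admin secret for the RBD provisioner; the key is a placeholder, not a real credential.
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret-admin
  namespace: kube-system
type: kubernetes.io/rbd
data:
  key: QVFBcGxhY2Vob2xkZXIvS0VZPT0=   # base64 of the output of `ceph auth get-key client.admin` (placeholder)
```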

I am able to create PVCs that successfully bind to the PV created by the provisioner (screenshot omitted).
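A claim against that class would look roughly like this minimal sketch (claim name and size are placeholders):

```yaml
# Hypothetical PVC bound by the fast-rbd StorageClass.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-claim
spec:
  storageClassName: fast-rbd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```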

However, the pod that I create fails to attach to the volume (screenshot omitted).
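For context, a pod consuming such a claim is wired up roughly as in the sketch below (pod name, image, and mount path are placeholders):

```yaml
# Hypothetical pod mounting the RBD-backed claim.
apiVersion: v1
kind: Pod
metadata:
  name: rbd-test-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: rbd-claim
```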

Looking at the pod in more detail, I see the following (screenshot omitted).

Googling the warning:

`fail to check rbd image status with (executable file not found in $PATH)`

I find various hits online, including this one.

It would seem that others have struggled with this. I don't fully understand the resolution described there, and I am looking for guidance on getting past this problem.

I believe the problem is that the running container does not have "rbd" on its PATH. It's not clear to me how this is properly resolved.

Guidance/Advice would be appreciated. Dave

davesargrad commented 4 years ago

Since I don't know how to solve the above problem yet, I am trying a CephFS provisioner instead of an RBD provisioner.

The PVC is not even binding (screenshot omitted).

Here is my CephFS StorageClass (screenshot omitted; its YAML is reproduced below).

The corresponding claim (screenshot omitted).

The provisioner and other resources (screenshot omitted).

And the secrets (screenshot omitted).

I'm just not sure how to debug this.


How do I determine why the ceph PV is not being provisioned?

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: cephfs
provisioner: ceph.com/cephfs
parameters:
  monitors: togo.corp.sensis.com:6789
  adminId: admin
  adminSecretName: ceph-secret-admin
  adminSecretNamespace: cephfs
  claimRoot: /pvc-volumes
```
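Since the claim screenshot did not survive, here is a minimal sketch of a claim against this class, assuming a hypothetical name, namespace, and size:

```yaml
# Hypothetical PVC against the cephfs StorageClass; name, namespace, and size are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-claim
  namespace: cephfs
spec:
  storageClassName: cephfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
```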

davesargrad commented 4 years ago

The YAML I use to create the RBAC resources (truncated):

```yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cephfs-provisioner
  namespace: cephfs
rules:
```
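The rules were cut off above; as a rough sketch (not a verbatim copy of this project's deploy manifests), the ClusterRole for an external provisioner typically grants something like:

```yaml
# Hypothetical sketch of the permissions an external provisioner usually needs.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cephfs-provisioner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
```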

wongma7 commented 4 years ago

Your node needs to have the rbd binary installed and in $PATH.

davesargrad commented 4 years ago

> Your node needs to have the rbd binary installed and in $PATH.

Hi @wongma7 Thanks for the reply.

Which node are you referring to? Are you saying that the K8s worker node needs this? That seems a bit odd, since it places an RBD-specific configuration burden on every worker node.

On a different note, can you take a look at my comment above ("Since I don't know how to solve the above problem yet, I am trying a CephFS provisioner instead of an RBD provisioner")?

I am trying CephFS as an alternative to RBD. For some reason the CephFS provisioner fails to create a persistent volume.

I was wondering how the provisioner knows which pool to use. With the RBD provisioner, I explicitly specify a pool ("pool: kube"). However, with the CephFS provisioner I only specify a claimRoot of "/pvc-volumes". It's not clear to me how this root is mapped to a Ceph resource.

davesargrad commented 4 years ago

I'll write up my question about CephFS as a separate issue; I don't want it to get lost here. I'll keep this one focused on RBD.

davesargrad commented 4 years ago

Wow. I got RBD working. On CentOS, all I needed to do was `yum install ceph-common`.

This placed the rbd binary onto the k8s worker node.

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

hellofuturecyj commented 4 years ago

Do not close this issue; it has not been fixed yet.

kifeo commented 4 years ago

Hi, I had the same issue. On Debian, installing the ceph-common package on the node resolved the issue.