larsks / blog.oddbit.com

post/2021-08-23-external-ocs/ #16

utterances-bot commented 3 years ago

Connecting OpenShift to an External Ceph Cluster · The Odd Bit

Red Hat’s OpenShift Data Foundation (formerly “OpenShift Container Storage”, or “OCS”) allows you to either (a) automatically set up a Ceph cluster as an application running on your OpenShift cluster, or (b) connect your OpenShift cluster to an externally managed Ceph cluster. While setting up Ceph as an OpenShift application is a relatively polished experience, connecting to an external cluster still has some rough edges. NB I am not a Ceph expert.

https://blog.oddbit.com/post/2021-08-23-external-ocs/

larsks commented 3 years ago

Note to self: there are also Python 2 bugs in the script; see https://bugzilla.redhat.com/show_bug.cgi?id=1998292. Not directly relevant to the article, but this seemed like a good place to remember it, since all the other bugs about the script are tracked here.

imbezol commented 2 years ago

Firstly, thank you very much for this post. It's been very useful in setting up ODF with an external single-stack IPv6 Ceph cluster that will be used by multiple OCP clusters. I wanted to mention a couple of things that I found while working through this.

The official create-external-cluster-resources.py script does not handle IPv6 addresses, but I was able to use your script above to get the information from the cluster with only a slight edit to handle IPv6, at line 51:

-    prom_ip, prom_port = prom_url.netloc.split(':')
+    prom_ip, prom_port = prom_url.netloc.rsplit(':',1)
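
To see why that one-line change is enough, here's a quick illustration (the address is a hypothetical one from the 2001:db8::/32 documentation prefix):

from urllib.parse import urlparse

# Hypothetical Prometheus URL on a single-stack IPv6 cluster.
prom_url = urlparse("http://[2001:db8::10]:9283/")

# The netloc of an IPv6 URL contains several colons, so the original
# two-way split(':') raises "too many values to unpack"; rsplit(':', 1)
# splits only at the final colon, the host/port separator.
prom_ip, prom_port = prom_url.netloc.rsplit(':', 1)
print(prom_ip, prom_port)   # [2001:db8::10] 9283

# urlparse can also separate the two itself, stripping the brackets:
print(prom_url.hostname, prom_url.port)   # 2001:db8::10 9283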

When testing with the caps you've listed, I found that I was able to create and use rbd volumes with no issues. However, when I tried to use a cephfs volume, it could provision the volume but could not mount it, showing "MountVolume.MountDevice failed" and "Operation not permitted" in the events.

It turns out that the node needs to be able to write to the metadata pool as well. Changing the client.csi-cephfs-node osd cap to allow rw on both the metadata and data pools fixed the issue while still restricting access to the cluster-specific filesystem.

ceph auth add client.csi-cephfs-node-${clustername} \
        mgr "allow rw" \
        mds "allow rw fsname=${clustername}-fs" \
        mon "allow r fsname=${clustername}-fs" \
        osd "allow rw tag cephfs *=${clustername}-fs"
larsks commented 2 years ago

I'm glad you found it useful! I've found it useful as I've had to do the same thing on additional clusters :).

> It turns out that the node needs to be able to write to the metadata pool as well.

I recently set up ODF under OCP 4.10, and I've noticed that there have been several changes in the permissions the script attempts to set up, making this post more of a guideline than a complete procedure. I just grabbed a new copy of the Python script and extracted the capabilities from that.

We also discovered that, at least during the install, ODF wants access to the Ceph prometheus endpoint, and won't proceed (a) if the value is unset or (b) if it can't contact the endpoint. Our target Ceph cluster doesn't expose that endpoint, so we solved that by setting up a mock prometheus endpoint to make the installation process happy.
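
A stand-in endpoint can be very small. Here's a minimal sketch, assuming ODF only needs the configured URL to answer HTTP GETs during installation; 9283 is the Ceph mgr Prometheus module's default port, and the empty metrics body is a placeholder:

from http.server import BaseHTTPRequestHandler, HTTPServer

class MockMetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer every path (including /metrics) with an empty
        # Prometheus exposition-format response.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()

if __name__ == "__main__":
    # Listen on all interfaces on the mgr exporter's default port.
    HTTPServer(("", 9283), MockMetricsHandler).serve_forever()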