lsst-uk / somerville-operations

User issue reporting and tracking for the Somerville Cloud

Error mounting ceph share #187

Closed millingw closed 2 months ago

millingw commented 3 months ago

I've created a Ceph share in the somerville-jade project and added an access credential for it. I then tried to mount the share in a VM within the same project, having installed ceph-common on the VM and added the access key to ceph.secret.

I ran the following command:

# sudo mount -vt ceph -o secretfile=ceph.secret,name=malcolm 10.19.4.17:6789,10.19.4.16:6789,10.19.4.18:6789:/volumes/_nogroup/4f46d693-c16a-4063-95d5-f453734ebb86/46da8902-9bf8-4f33-a16e-d084b27caed0 /mnt/ceph_test
parsing options: rw,secretfile=ceph.secret,name=malcolm
mount.ceph: options "name=malcolm" will pass to kernel.
mount error: no mds server is up or the cluster is laggy

I'm not quite sure where I've gone wrong. Possibly networking? I assumed that if the VM and the share are within the same project, there wouldn't be any additional networking steps to configure.
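
One way to test the networking hypothesis from the VM is to check whether the Ceph monitor endpoints in the mount source accept TCP connections at all. The following is a minimal sketch, not a definitive diagnostic: the function names `parse_mount_source`, `can_connect` and `check_monitors` are hypothetical helpers introduced here, and a successful TCP connection to a monitor does not by itself prove that the MDS is reachable or that the credentials are valid.

```python
import socket


def parse_mount_source(source):
    """Split 'mon1:port,mon2:port:/path' into ([(host, port), ...], '/path')."""
    monitors, sep, path = source.partition(":/")
    if not sep:
        raise ValueError("expected '<mon>:<port>[,<mon>:<port>...]:/<path>'")
    endpoints = []
    for entry in monitors.split(","):
        host, _, port = entry.rpartition(":")
        endpoints.append((host, int(port)))
    return endpoints, "/" + path


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False


def check_monitors(source, timeout=3.0):
    """Map each 'host:port' monitor in the mount source to its reachability."""
    endpoints, _path = parse_mount_source(source)
    return {f"{host}:{port}": can_connect(host, port, timeout)
            for host, port in endpoints}
```

Calling `check_monitors(...)` on the VM with the same device string passed to `mount` returns a dictionary keyed by monitor address. If every value is `False`, that points towards a routing or interface problem rather than a credential problem.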

GregBlow commented 3 months ago

This resembles an error I've seen when there is a Ceph user mismatch.

@DP-B21 this resembles the problem you were seeing on the EIDF system, I believe. Can I add you to this ticket to take a look, please, if I don't manage to resolve it tomorrow?

GregBlow commented 3 months ago

Unfortunately couldn't get to this today.

@DP-B21 Not sure if you are familiar with Manila and CephFS. If not, could you give it a go: look over the documentation (e.g. https://docs.openstack.org/manila/latest/admin/shared-file-systems-crud-share.html; you can create shares from the Horizon web GUI under project>share>shares) and see if you can mount a share on a test instance, please?

DP-B21 commented 3 months ago

Hi @GregBlow, after looking at it I believe it may be a networking issue, since the share's export location gives us:

path = 10.19.4.17:6789,10.19.4.16:6789,10.19.4.18:6789:/volumes/_nogroup/06ed9f64-0806-411b-8dfe-81a36fa1a8b5/aaf96b18-8ac1-44af-8e19-f64d3d1db969

whereas the network that the VM is in is 10.10.0.0.
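
The comparison can be made explicit with Python's standard `ipaddress` module. This is only a sketch: the `/16` prefix length for the VM network is an assumption (only the address 10.10.0.0 is given in this thread), and `addresses_outside` is a hypothetical helper name. An address being outside the VM's subnet does not prove it is unreachable, since a router could still connect the two networks, but it does show that the monitors are not on the VM's local network.

```python
import ipaddress


def addresses_outside(network_cidr, addresses):
    """Return the addresses that do not belong to the given network."""
    network = ipaddress.ip_network(network_cidr)
    return [a for a in addresses if ipaddress.ip_address(a) not in network]


# Monitor addresses taken from the export location above.
monitors = ["10.19.4.17", "10.19.4.16", "10.19.4.18"]

# ASSUMPTION: the prefix length is not stated in the thread; /16 is a guess.
vm_network = "10.10.0.0/16"

print(addresses_outside(vm_network, monitors))
# prints ['10.19.4.17', '10.19.4.16', '10.19.4.18']
```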

millingw commented 2 months ago

Is this a mistake on my part, or something that needs to be further investigated?

GregBlow commented 2 months ago

Hello @millingw, could you clarify which VM is the subject of this issue? Apologies, but I've gone back over the ticket and can't see where it was identified.

I can see that one of the VMs in the gaia project (bob-test-1) has an interface on the CephFS network; however, none of the others do. Though the share is managed through the same project, CephFS has its own network.

You should be able to add such an interface directly to the instance in question through openstack to check, though depending on the image you've used it may take some work to configure it correctly.

millingw commented 2 months ago

I was trying from the gaia_dataset_one VM. I'll try again with the ceph network explicitly attached to the VM. One thing that puzzles me slightly is that I was apparently able to attach a shared ceph volume ("gaia_dataset_shared_volume") without explicitly being on the ceph network.

GregBlow commented 2 months ago

At this point it is worth drawing the distinction between Ceph and CephFS. It sounds like attaching a shared ceph volume uses ceph block storage, mediated by Cinder (an openstack service), whereas CephFS is a POSIX-compliant file system, mediated by Manila.

https://docs.ceph.com/en/reef/rbd/ https://docs.ceph.com/en/reef/cephfs/

The latter (as far as I am aware) will always require an interface on the cephfs network on any instance that is to access it.

millingw commented 2 months ago

Confirmed: with the CephFS network correctly attached to the VM, the Ceph share can be mounted successfully.