Open benaryorg opened 4 months ago
For completeness' sake, here are some commands to get a new CephFS volume and subvolume set up, and what the final mount command might look like (I'm fumbling this out of my shell history, so it is not guaranteed to be 100% accurate):
ceph fs volume create volume-name
ceph fs subvolumegroup create volume-name subvolume-group-name
ceph fs subvolume create volume-name subvolume-name --group_name subvolume-group-name
# this will now spit out a path including the UUID of the subvolume:
ceph fs subvolume getpath volume-name subvolume-name --group_name subvolume-group-name
# then authorize a new client (syntax changes slightly in upcoming version)
ceph fs authorize volume-name client.client-name /volumes/subvolume-group-name/subvolume-name/e7c5cd0c-10fa-42e2-9d48-902544f13d07 rw
# which can be mounted like (fsid can be omitted if it is in ceph.conf, key will be read from keyring in /etc/ceph too):
mount -t ceph client-name@.volume-name=/volumes/subvolume-group-name/subvolume-name/e7c5cd0c-10fa-42e2-9d48-902544f13d07 /mnt
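If you want to rely on ceph.conf and the keyring instead of passing the fsid and key explicitly, the client-side files would look roughly like this; everything below (fsid, monitor address, key) is a placeholder, so adjust it to your cluster:

```sh
# Sketch of the client-side config that lets the mount command above omit
# the fsid and the secret; all values here are placeholders.
cat > /etc/ceph/ceph.conf <<'EOF'
[global]
fsid = 00000000-0000-0000-0000-000000000000
mon_host = 2001:db8::1:0
EOF

# Keyring as printed by the `ceph fs authorize` command above.
cat > /etc/ceph/ceph.client.client-name.keyring <<'EOF'
[client.client-name]
key = <base64 key from ceph fs authorize>
EOF
```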
Just a question: what use-case is blocked? I actively use a CephFS storage pool with my Incus + MicroCeph deployment (as well as with LXD + MicroCeph in the past) and I do not see any issues. All such volumes are mounted into the instances.
What does your storage configuration look like? I've tried several permutations that looked like they could work, but considering that I also once had to drop down to `incus admin sql` to delete a storage pool (which got stuck in pending forever), I did not try everything.
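For reference, that database detour looked roughly like this; I'm quoting the table name from memory, so verify it against the actual schema before deleting anything:

```sh
# Inspect the storage pool rows in the global database and remove the one
# stuck in "pending"; the table name is quoted from memory, verify first.
incus admin sql global "SELECT * FROM storage_pools;"
incus admin sql global "DELETE FROM storage_pools WHERE name = 'cephfs';"
```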
- Raspberry Pi 4B nodes under Ubuntu 24.04, booting from SD card; a 1GiB USB SSD is attached to each node.
- The `microceph` snap package is installed on every node and a MicroCeph cluster is configured. Each node exports one OSD.
- Incus 6.3 is installed on every node and an Incus cluster is built (actually the cluster was migrated from LXD to Incus).
- The `ceph-common` package is installed, as Incus does not work directly with MicroCeph (as LXD does). Please see these discussions for additional details: https://discuss.linuxcontainers.org/t/unable-to-migrate-lxd-5-21-with-microceph-to-incus-6-0/19714 and https://discuss.linuxcontainers.org/t/incus-vm-on-raspberry-pi4/19357/15

| NAME | DRIVER | DESCRIPTION | USED BY | STATE |
| --- | --- | --- | --- | --- |
| remote | ceph | | 62 | CREATED |
| shared_vols | cephfs | | 11 | CREATED |
| test_shared_vols | cephfs | | 1 | CREATED |
`test_shared_vols` configuration:
config:
cephfs.cluster_name: ceph
cephfs.path: lxd_test_shared
cephfs.user.name: admin
description: ""
name: test_shared_vols
driver: cephfs
used_by:
$ sudo ceph fs ls
name: lxd_test_shared, metadata pool: lxd_test_shared_pool_meta, data pools: [lxd_test_shared_pool_data ]
for i in {1..7}; do incus storage create test_shared_vols cephfs source=lxd_test_shared --target cl-0$i; done \
&& incus storage create test_shared_vols cephfs
incus storage volume create test_shared_vols test_vol1 size=256MiB --project test
for i in {1..7}; do inst=test-ct-0$i; \
echo "Launching instance: $inst"; incus launch images:alpine/edge $inst --project test; \
echo "Attaching 'test_vol1' to the instance"; incus storage volume attach test_shared_vols test_vol1 $inst data "/data" --project test; \
echo "Listing content of '/data' directory:"; incus exec $inst --project test -- ls -l /data; \
done
incus exec test-ct-04 --project test -- sh -c 'echo -e "This is a file\n placed on the shared volume.\n It is accessible from any instance where this volume is attached.\n" > /data/test.txt'
for i in {1..7}; do inst=test-ct-0$i; echo "Listing content of '/data' directory in the $inst instance"; incus exec $inst --project test -- ls -l /data; done
for i in {1..7}; do inst=test-ct-0$i; echo "--- Printing content of '/data/test.txt' file in the $inst instance ---"; incus exec $inst --project test -- cat /data/test.txt; done
So far it does not look like you are using the `ceph fs volume` feature (at least not with subvolumes), otherwise your CephFS paths would include a UUID somewhere. Besides, using the admin credentials side-steps any mounting issues that I'm seeing, because with those you can mount the root of the CephFS even when trying to mount a subvolume. If you create a subvolume as per my first reply in this thread, you end up with credentials that do not have access to the root of the CephFS, which, as far as I can tell, makes the storage configuration you provided unusable: it does not contain any path, so the mount would fail for lack of permissions.
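For reference, the credentials created by the `ceph fs authorize` call from my first comment end up scoped roughly like this (the exact cap strings vary between releases, so treat this as a sketch), which is why such a client cannot mount the root of the CephFS:

```sh
# Roughly what the restricted credentials look like; cap strings are
# release-dependent, so this is a sketch rather than verbatim output.
ceph auth get client.client-name
# [client.client-name]
#     key = <redacted>
#     caps mds = "allow rw fsname=volume-name path=/volumes/subvolume-group-name/subvolume-name/e7c5cd0c-10fa-42e2-9d48-902544f13d07"
#     caps mon = "allow r fsname=volume-name"
#     caps osd = "allow rw tag cephfs data=volume-name"
```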
Yes, you are correct. This is why I asked about your use-case.
Ah, I see. The primary advantage to me personally is that I don't have to manually lay out a directory structure (i.e. I do not have to actually mount the CephFS with elevated privileges such as client.admin to administrate it), the quota support is baked in, and authorization of individual clients for shares becomes programmatic over that specific API (i.e. less worrying about adding or removing caps outside the CephFS system).
If I were to automate Incus cluster deployment (or even just deployment for individual consumers of CephFS, while also wanting to handle Incus the same way), I could instead use the Restful API module of the MGR for many operations, in a way that is much less error prone than managing CephFS through the API otherwise; I wouldn't need to create individual directory trees, and I would not have to enforce a convention for how the trees are laid out (since volumes have their own very specific layout). Quota management also stops being a matter of writing an xattr on a specific directory and becomes much more tightly attached to the subvolume. The combination of getpath and the way the auth management is handled also makes it a little harder to accidentally use the wrong path. This is mostly about automation and handling things programmatically, which is in line with what OpenStack Manila wants for its backend.
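To illustrate the quota point, the two approaches look roughly like this (commands sketched from the Ceph documentation rather than my history, so double-check the details):

```sh
# Plain CephFS: mount with sufficient privileges and set an xattr on the
# right directory yourself.
setfattr -n ceph.quota.max_bytes -v 10737418240 /mnt/volumes/some-share

# Subvolumes: the quota is attached to the subvolume and managed through the
# same API used for creation and authorization (size in bytes).
ceph fs subvolume resize volume-name subvolume-name 10737418240 --group_name subvolume-group-name
```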
Especially when administrating a Ceph cluster on a team with several admins, the added constraints make it much easier to work together, since there are no conventions you have to stick to by discipline alone; Ceph already enforces them.
Being able to create multiple volumes, each of which comes with its own pools and MDSs, also greatly improves how things work when you have to separate tenants for whatever reason. Given that it's often beneficial to run one big Ceph cluster instead of many small ones (due to the increase in failure domains) I can see how some of the customers I worked with would like to use that feature (granted, none of those customers were using Incus though), and with any newer clusters I would absolutely recommend using volumes even if just for the reason that you don't have to go back and reintroduce and clean up every part where things weren't properly separated later on (since inevitably every user of Ceph at some points needs some level of isolation for whatever reason, I've never not seen it happen).
In short: it keeps me from tripping over my own feet when adding a new isolated filesystem share, by taking care of the credential management, directory creation, and quotas, all things which I would surely manage to mess up at least once and, like, delete the client.ceph credentials or something (which wouldn't be possible with the `ceph fs deauthorize` command as far as I can tell).
TL;DR: it's just more robust as soon as you need to have separate shares for different clients and makes managing the cluster easier if there is a strong separation of concerns.
Ah, I see. I appreciate your detailed explanation.
Required information
Issue description
CephFS changed its mount string syntax in Quincy, the release that has recently reached its estimated EoL date (current stable being Reef, with Squid upcoming, AFAIK). This means that any still-active release (talking about upstream, not distros) has a mount string that is different from the one Incus is using right now.
This leads to users having a really hard time trying to mount a CephFS created via the newer CephFS volumes/subvolumes mechanism (at least I haven't gotten it working yet).
As described on the discussion boards, the old syntax was:
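Roughly like this, with placeholder monitors, path, and credentials (reconstructed from memory rather than copied from a live setup):

```sh
# Pre-Quincy device string: a comma-separated monitor list plus the path;
# user and secret are passed as options (all values here are placeholders).
mount -t ceph 192.0.2.1:6789,192.0.2.2:6789,192.0.2.3:6789:/some/path /mnt \
    -o name=client-name,secretfile=/etc/ceph/client-name.secret
```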
On top of the device string, a lot of options went via the `-o` parameter (or the appropriate field in the mount syscall). Notably, Incus does not rely on the config file for this but manually scrapes the mon addresses out of it, which has its own issues: the string matching used is insufficient to handle a config where the initial mon list refers to the mons by name and each mon is listed in its own section with its address given directly as `mon_addr`. That means that while `mount.ceph` can just mount the volume, Incus fails during the parsing phase of the config file.

The new syntax is:
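Again with placeholder values for the fsid, monitors, and secret; the path is the subvolume path from `ceph fs subvolume getpath`:

```sh
# Quincy-and-later device string: user, (optional) fsid, and filesystem name
# are encoded up front; the fsid may be left empty (client-name@.volume-name=...)
# when it can be read from ceph.conf (all values here are placeholders).
mount -t ceph client-name@00000000-0000-0000-0000-000000000000.volume-name=/volumes/subvolume-group-name/subvolume-name/e7c5cd0c-10fa-42e2-9d48-902544f13d07 /mnt \
    -o mon_addr=192.0.2.1:3300/192.0.2.2:3300,secretfile=/etc/ceph/client-name.secret
```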
So with the user, the (optional) fsid, and the CephFS name encoded into the device string, fewer options are needed, although they do still exist.
Steps to reproduce
With vaguely correct-seeming parameters provided to Incus, this still leads to interesting issues like No route to host errors despite everything being reachable. Honestly, if you find options that manage to mount this, please tell me, because I can't seem to find any.
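One of the permutations I tried looked roughly like this; it is reconstructed rather than copy-pasted, and the exact combination of driver options varied between attempts:

```sh
# Attempted cephfs pool backed by a subvolume path; the cephfs.* keys are the
# storage driver options, the path comes from `ceph fs subvolume getpath`.
incus storage create cephfs cephfs \
    source=volume-name/volumes/subvolume-group-name/subvolume-name/e7c5cd0c-10fa-42e2-9d48-902544f13d07 \
    cephfs.cluster_name=ceph \
    cephfs.user.name=client-name
```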
Information to attach
- Any relevant kernel output (dmesg)

```text
[ +13.628392] libceph: mon0 (1)[2001:db8::1:0]:3300 socket closed (con state V1_BANNER)
[ +0.271853] libceph: mon0 (1)[2001:db8::1:0]:3300 socket closed (con state V1_BANNER)
[ +0.519922] libceph: mon0 (1)[2001:db8::1:0]:3300 socket closed (con state V1_BANNER)
[ +0.520979] ceph: No mds server is up or the cluster is laggy
```

- Main daemon log (at /var/log/incus/incusd.log)

```text
Jul 19 20:32:09 lxd2 incusd[10412]: time="2024-07-19T20:32:09Z" level=error msg="Failed mounting storage pool" err="Failed to mount \"[2001:41d0:700:2038::1:0]:3300,[2001:41d0:1004:1a22::1:1]:3300,[2001:41d0:602:2029::1:2]:3300:/\" on \"/var/lib/incus/storage-pools/cephfs\" using \"ceph\": invalid argument" pool=cephfs
```

- Container log (`incus info NAME --show-log`)
- Container configuration (`incus config show NAME --expanded`)
- Output of the client with --debug
- Output of the daemon with --debug (alternatively output of `incus monitor --pretty` while reproducing the issue)
  (doesn't really log anything about the issue)