wetopi / docker-volume-rbd

Docker Engine managed plugin to manage RBD volumes.
MIT License

btrfs #11

Closed dpkirchner closed 5 years ago

dpkirchner commented 5 years ago

I think it'd be useful to be able to create btrfs volumes inside the plugin's container. You can mount the same btrfs-formatted image more than once on the same server. ext4 and xfs won't allow that -- they refuse to mount the same filesystem multiple times if they're using two different device nodes.

This is probably just a matter of adding btrfs to the docker image, but I'm not sure if that's enough for plugins.
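As an aside on the "two different device nodes" situation, here is a minimal sketch of how one might spot the same filesystem showing up under more than one device node. It assumes that `lsblk -rno NAME,UUID` reports the filesystem UUID for each mapped node. The helper name `dup_fs_uuids` is hypothetical and not part of the plugin. Other setups, such as RAID members, can also share a UUID, so treat a match as a hint rather than proof.

```shell
#!/bin/sh
# dup_fs_uuids (hypothetical helper, not part of the plugin):
# read "NAME UUID" pairs on stdin, the shape printed by
# `lsblk -rno NAME,UUID`, and print every UUID that appears on more
# than one device node, followed by the names of those nodes.
dup_fs_uuids() {
    awk '
        # Skip devices that carry no filesystem UUID (only one field).
        NF >= 2 {
            count[$2]++
            names[$2] = (names[$2] == "" ? $1 : names[$2] " " $1)
        }
        END {
            for (u in count)
                if (count[u] > 1)
                    print u, names[u]
        }
    '
}

# Example on a real host (left commented out, since it depends on the
# local block devices):
# lsblk -rno NAME,UUID | dup_fs_uuids
```

Because the function only reads text on stdin, it can be tried with captured `lsblk` output before running it on a live host.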

box-daxter commented 5 years ago

Hi dpkirchner,

We tried it before. It should work, since the driver should be the same. We are using ext4 because it has MMP (multiple mount protection). I was unable to confirm that btrfs is 100% capable of being multi-mounted. Are you sure that btrfs can be mounted more than once and written to at the same time?

If you are 100% sure, we could test it. It could be an amazing feature, for example to replace something like NFS on web servers: it would make it possible to share a document root between several web servers running on Docker.

I can't find any information at https://btrfs.wiki.kernel.org/index.php/UseCases saying that multi-mount is allowed.

Regards

dpkirchner commented 5 years ago

While I can't say that btrfs 100% guarantees that it's OK to mount it more than once (I don't know the filesystem internals), I can at least say that it sort of works in my testing, whereas ext4 does not.

btrfs works when you mount the FS more than once on the same server. Unfortunately, when I tested mounting the same btrfs FS on two servers, I found that I couldn't see the writes on both. I should have tested this before opening this issue. I'm including these notes for completeness, but this issue can be closed.

I'm showing the commands that the plugin calls for demonstration purposes:

btrfs (both on the same server):

dpk@host:/$ sudo rbd --cluster ceph --pool my_pool --name client.host create -s 5000M foobar
dpk@host:/$ sudo rbd --cluster ceph --pool my_pool --name client.host map foobar
/dev/rbd0
dpk@host:/$ sudo rbd --cluster ceph --pool my_pool --name client.host map foobar
/dev/rbd1
dpk@host:/$ sudo mkfs.btrfs /dev/rbd0
btrfs-progs v4.7.3
See http://btrfs.wiki.kernel.org for more information.

Detected a SSD, turning off metadata duplication.  Mkfs with -m dup if you want to force metadata duplication.
Performing full device TRIM (4.88GiB) ...
Label:              (null)
UUID:
Node size:          16384
Sector size:        4096
Filesystem size:    4.88GiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         single            8.00MiB
  System:           single            4.00MiB
SSD detected:       yes
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1     4.88GiB  /dev/rbd0

dpk@host:/$ sudo mount /dev/rbd0 /mnt1
dpk@host:/$ sudo mount /dev/rbd1 /mnt2
dpk@host:/$ echo foo | sudo tee /mnt1/written.on.1
foo
dpk@host:/$ echo foo | sudo tee /mnt2/written.on.2
foo
dpk@host:/$ ls -ld /mnt?/written.on*
-rw-r--r-- 1 root root 4 Jul 31 14:10 /mnt1/written.on.1
-rw-r--r-- 1 root root 4 Jul 31 14:10 /mnt1/written.on.2
-rw-r--r-- 1 root root 4 Jul 31 14:10 /mnt2/written.on.1
-rw-r--r-- 1 root root 4 Jul 31 14:10 /mnt2/written.on.2
dpk@host:/$ sudo rm /mnt?/written.on.*
rm: cannot remove '/mnt2/written.on.1': No such file or directory
rm: cannot remove '/mnt2/written.on.2': No such file or directory

ext4 (both on the same server):

dpk@host:/$ sudo rbd --cluster ceph --pool my_pool --name client.host create -s 5000M foobar
dpk@host:/$ sudo rbd --cluster ceph --pool my_pool --name client.host map foobar
/dev/rbd0
dpk@host:/$ sudo rbd --cluster ceph --pool my_pool --name client.host map foobar
/dev/rbd1
dpk@host:/$ sudo mkfs.ext4 -O mmp /dev/rbd0
mke2fs 1.43.4 (31-Jan-2017)
Discarding device blocks: done
Creating filesystem with 1280000 4k blocks and 320000 inodes
Filesystem UUID: 3ec326d2-ec43-414d-881f-9f41201f3ddc
Superblock backups stored on blocks:
    32768, 98304, 163840, 229376, 294912, 819200, 884736

Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Multiple mount protection is enabled with update interval 5 seconds.
Writing superblocks and filesystem accounting information: done

dpk@host:/$ sudo mount /dev/rbd0 /mnt1
dpk@host:/$ sudo mount /dev/rbd1 /mnt2
mount: wrong fs type, bad option, bad superblock on /dev/rbd1,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
dpk@host:/$ echo $?
32
dpk@host:/$ sudo dmesg | tail
[56023.990483] BTRFS info (device rbd0): creating UUID tree
[56197.072841] libceph: mon0 216.127.36.38:6789 session established
[56197.079439] libceph: client706942 fsid 48a1e0d5-bcbe-4d4a-bf33-749dd54f726e
[56197.120930] rbd: rbd0: capacity 5242880000 features 0x5
[56198.512363] rbd: rbd1: capacity 5242880000 features 0x5
[56225.533444] EXT4-fs (rbd0): mounted filesystem with ordered data mode. Opts: (null)
[56228.647224] EXT4-fs warning (device rbd1): ext4_multi_mount_protect:323: MMP interval 42 higher than expected, please wait.

[56250.611396] EXT4-fs warning (device rbd1): ext4_multi_mount_protect:336: Device is already active on another node.
[56250.614031] EXT4-fs warning (device rbd1): ext4_multi_mount_protect:336: MMP failure info: last update time: 1564582420, last update node: host, last update device: rbd0
dpk@host:/$ sudo tune2fs -l /dev/rbd0 | grep -i mmp
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit mmp flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
MMP block number:         8664
MMP update interval:      5

The result is the same when I mount the filesystem on more than one server, and when I wait for the MMP interval to lapse (although I'm not sure if that's relevant).

Linux host 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1+deb9u4 (2019-07-19) x86_64 GNU/Linux
dpk@host:/$ dpkg -l | grep ext4
ii  e2fslibs:amd64                   1.43.4-2                         amd64        ext2/ext3/ext4 file system libraries
ii  e2fsprogs                        1.43.4-2                         amd64        ext2/ext3/ext4 file system utilities

box-daxter commented 5 years ago

The driver formats with the MMP feature to ensure that the same filesystem is not mounted twice if the RBD image is mapped on more than one host. But since the beginning, we have been looking for a solution that allows us to mount an RBD image twice, so that web servers can share document roots.

New ideas are welcome!
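For anyone who wants to check whether an existing image was formatted with MMP, a minimal sketch follows. It parses the "Filesystem features:" line that `tune2fs -l` prints, as in the transcript earlier in this thread. The helper name `has_mmp` is hypothetical and is not something the plugin provides.

```shell
#!/bin/sh
# has_mmp (hypothetical helper, not part of the plugin):
# read `tune2fs -l` output on stdin and succeed (exit status 0) only
# if the "Filesystem features:" line lists the mmp feature.
has_mmp() {
    awk -F: '
        /^Filesystem features:/ {
            # $2 holds the space-separated feature names.
            n = split($2, feats, " ")
            for (i = 1; i <= n; i++)
                if (feats[i] == "mmp")
                    found = 1
        }
        END { exit !found }
    '
}

# Example on a real device (needs root, so it is left commented out):
# sudo tune2fs -l /dev/rbd0 | has_mmp && echo "MMP enabled"
```

Matching whole feature names rather than grepping for the substring avoids a false positive from the "MMP block number" and "MMP update interval" lines.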