Open xmonader opened 1 year ago
A lot of thought needs to go into this, for the following reasons:
Other issues or concerns might show up while actually working on this. In any case, this is a lot of change, and it gets more complex if it has to stay backward compatible as shown before (supporting running both the old and new style storage). Deciding whether we need to support both will change the amount of compatibility code we will have to drag along.
Combined with the mycelium work, I really start to think we need a new zos version 4 that runs separately from the current version 3.
On the other hand, there is also bcache (not bcachefs), which is similar to bcachefs but works at the device level rather than the filesystem level. That means we can still use the devices with btrfs, which minimizes the changes required to support caching.
Shouldn't be blocked anymore, right?
bcachefs is also now merged into the mainline kernel.
My take on using bcachefs or bcache in zos, and what I think might be a better solution. To make it clear, I will try to explain the difference between the two and why I think neither is good for our use case.
Why I think bcachefs is the right choice is the following:
We use files as virtual disks that are attached to virtual machines. This by itself hurts the performance of the disks inside the VM very badly, since an IO request has to go through many layers: fs operation (VM) -> IO to device (VM) -> over virtio to the underlying file -> translated to an fs operation (host) -> IO to the block device (host).

To solve the above problem, I think bcache would be a better option. We can create a bcache device using the SSD as the cache device, then create partitions (one for each vdisk) that can be attached directly to the VM. That makes the performance much better, not only because we eliminate the underlying layers, but also because we have an SSD device used as a cache.
The problem with bcache, imho, is that we will need to create partitions, which will cause fragmentation of the disk and loss of space (imagine deleting a disk: you are left with unused space in the middle of the disk that you can't use because it's smaller than the next requested disk).
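To make the fragmentation concern concrete, here is a minimal, purely illustrative sketch (not zos code) of first-fit partition allocation; the `Disk` class and all sizes are made up for the example:

```python
class Disk:
    """Toy first-fit partition allocator (sizes in GiB)."""

    def __init__(self, size):
        self.size = size
        self.parts = []          # (offset, length), sorted by offset

    def holes(self):
        """Yield (offset, length) of the free gaps on the disk."""
        cursor = 0
        for off, length in sorted(self.parts):
            if off > cursor:
                yield cursor, off - cursor
            cursor = off + length
        if cursor < self.size:
            yield cursor, self.size - cursor

    def alloc(self, length):
        """Place a partition in the first hole big enough for it."""
        for off, hole in self.holes():
            if hole >= length:
                self.parts.append((off, length))
                self.parts.sort()
                return off
        return None              # fails even if total free space is enough

    def free(self, offset):
        self.parts = [p for p in self.parts if p[0] != offset]

disk = Disk(1000)
disk.alloc(400)                  # vdisk A at [0, 400)
b = disk.alloc(200)              # vdisk B at [400, 600)
disk.alloc(200)                  # vdisk C at [600, 800)
disk.free(b)                     # delete B: a 200 GiB hole in the middle
# 400 GiB are free in total (two separate 200 GiB holes), yet this fails:
print(disk.alloc(300))           # None
```

The 300 GiB request fails even though 400 GiB are free, because no single hole is large enough; this is exactly the stranded-space problem described above.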
This could be improved by putting something like LVM on top, but LVM is very old technology and I have read that btrfs has big issues with it.
Instead of using any of the above technologies, we can build our own virtual disk with nbd. The implementation would use the SSD for the cache (hot blocks) and evict the least accessed blocks to the HDD. The service would still use files, but with a few tricks this can be optimized for speed.
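The hot/cold split described above can be sketched as a tiny two-tier block store with LRU eviction. This is purely illustrative: `TieredBlockStore`, `FAST_BLOCKS`, and the in-memory dicts are stand-ins for the real SSD- and HDD-backed files, not part of any actual implementation.

```python
from collections import OrderedDict

BLOCK_SIZE = 4096
FAST_BLOCKS = 2          # tiny "SSD" capacity so eviction is visible

class TieredBlockStore:
    def __init__(self):
        self.fast = OrderedDict()  # block index -> bytes, in LRU order ("SSD")
        self.slow = {}             # evicted blocks ("HDD")

    def write(self, index, data):
        assert len(data) == BLOCK_SIZE
        self.fast[index] = data
        self.fast.move_to_end(index)   # a written block is hot
        self._evict()

    def read(self, index):
        if index in self.fast:
            self.fast.move_to_end(index)   # touching a block keeps it hot
            return self.fast[index]
        data = self.slow[index]            # cache miss: promote from "HDD"
        self.fast[index] = data
        self._evict()
        return data

    def _evict(self):
        while len(self.fast) > FAST_BLOCKS:
            idx, data = self.fast.popitem(last=False)  # least recently used
            self.slow[idx] = data

store = TieredBlockStore()
for i in range(3):
    store.write(i, bytes([i]) * BLOCK_SIZE)
# block 0 was written first and never touched again, so it was evicted:
print(0 in store.slow, sorted(store.fast))  # True [1, 2]
```

A real service would do the same bookkeeping per block over two backing files (or an nbd export), but the eviction logic is the heart of it.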
We can then do really nice things with this. For example, we can write the backend to multiple HDDs, and then do mirroring, striping, or even erasure coding to make sure that loss of one of the HDDs does not cause loss of data.
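As a rough illustration of the erasure-coding idea, here is the simplest possible code: a single XOR parity shard over the data shards (RAID-5 style), which survives the loss of any one backend. A real setup would more likely use a proper Reed-Solomon library; the function names here are hypothetical.

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data, k):
    """Split data into k equal shards plus one XOR parity shard."""
    assert len(data) % k == 0
    size = len(data) // k
    shards = [data[i * size:(i + 1) * size] for i in range(k)]
    parity = shards[0]
    for s in shards[1:]:
        parity = xor_bytes(parity, s)
    return shards + [parity]

def rebuild(shards, lost):
    """shards[lost] is missing (None); XOR the survivors to recover it."""
    rebuilt = None
    for i, s in enumerate(shards):
        if i == lost:
            continue
        rebuilt = s if rebuilt is None else xor_bytes(rebuilt, s)
    return rebuilt

data = b"0123456789ab"        # 12 bytes -> 3 data shards of 4 bytes each
shards = encode(data, 3)      # 3 data shards + 1 parity, one per "HDD"
shards[1] = None              # simulate losing one HDD
print(rebuild(shards, 1))     # b'4567'
```

XOR parity tolerates only a single loss; tolerating more failed backends is where real erasure codes (and the ZDB idea below) come in.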
Later we could even use remote, distributed ZDBs on nearby machines instead of HDDs. This solves many issues:
I did some experiments with bcachefs, and I was thinking about the best way to actually use it with zos, given the following:
Due to blockers, we won't address this in 3.14.
https://bcachefs.org/