threefoldtech / zos

Autonomous operating system
https://threefold.io/host/
Apache License 2.0

support bcachefs #2074

Open xmonader opened 1 year ago

xmonader commented 1 year ago

https://bcachefs.org/

muhamadazmy commented 12 months ago

A lot of thought needs to go into this, for the following reasons:

Other issues or concerns might show up while actually working on this. In any case, that is a lot of changes, and they get more complex if we have to stay backward compatible, as shown before (support running old-style and new-style storage). Deciding whether we need to support both will determine how much compatibility code we have to carry.

Combined with the mycelium work, I am really starting to think we need a new zos version 4 that runs separately from the current version 3.

muhamadazmy commented 11 months ago

On the other hand, there is also bcache (not bcachefs), which is similar to bcachefs but works at the device level rather than the filesystem level. That means we can still use the devices with btrfs, which minimizes the changes required to support bcache.

xmonader commented 11 months ago

Shouldn't be blocked anymore right?

xmonader commented 11 months ago

bcachefs is also now merged on trunk

muhamadazmy commented 11 months ago

My take on using bcachefs or bcache in zos, and what I think might be a better solution. To make it clear, I will try to explain the difference between the two and why I think neither is a good fit for our use case.

bcachefs

Why I think bcachefs is not the right choice is the following:

bcache

To solve the above problem, I think bcache would be a better option. We can create a bcache device that uses the SSD as a cache device, then create partitions (one per vdisk) that can be attached directly to the VM. This makes performance much better, not only because we eliminate the underlying layers but also because the SSD is used as a cache.

The problem with bcache, IMHO, is that we will need to create partitions, which causes fragmentation of the disk and loss of space. Imagine deleting a disk: you are left with unused space in the middle of the disk that you cannot use because it is smaller than the newly requested disk.
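The fragmentation concern can be illustrated with a tiny first-fit allocator. This is only a sketch: the disk size, the extent layout, and the `first_fit` helper are made up for illustration and are not part of zos or bcache.

```python
def first_fit(free_extents, size):
    """Return the start of the first free extent that can hold `size`, else None."""
    for start, length in free_extents:
        if length >= size:
            return start
    return None


# Hypothetical 100 GiB disk that held vdisks at [0, 40), [40, 60) and [60, 90).
# After deleting the middle vdisk, the free space is split into two extents,
# written as (start, length) in GiB.
free_extents = [(40, 20), (90, 10)]
total_free = sum(length for _, length in free_extents)

fits_25 = first_fit(free_extents, 25)  # no single hole is large enough
fits_20 = first_fit(free_extents, 20)  # fits exactly in the hole at offset 40

print(total_free, fits_25, fits_20)
```

Although 30 GiB are free in total, a 25 GiB vdisk cannot be placed because no contiguous hole is big enough.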

This can be improved by using something like LVM on top, but LVM is a very old technology and I have read that btrfs has big issues with it.

My proposed solution

Instead of using any of the above technologies, we can build our own virtual disk with nbd. The implementation would use the SSD as the cache (hot blocks) and evict the least accessed blocks to HDD. The service would still use files, but with a few tricks this can be optimized for speed.
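A minimal sketch of the hot/cold tiering idea, under stated assumptions: both tiers are plain in-memory dictionaries standing in for SSD and HDD files, eviction is least-recently-used as a simple stand-in for "least accessed", and the `TieredBlockStore` name and its methods are hypothetical. It says nothing about how the nbd side would be wired up.

```python
from collections import OrderedDict


class TieredBlockStore:
    """Toy two-tier block store: a bounded 'ssd' cache in front of an 'hdd' map."""

    def __init__(self, ssd_capacity):
        self.ssd_capacity = ssd_capacity
        self.ssd = OrderedDict()  # block number -> bytes, least recently used first
        self.hdd = {}             # stand-in for the slow backend

    def _touch(self, block, data):
        # Put the block in the fast tier and mark it as most recently used.
        self.ssd[block] = data
        self.ssd.move_to_end(block)
        # Evict the coldest blocks to the slow tier when over capacity.
        while len(self.ssd) > self.ssd_capacity:
            cold_block, cold_data = self.ssd.popitem(last=False)
            self.hdd[cold_block] = cold_data

    def write(self, block, data):
        self.hdd.pop(block, None)  # drop any stale copy in the slow tier
        self._touch(block, data)

    def read(self, block):
        if block in self.ssd:
            data = self.ssd[block]
        else:
            # Promote from the slow tier; raises KeyError for a never-written block.
            data = self.hdd.pop(block)
        self._touch(block, data)
        return data


store = TieredBlockStore(ssd_capacity=2)
store.write(0, b"a")
store.write(1, b"b")
store.write(2, b"c")       # block 0 is evicted to the slow tier
first = store.read(0)      # block 0 is promoted, block 1 is evicted
```

After these calls the fast tier holds blocks 2 and 0, and block 1 lives only in the slow tier.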

We can then do really nice things with this. For example, we can write the back end to multiple HDDs, then do mirroring, striping, or even erasure coding to make sure the loss of one HDD does not cause loss of data.
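To make the "loss of one HDD" point concrete, here is a sketch using single XOR parity, the simplest form of erasure coding (the RAID-5 idea). This is an illustrative choice, not something decided in this thread; a real implementation would more likely use a Reed-Solomon style code so that more than one loss can be tolerated. The block contents and the `xor_blocks` helper are made up.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# One stripe spread over three hypothetical data HDDs, plus one parity HDD.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second data HDD: XOR of the survivors and the parity
# block reconstructs the missing block.
survivors = [data[0], data[2], parity]
recovered = xor_blocks(survivors)

print(recovered == data[1])
```

The cost is one extra disk per stripe, compared with a full second copy for mirroring.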

Later we can even use remote, distributed ZDBs on nearby machines instead of HDDs. This solves many issues:

muhamadazmy commented 5 months ago

I did some experiments with bcachefs, and I was thinking about the best way to actually use it with ZOS given the following:

xmonader commented 5 months ago

Due to blockers, we won't address this in 3.14.