threefoldtech / zos

Autonomous operating system
https://threefold.io/host/
Apache License 2.0
84 stars 14 forks

zos bcachefs assessment #2396

Open iwanbk opened 3 months ago

iwanbk commented 3 months ago

Assess how we can use bcachefs on zos

related issues:

Is your feature request related to a problem? Please describe

Why we need to move away from btrfs:

Why bcachefs:

scope:

Describe the solution you'd like

The assessment will be done in two phases

  1. backward compatibility check

We do this check because we need to know how btrfs is currently used in zos for these reasons:

For things that are compatible: good. For non-compatible things:

  2. plan/specs to use bcachefs on zos

cc @delandtj

iwanbk commented 3 months ago

backward compatibility check

This check involves porting the current btrfs code to bcachefs; it is WIP in #2375. Deep diving into the code is expected to give more understanding, although the function calls are not always obvious because of zbus usage. (zbus is a good thing, we only need to be more thorough when tracing the call flow)

No support for subvolume limit/quota

what we really need:

how the subvolume limit is used:

a. set limit on zos cache: no issue here, we will keep it on btrfs

b. when creating volume for a container

possible solutions:

c. on VolumeUpdate https://github.com/threefoldtech/zos/blob/0ea61706e1a501d4e774a9195c139e2995bdd1cb/pkg/primitives/volume/volume.go#L98 it is used by:

d. on pkg/flist https://github.com/threefoldtech/zos/blob/0ea61706e1a501d4e774a9195c139e2995bdd1cb/pkg/flist/flist.go#L472 it is used by qsfsd when ....

No support for FS_NOCOW_FL file attribute
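For context, FS_NOCOW_FL is the inode flag that disables copy-on-write for a file, and on btrfs it is normally set on an empty file through the `FS_IOC_GETFLAGS` / `FS_IOC_SETFLAGS` ioctls. The sketch below is not the zos code. It assumes 64-bit Linux (the ioctl numbers are hard-coded for that ABI), and the helper names `withNoCOW` and `setNoCOW` are hypothetical. On a filesystem that does not support the flag, the second ioctl is expected to return an error.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

const (
	fsIocGetFlags = 0x80086601 // FS_IOC_GETFLAGS on 64-bit Linux (assumption: 64-bit ABI)
	fsIocSetFlags = 0x40086602 // FS_IOC_SETFLAGS on 64-bit Linux (assumption: 64-bit ABI)
	fsNoCOWFl     = 0x00800000 // FS_NOCOW_FL from linux/fs.h
)

// withNoCOW returns the given inode flags with FS_NOCOW_FL added.
func withNoCOW(flags uint32) uint32 {
	return flags | fsNoCOWFl
}

// setNoCOW reads the current inode flags of path, adds FS_NOCOW_FL,
// and writes them back. Hypothetical helper, for illustration only.
func setNoCOW(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	var flags uint32
	if _, _, errno := syscall.Syscall(syscall.SYS_IOCTL, f.Fd(), fsIocGetFlags, uintptr(unsafe.Pointer(&flags))); errno != 0 {
		return fmt.Errorf("get inode flags: %w", errno)
	}
	flags = withNoCOW(flags)
	if _, _, errno := syscall.Syscall(syscall.SYS_IOCTL, f.Fd(), fsIocSetFlags, uintptr(unsafe.Pointer(&flags))); errno != 0 {
		return fmt.Errorf("set inode flags: %w", errno)
	}
	return nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: nocow <file>")
		return
	}
	if err := setNoCOW(os.Args[1]); err != nil {
		fmt.Println("error:", err)
	}
}
```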

what we really need:

possible solution:

No subvolume info command

what we really need:

possible solutions: we don't really need it. Subvolume disk usage is only really needed when there is no limit on the subvolume, and the only occurrence of this is when we create the zdb cache. zdb cache disk usage is counted using its own method.

current lsblk doesn't have bcachefs support

what we really need: Get disk label/fstype on startup

solution: Maxus will upgrade it
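Once lsblk recognizes bcachefs, the startup check could read the label and fstype from `lsblk --json --output NAME,FSTYPE,LABEL`. The following is a minimal sketch of parsing that output, not the zos implementation: the type names, the `parseLsblk` helper, and the sample device names and labels are all made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// blockDevice mirrors the columns requested from lsblk (NAME,FSTYPE,LABEL).
// lsblk emits null for empty columns, which decodes to an empty string here.
type blockDevice struct {
	Name     string        `json:"name"`
	FSType   string        `json:"fstype"`
	Label    string        `json:"label"`
	Children []blockDevice `json:"children"`
}

type lsblkOutput struct {
	BlockDevices []blockDevice `json:"blockdevices"`
}

// parseLsblk flattens the device tree into a map keyed by device name.
func parseLsblk(data []byte) (map[string]blockDevice, error) {
	var out lsblkOutput
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	result := map[string]blockDevice{}
	var walk func(devs []blockDevice)
	walk = func(devs []blockDevice) {
		for _, d := range devs {
			result[d.Name] = d
			walk(d.Children)
		}
	}
	walk(out.BlockDevices)
	return result, nil
}

// sample is hand-written output in the shape lsblk --json produces;
// the devices and labels are hypothetical.
const sample = `{"blockdevices":[
  {"name":"sda","fstype":null,"label":null,"children":[
    {"name":"sda1","fstype":"btrfs","label":"zos-cache"}]},
  {"name":"sdb","fstype":"bcachefs","label":"pool1"}]}`

func main() {
	devs, err := parseLsblk([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for name, d := range devs {
		fmt.Printf("%s fstype=%q label=%q\n", name, d.FSType, d.Label)
	}
}
```

With an lsblk that lacks bcachefs support, the fstype of a bcachefs device would be expected to come back empty, which is why the upgrade is needed before this check is reliable.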

iwanbk commented 3 months ago

Specification

The new bcachefs based storage must provide all the features provided by the btrfs based storage.

Backward compatibility

Because all disks of the old nodes are already formatted with btrfs, we only support new nodes

bcachefs only for the workloads

The root filesystem still uses btrfs with its /var/run/cache

multidevice filesystem strategy

bcachefs supports a real pool, where multiple devices can be formatted into a single filesystem:

caching

writeback caching:

config

--foreground_target=ssd
--background_target=hdd
--promote_target=ssd
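To show how these target options fit into a complete format command, here is a small sketch that assembles the arguments for `bcachefs format`. It assumes devices are grouped with `--label=ssd.<name>` and `--label=hdd.<name>` so that the `ssd` and `hdd` targets resolve to those groups. The `formatArgs` helper, the label naming scheme, and the device paths are hypothetical, and the command is only printed, not executed.

```go
package main

import (
	"fmt"
	"strings"
)

// formatArgs builds the argument list for `bcachefs format` with SSDs as
// the foreground/promote tier and HDDs as the background tier.
// Labels follow an assumed "ssd.ssdN" / "hdd.hddN" naming scheme.
func formatArgs(ssds, hdds []string) []string {
	args := []string{"format"}
	for i, dev := range ssds {
		args = append(args, fmt.Sprintf("--label=ssd.ssd%d", i+1), dev)
	}
	for i, dev := range hdds {
		args = append(args, fmt.Sprintf("--label=hdd.hdd%d", i+1), dev)
	}
	return append(args,
		"--foreground_target=ssd", // new writes land on the SSD group
		"--background_target=hdd", // data is moved to the HDD group in the background
		"--promote_target=ssd",    // data read from HDD is cached on the SSD group
	)
}

func main() {
	// Hypothetical device paths, for illustration only.
	args := formatArgs([]string{"/dev/sda"}, []string{"/dev/sdb", "/dev/sdc"})
	fmt.Println("bcachefs " + strings.Join(args, " "))
}
```

Keeping the argument construction in a pure function makes it easy to unit test without touching real disks; the caller would pass the result to `exec.Command("bcachefs", args...)`.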

quota management

the language (Rust or Go)

Rust is the way to go, but the prototype can be built using Go

iwanbk commented 3 months ago

mkfs.bcachefs also has these options, worth checking

--usrquota              Enable user quotas
--grpquota              Enable group quotas
--prjquota              Enable project quotas

iwanbk commented 3 weeks ago

There was drama on LKML about bcachefs: https://www.phoronix.com/news/Bcachefs-Fixes-Two-Choices

Or "take your toy and go home" effectively alluding to taking it out of the mainline Linux kernel and go back to developing it out-of-tree.

The risk is that bcachefs could be removed from the mainline kernel, so we will wait and observe for now.