snarkmaster / btrfs-ublk

R/W btrfs, seeding from special btrfs with 4EiB "virtual / remote" file, hosted on `ublk`.

Question about the state of the work #1

Closed aledbf closed 2 months ago

aledbf commented 1 year ago

👋 Hi, this work is really interesting. Is there a plan to continue working on it, or was it a proof of concept? Thanks!

aledbf commented 1 year ago

The README says:

I don't demonstrate this here since it's not helpful for my application.

Do you have references to something like that? Thanks!

snarkmaster commented 1 year ago

This was just a proof of concept. I remain interested in picking it up in the future to make it more real (with regard to reading from an immutable block device on the network), but have no immediate plans.

If you're looking to work on something like this for real, we could sync up to see if there's potential for collaboration.

demonstrate writing to a non-physical block device

... I think I need to know exactly what you're trying to do to be more helpful, but let me give you some general thoughts.

To me, the only obvious application is to do a "single writer" setup putting data on a network block device. Prior art would be things like NBD, iSCSI, etc. I wouldn't say I love any of them though. E.g. I tried btrfs-on-iSCSI on my LAN and performance was poor except on wired ethernet (poor == much worse than the underlying spinning disk).

There shouldn't be any "magic" about using ublk to handle writes to a remote (virtual) block device, insofar as it has hooks for writes, and you just have to follow Linux block device semantics to the letter. The main risks I see are

aledbf commented 1 year ago

@snarkmaster thank you for the quick reply. How can I contact you?

We are looking for a network filesystem for containers that can be exposed as a local device in a cloud environment (AWS).

snarkmaster commented 1 year ago

network filesystem for cloud

I think you have to say a few more words about the use-case. Writable with a single writer? Immutable with multiple readers? Multi-reader, multi-writer?

aledbf commented 1 year ago

The idea is to have multiple micro VMs, each mapped to its own network block device on the node where it is running (meaning a single writer per device). For the filesystem, an object store like S3 would be mapped to blocks from multiple files on the node (these would act as a kind of cache). The reason for combining S3 with local blocks is to support lazy loading, so that a micro VM can start without waiting for the whole filesystem to arrive.
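As an illustration of the lazy-loading idea above, here is a minimal sketch of a read-through block cache with a single writer. Everything in it is an assumption made for illustration: the 4 KiB block size, the `LazyBlockCache` class, and the `fake_remote` function standing in for an object store such as S3. It is not how btrfs-ublk works, and it ignores durability, ordering, and flush semantics, which a real block device must honor.

```python
from typing import Callable, Dict

BLOCK_SIZE = 4096  # assumed block size, for illustration only


class LazyBlockCache:
    """Read-through cache: a block is fetched remotely only on first read."""

    def __init__(self, fetch_block: Callable[[int], bytes]) -> None:
        self._fetch_block = fetch_block    # hypothetical remote fetch hook
        self._local: Dict[int, bytes] = {}  # block number -> cached contents
        self.remote_fetches = 0            # counts trips to the remote store

    def read(self, block_no: int) -> bytes:
        if block_no not in self._local:
            data = self._fetch_block(block_no)
            if len(data) != BLOCK_SIZE:
                raise ValueError("remote returned a short or oversized block")
            self._local[block_no] = data
            self.remote_fetches += 1
        return self._local[block_no]

    def write(self, block_no: int, data: bytes) -> None:
        # Single writer: a local write simply shadows the remote contents.
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must cover exactly one block")
        self._local[block_no] = data


def fake_remote(block_no: int) -> bytes:
    """Stand-in for an object-store read; fills the block with one byte value."""
    return bytes([block_no % 256]) * BLOCK_SIZE
```

With this shape, a micro VM could begin reading immediately, and only the blocks it actually touches would be pulled from the remote store.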

Immutable with multiple readers?

This would be interesting if it is possible to use btrfs snapshots as the starting point of new µVMs.

snarkmaster commented 1 year ago

your use-case

I don't have any experience with Nydus, but it seems to be marketed towards your use-case, and it is more of a "real" thing. Does it have any gaps relative to what you want?

start microVMs from btrfs snapshots

That's kind of the point I was pursuing with btrfs-ublk initially. You can definitely do that, although provisioning the prototype filesystem is O(# files) since the metadata is all-local in btrfs-ublk.

You can also use a remote block device for your single-writer use-case, but, as I said above, there are real caveats that need to be experimentally quantified here.

I vaguely remember that "block device on S3" gadgets already exist. Could you try this out on top of the existing btrfs-ublk demo (instead of a writeable loopback) and report your results here? It's probably a half-day to get this wired up and run some basic tests (a half-day I sadly don't expect to have in the near future).
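For anyone wiring this up, one piece any "block device on object store" layer needs is a mapping from a byte range on the device to the objects that back it. The sketch below shows that arithmetic only. The 4 MiB chunk size, the `chunks/<hex index>` key naming, and the `chunks_for_range` helper are all hypothetical and not taken from any existing tool.

```python
from typing import List, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # assumed object size; tune experimentally


def chunks_for_range(
    offset: int, length: int, chunk_size: int = CHUNK_SIZE
) -> List[Tuple[str, int, int]]:
    """Split a device byte range into (object key, offset in object, byte count)."""
    if offset < 0 or length < 0 or chunk_size <= 0:
        raise ValueError("offset and length must be >= 0, chunk_size > 0")
    pieces: List[Tuple[str, int, int]] = []
    pos, end = offset, offset + length
    while pos < end:
        index = pos // chunk_size          # which object holds this byte
        start_in_chunk = pos % chunk_size  # where the range begins inside it
        take = min(chunk_size - start_in_chunk, end - pos)
        pieces.append((f"chunks/{index:016x}", start_in_chunk, take))
        pos += take
    return pieces
```

Each returned tuple would correspond to one ranged read (or read-modify-write) against the object store, so the chunk size directly trades request count against wasted transfer.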

aledbf commented 1 year ago

I don't have any experience with Nydus, but it seems to be marketed towards your use-case.

Yes, the Nydus image service works exactly like I want, but I cannot use it as the rootfs in Firecracker due to the lack of virtio-fs support.

Could you try this out on top of the existing btrfs-ublk demo (instead of a writeable loopback) and report your results here? It's probably a half-day to get this wired up and run some basic tests (a half-day I sadly don't expect to have in the near future).

I will try to do that.