opencontainers / umoci

umoci modifies Open Container images
https://umo.ci
Apache License 2.0

hookable "storage drivers" #325

Open cyphar opened 4 years ago

cyphar commented 4 years ago

Right now, umoci only supports plain-Jane filesystem extraction. This makes umoci much simpler and more versatile (it should work on any POSIX-compliant filesystem). However, it has some efficiency costs, namely in extraction and diff generation. go-mtree is quite a serviceable system for generating diffs, but it pales in comparison to overlay filesystems. And extracting a layer is needless work if a copy of that layer has already been extracted somewhere.

However, the solution most projects have come up with is to bake in support for a fixed set of filesystems. Users are then forced to use that project's layer store (which prevents layer sharing between different implementations) and its implementation (which prevents users from having custom layering implementations). This also carries a significant maintenance overhead for the project itself. I propose a hook-based scheme that allows users to implement their own layer-store caching and overlayfs layering implementations. We would provide some example hooks so users can get something working out of the box; we might eventually support them, and since they would be external to umoci we could update them without breaking users' setups.

The CLI UX of the hooks would be scripts that are executed with a given command-line format (probably configured in ~/.config/umoci/hooks.toml or something). The Go API would be function callbacks, to allow for library users of umoci to avoid shelling out to other programs (as well as have more complicated hooks that use their own Go modules).
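To make the two modes concrete, here is a minimal sketch of how function callbacks and script hooks could share one type. Everything in it is an assumption for illustration, not umoci's API: the `HookContext`, `HookFunc`, `Hooks` and `commandHook` names are hypothetical, and so is the convention of passing the bundle path as the last argument to the script.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// HookContext is a hypothetical bag of information handed to every hook.
type HookContext struct {
	BundlePath string // bundle directory the hook should operate on
}

// HookFunc is the callback form a library user would register.
type HookFunc func(ctx HookContext) error

// Hooks maps a hook name to its callback. Unregistered hooks are no-ops.
type Hooks map[string]HookFunc

// Run invokes the named hook if one is registered.
func (h Hooks) Run(name string, ctx HookContext) error {
	fn, ok := h[name]
	if !ok {
		return nil
	}
	if err := fn(ctx); err != nil {
		return fmt.Errorf("hook %q: %w", name, err)
	}
	return nil
}

// commandHook adapts an external command to a HookFunc, which is how a
// CLI-configured script could be plugged into the same registry.
// Appending the bundle path as the final argument is an assumed format.
func commandHook(argv []string) HookFunc {
	return func(ctx HookContext) error {
		if len(argv) == 0 {
			return errors.New("empty hook command")
		}
		args := append(append([]string{}, argv[1:]...), ctx.BundlePath)
		return exec.Command(argv[0], args...).Run()
	}
}

func main() {
	hooks := Hooks{
		"post-extract": func(ctx HookContext) error {
			fmt.Println("post-extract for bundle", ctx.BundlePath)
			return nil
		},
	}
	if err := hooks.Run("post-extract", HookContext{BundlePath: "/tmp/bundle"}); err != nil {
		fmt.Println("error:", err)
	}
}
```

With this shape the CLI would only need to translate each configured command into a `commandHook` entry, while library users register closures directly and never shell out.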

I would currently envision a few hooks (all of these would operate on the bundle directory with a rootfs specified by umoci and provided to the hook):

The hook names need to be such that when we start implementing OCIv2 (#256), we can add new hooks where it makes sense but keep the old ones where the semantics should be the same (while diff generation is not needed for my current vision of OCIv2, it would be useful to know which subset of the filesystem was changed so you don't need to rescan the whole thing).

@tych0 what do you think about this idea?

tych0 commented 4 years ago

I would currently envision a few hooks (all of these would operate on a rootfs directory specified by umoci and provided to the hook):

I wonder if you really want it to be on the parent of rootfs, since that contains all the umoci metadata (notably, the mtree file, which is expensive to compute). We snapshot at the directory above the rootfs for that reason.

But overall, it sounds good. What you have above would allow us to easily implement a solution for https://github.com/anuvu/stacker/issues/85, at least.

tych0 commented 4 years ago

unpack.ociv1.layer.try-skip which would ask the hook provider whether there is already an existing copy of a given layer unpacked somewhere in a usable state.

The other thing about this is: it's not really about the layer necessarily, but about the layer and everything below it. For example, since we're doing btrfs, mutations on the filesystem are cumulative. So we have some special way of computing an aggregate hash of all the layers that have been added to a particular rootfs, and we use that as the name. So we may need to pass the entire manifest to the hook.
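As a rough illustration of an aggregate name for "this layer and everything below it", the sketch below folds an ordered list of layer digests into one identifier, in the style of the ChainID construction from the OCI image spec. This is an assumption for illustration only and not necessarily the scheme stacker uses; `aggregateID` is a hypothetical helper and the digests in `main` are placeholders.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// aggregateID folds an ordered list of layer digests into a single
// identifier. The first layer's ID is its own digest; each later ID is
// the SHA-256 of "<previous ID> <layer digest>".
func aggregateID(layerDigests []string) string {
	if len(layerDigests) == 0 {
		return ""
	}
	id := layerDigests[0]
	for _, d := range layerDigests[1:] {
		sum := sha256.Sum256([]byte(id + " " + d))
		id = "sha256:" + hex.EncodeToString(sum[:])
	}
	return id
}

func main() {
	// Placeholder digests, not real layers.
	layers := []string{"sha256:aaa", "sha256:bbb", "sha256:ccc"}
	for i := range layers {
		fmt.Printf("%d layer(s) -> %s\n", i+1, aggregateID(layers[:i+1]))
	}
}
```

Because the result depends on the order of every layer underneath, a hook that receives the whole manifest could compute such a name itself and use it to look up an existing snapshot.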

tych0 commented 4 years ago

NOTE: Since the delta-list hook could fail, we might need to generate a new go-mtree list anyway, but this is only needed when unpacking or using --refresh-bundle. Alternatively we could make this fallback feature optional, so that users could make sure that they don't have to pay the cost of go-mtree generation if they're fine with a delta-list hook failure meaning they can't repack images.

I'd say if the delta-list hook fails, you just abort and make people fix their bugs :)
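The two policies (abort on failure versus fall back to a full scan) can be sketched as one function where the fallback is optional. All names here are hypothetical: `DeltaLister`, `changedPaths`, `brokenHook` and `fullScan` are illustration only, and `fullScan` merely stands in for a go-mtree style rescan.

```go
package main

import (
	"errors"
	"fmt"
)

// DeltaLister returns the paths changed inside a bundle.
type DeltaLister func(bundlePath string) ([]string, error)

// changedPaths asks the delta-list hook first. If the hook fails and no
// fallback is configured, the error aborts the operation; otherwise the
// fallback (for example a full filesystem scan) is used instead.
func changedPaths(bundlePath string, hook, fallback DeltaLister) ([]string, error) {
	paths, err := hook(bundlePath)
	if err == nil {
		return paths, nil
	}
	if fallback == nil {
		return nil, fmt.Errorf("delta-list hook failed: %w", err)
	}
	return fallback(bundlePath)
}

// brokenHook simulates a hook provider with a bug.
func brokenHook(string) ([]string, error) {
	return nil, errors.New("snapshot diff unavailable")
}

// fullScan is a stand-in for an expensive full rescan of the rootfs.
func fullScan(string) ([]string, error) {
	return []string{"etc/hostname"}, nil
}

func main() {
	_, err := changedPaths("/tmp/bundle", brokenHook, nil)
	fmt.Println("strict:", err)

	paths, err := changedPaths("/tmp/bundle", brokenHook, fullScan)
	fmt.Println("fallback:", paths, err)
}
```

Passing a nil fallback gives the "just abort" behaviour, and users who do that never pay for the expensive scan.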

cyphar commented 4 years ago

@tych0

I wonder if you really want it to be on the parent of rootfs, since that contains all the umoci metadata (notably, the mtree file, which is expensive to compute). We snapshot at the directory above the rootfs for that reason.

Yeah, the bundle directory probably makes more sense (even from a "what should we do with respect to the spirit of the runtime-spec" perspective). There might even be room for a hook which allows you to modify the generated config.json of the container.

The other thing about this is: it's not really about the layer necessarily, but about the layer and everything below it. For example, since we're doing btrfs, mutations on the filesystem are cumulative.

Riiiight, because with btrfs the overlay stuff is actually implemented as snapshots rather than individual layer directories like you'd implement it with multi-lowerdir overlayfs. Yeah, we'd need to support both use-cases, but I think that just boils down to providing enough information to the hook for it to decide which information it cares about.

I'd say if the delta-list hook fails, you just abort and make people fix their bugs :)

Probably not a bad idea. :D

tych0 commented 4 years ago

So one problem I see with this: we'd really like to do the decompression in parallel (in general we find that decompressing gzip is slower than our disks can write), but with these hooks, everything is still writing content to $bundle_path/rootfs.

Can you imagine a way to enable this?

tych0 commented 4 years ago

I guess something like: func SupportsParallelDecompression() bool and func ExtractTo(digest ispec.Digest) maybe?

tych0 commented 4 years ago

Then I would just hook post-extract to do the actual rootfs setup.
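A minimal sketch of how those two functions might fit together, assuming each layer is extracted to its own directory and the final rootfs is assembled afterwards (for example from a post-extract hook). The `LayerExtractor` interface, `extractAll`, `fakeExtractor`, the local `Digest` type and the returned directory string are all assumptions for illustration; only the two method names come from the comments above.

```go
package main

import (
	"fmt"
	"sync"
)

// Digest is a local stand-in for a layer digest type.
type Digest string

// LayerExtractor is a hypothetical hook provider interface.
type LayerExtractor interface {
	SupportsParallelDecompression() bool
	// ExtractTo unpacks one layer and returns the directory it used.
	ExtractTo(digest Digest) (string, error)
}

// extractAll unpacks every layer, concurrently if the provider allows it.
// Results keep the order of the input layers.
func extractAll(ex LayerExtractor, layers []Digest) ([]string, error) {
	dirs := make([]string, len(layers))
	if !ex.SupportsParallelDecompression() {
		for i, d := range layers {
			dir, err := ex.ExtractTo(d)
			if err != nil {
				return nil, err
			}
			dirs[i] = dir
		}
		return dirs, nil
	}

	errs := make([]error, len(layers))
	var wg sync.WaitGroup
	for i, d := range layers {
		wg.Add(1)
		// Each goroutine writes only to its own slice index, so no lock is needed.
		go func(i int, d Digest) {
			defer wg.Done()
			dirs[i], errs[i] = ex.ExtractTo(d)
		}(i, d)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return dirs, nil
}

// fakeExtractor pretends to unpack layers; it only computes a path.
type fakeExtractor struct{ parallel bool }

func (f fakeExtractor) SupportsParallelDecompression() bool { return f.parallel }

func (f fakeExtractor) ExtractTo(d Digest) (string, error) {
	return "/tmp/layers/" + string(d), nil
}

func main() {
	dirs, err := extractAll(fakeExtractor{parallel: true}, []Digest{"sha256:aaa", "sha256:bbb"})
	fmt.Println(dirs, err)
}
```

Since nothing in this sketch writes to $bundle_path/rootfs during extraction, the ordering constraint between layers only applies to the later assembly step.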