polydawn / repeatr

Repeatr: Reproducible, hermetic Computation. Provision containers from Content-Addressable snapshots; run using familiar containers (e.g. runc); store outputs in Content-Addressable form too! JSON API; connect your own pipelines! (Or, use github.com/polydawn/stellar for pipelines!)
https://repeatr.io
Apache License 2.0

Mac xhyve executor #111

Open willscott opened 4 years ago

willscott commented 4 years ago

https://github.com/zchee/libhyperkit is probably the right set of bindings to make use of for Mac hosts.

There needs to be a refactor in how the rio mount placer interacts with the executor - on a Mac they need to happen together, rather than first constructing the overlay FS and then setting the constructed FS as the environment of a docker / chroot container. On a Mac host, an appropriate overlay FS can't easily be created on the host itself, but it can exist within the executor environment.

There will also need to be a choice of what kernel should be used in the guest environment.

warpfork commented 4 years ago

Looks like a lot of cgo -- which is noteworthy because it means compilation needs significantly more dependencies -- but it's possible that's typical of xhyve bindings. I'm not super familiar with any other options.

I think the refactor to how the executor uses rio mounts and filesystem stitching is... pretty approachable. The executor interface covers all phases of the filesystem prep and later teardown, so the new executor can easily choose not to use any of the existing mixins... so there are no blockers there. What's the part you can't reuse right now? ...oh, uff, it's probably stitch.Assembler.Run, isn't it? Yeah, if you're thinking that should be broken apart into smaller pieces, one to handle parallel unpacks and caching, and a separate thing that handles mounts, I'd agree.
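To make that concrete, here's a rough sketch of how the split could look; these interfaces and names are hypothetical, not the current rio/stitch API:

// Hypothetical sketch of splitting the single-step assembly into two phases,
// so a VM-based executor can consume unpacked filesystems without requiring
// the layering to be constructed on the host. Names are illustrative only.
package stitchsketch

import "context"

// Unpacker covers the parallel unpack-and-cache phase: each input ware is
// materialized (or found in cache) as a plain directory on the host.
type Unpacker interface {
	Unpack(ctx context.Context, wareID string, destBasePath string) (hostDir string, err error)
}

// Mounter covers the mount/overlay phase. A chroot or runc executor would
// implement this with host overlayfs/bind mounts; an xhyve executor could
// instead hand the unpacked directories into the guest (e.g. as 9p shares)
// and do the layering there.
type Mounter interface {
	Assemble(ctx context.Context, inputs map[string]string) (rootfsPath string, cleanup func() error, err error)
}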

The trickiest thing about this might be... if you're hoping to do overlayfs mounts from inside the xhyve area... you'll need to get another binary in there to do that work, presumably. And is that going to then be a linux binary? I don't know exactly how this all works. That sounds possible, but increasingly Fun.

You're right about the kernel. I don't have many opinions about this. It seems like something that should probably be configured in the same breath as the executor (e.g., I'd hesitate to put it in the formula, as we hope that more than one kind of executor can make the same formula execute -- containment is supposed to be a commodity!). And it might be worth noting that the kernel is a parameter that might be shared by several executors (e.g. a qemu executor might require the same things). Beyond that I don't have any concrete thoughts at the moment.
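Just to illustrate what "configured in the same breath as the executor" could mean, here's a hypothetical shape for executor-level config; the field names are invented for this sketch and don't correspond to anything existing in repeatr:

// VMExecutorConfig sketches the knobs that could be shared by any VM-backed
// executor (xhyve, qemu, ...), while formulas stay portable across executors.
package execconfig

type VMExecutorConfig struct {
	KernelPath string // path to a kernel image suitable for the guest
	InitrdPath string // optional initrd/initramfs
	Arch       string // guest architecture, e.g. "x86_64"
	MemoryMiB  int    // guest memory
	CPUs       int    // guest CPU count
}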

willscott commented 4 years ago

I'll split out kernel choice (and architecture) into a separate issue, since that's a separate problem.

The other option for xhyve is the https://github.com/moby/hyperkit setup that docker uses, which calls xhyve in a subprocess, so it relies on an implicit dependency on the binary rather than pulling it in as a compile-time library.
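A minimal sketch of that subprocess approach, assuming a hyperkit binary on PATH; the actual argument set (PCI slots, boot handoff, console wiring) would come from the executor's configuration, so none are spelled out here:

// RunHyperkit shells out to a hyperkit binary found on PATH instead of
// linking xhyve via cgo. Arguments are passed through untouched.
package hyperkitexec

import (
	"context"
	"os"
	"os/exec"
)

func RunHyperkit(ctx context.Context, args ...string) error {
	cmd := exec.CommandContext(ctx, "hyperkit", args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}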

Ideally, there wouldn't need to be an overlayfs layer in the virtualized world. One of the VirtIO drivers for the exposed block device in the VM is 9pfs, which appears to support exposing a block device created from an overlay of multiple host folders directly. ref: https://github.com/machine-drivers/docker-machine-driver-xhyve/pull/194 (this uses command line args to the xhyve binary; one of the arguments for using the library in the first post is that getting the FS configured should be easier with that interface)

warpfork commented 4 years ago

I'm afraid there might be one painful nit that still needs taking care of, though: creating parent directories for mounts.

Consider the formula:

{
  "inputs": {
    "/a/deep/mount": "tar:qwer"
  },
  "action": { /*noop*/ },
  "outputs": {
    "/": {"packtype": "tar"}
  }
}

... what are the permissions going to be on the "./a/deep/" path in the output filesystem? And what will the mtime property be on that path, if we ask for it to be preserved when packing the output?
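To make the concern concrete: whatever stitches the filesystem has to invent a mode and an mtime for "./a/deep/". A sketch of that step follows; the mode and timestamp are whatever the caller picks, which is exactly the ambiguity, not anything repeatr currently promises:

// ensureMountParent creates the parent directories for a mount path inside
// the assembled root. Nothing in the formula specifies what these dirs should
// look like, so the mode and mtime here are arbitrary choices by the placer,
// and they're what a "preserve properties" pack of "/" will later observe.
package placersketch

import (
	"os"
	"path/filepath"
	"time"
)

func ensureMountParent(root, mountPath string, mode os.FileMode, mtime time.Time) error {
	parent := filepath.Dir(filepath.Join(root, mountPath))
	if err := os.MkdirAll(parent, mode); err != nil {
		return err
	}
	return os.Chtimes(parent, mtime, mtime)
}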

warpfork commented 4 years ago

I'm neutral on the library choice as well. IMO, if something can be done with pure Go and the syscall packages, that's preferable, because it's the most maintainable. If there's a choice between some cgo APIs (especially large ones: the larger the API, the more likely it is to suffer a 'refactor' that adds maintenance costs to consumers) versus some CLI to wrap... I'd probably try to pick whichever one is less ambiguous and changes least frequently. I'm not sure how stable the xhyve CLIs are, but it's possible they're on par with the cgo APIs.

willscott commented 4 years ago

as a prefix: this is indeed quite a rabbit hole 🐇

hyperkit / xhyve has two potential PCI drivers that are options for exposing data from the host into the guest environment. The first is virtio_block, a VirtIO block device, which by default proxies a block device (identified by major/minor number) from the host into the guest. As an extension, the virtio_9p driver uses the plan9 library linked on the host to expose one or more host folders as plan9 shares that can be mounted in the guest. There's also been some work on exposing 'raw devices', where an fd identified by a path could provide the backing block device.
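For reference, the guest-side half of the virtio_9p option boils down to a 9p mount over the virtio transport. A sketch for a Linux guest; the share tag and target path here are made up for the example:

// Mount9pShare mounts a host folder exposed over virtio-9p by its share tag.
// "trans=virtio" selects the virtio transport; 9p2000.L is the usual protocol
// version for Linux guests.
//go:build linux

package guestmount

import "golang.org/x/sys/unix"

func Mount9pShare(tag, target string) error {
	return unix.Mount(tag, target, "9p", 0, "trans=virtio,version=9p2000.L")
}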

The other option is to use the virtio_net Network device, and provide a network service for exposing the file system. Signs of current usage of virtio_net seem to point to NFS as the preferred protocol for exposing a host file server that can be mounted by the guest.

The things I'm currently considering in making a decision on a preferred approach are: