> Oh, and volume mounting is another thing. Since jail.conf supports mount.fstab, it should be simple enough to mount NFSv4 paths into the jail, along with jailed ZFS datasets.
> - I took a quick look at the patches to ifconfig and route, glad to see they're finally going to get support for jails. I was using jexec for that purpose and wasn't a big fan of having to do that (plus it allows for thinner jails since the host executable can be used). I did a quick search for netgraph but didn't see anything, are you using epair or some other alternative? If not using netgraph, is there any interest in using it?
For managed network with VNET jails I am currently using epair instead of netgraph. Implementing netgraph support instead of vnet is not hard and I am certainly interested, but I just don't have the time to implement it. The same applies to ipfw (in fact ipfw seems to be a better choice than pf for a lot of the things I'm interested in doing).
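Roughly, the epair plumbing amounts to the sketch below (shelling out to ifconfig(8); the bridge handling and error handling are illustrative, not how xc actually does it):

```rust
use std::process::Command;

// Sketch: create an epair, add the "a" end to an existing host bridge and
// push the "b" end into a running VNET jail (by jid or name).
fn attach_epair(jail: &str, bridge: &str) -> std::io::Result<()> {
    // `ifconfig epair create` prints the name of the "a" end, e.g. "epair0a".
    let out = Command::new("ifconfig").args(["epair", "create"]).output()?;
    let a_end = String::from_utf8_lossy(&out.stdout).trim().to_string();
    let b_end = format!("{}b", a_end.trim_end_matches('a'));

    // Host side: make the "a" end a member of the bridge and bring it up.
    Command::new("ifconfig").args([bridge, "addm", a_end.as_str(), "up"]).status()?;
    // Jail side: move the "b" end into the jail's vnet.
    Command::new("ifconfig").args([b_end.as_str(), "vnet", jail]).status()?;
    Ok(())
}
```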
> - How are OCI manifests (both index and image) being handled in terms of FreeBSD-specific configuration? I assume the index just lists FreeBSD, which shouldn't be an issue (I don't think). Are runtime-specific settings being stored in the free-form image manifest?
I'm currently using my own image config format, which allows xc to do quite a few cool things, for example determining the environment variables needed for a container even before the container is started, and volume hints, so that the image can hint to the user the best way to create volumes (such as ZFS properties).
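(Purely to illustrate the idea, not xc's real schema: a hypothetical shape where env vars and volume hints live in the image metadata, sketched with serde; all field names below are made up.)

```rust
use serde::Deserialize;
use std::collections::HashMap;

// Hypothetical illustration only; these field names are not xc's actual format.
#[derive(Deserialize)]
struct ImageConfig {
    // Environment variables the container expects, known before it ever starts.
    envs: HashMap<String, String>,
    // Hints telling the user how volumes for this image are best created.
    volume_hints: HashMap<String, VolumeHint>,
}

#[derive(Deserialize)]
struct VolumeHint {
    description: Option<String>,
    // e.g. suggested ZFS properties such as {"recordsize": "16k"}.
    zfs_properties: HashMap<String, String>,
}
```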
In terms of runtime specification, there isn't an equivalent to the OCI runtime spec; the daemon just takes the image config and creates the containers with jail(2). In fact, xc does not use jail.conf at all.
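For reference, creating a jail through jail_set(2) from Rust looks roughly like the sketch below (using the libc crate's FreeBSD bindings, assuming it exposes jail_set and JAIL_CREATE; xc's actual code is of course more involved):

```rust
use std::ffi::CString;

// Sketch of jail_set(2): parameters are passed as alternating name/value
// iovecs; string values include their NUL terminator, and boolean parameters
// like "persist" are passed with a null value of length 0.
fn create_jail(name: &str, path: &str) -> std::io::Result<i32> {
    let keys = [
        CString::new("name").unwrap(),
        CString::new("path").unwrap(),
        CString::new("persist").unwrap(),
    ];
    let vals = [
        Some(CString::new(name).unwrap()),
        Some(CString::new(path).unwrap()),
        None, // "persist" is a boolean: present, no value
    ];

    let mut iov: Vec<libc::iovec> = Vec::new();
    for (k, v) in keys.iter().zip(vals.iter()) {
        iov.push(libc::iovec {
            iov_base: k.as_ptr() as *mut libc::c_void,
            iov_len: k.as_bytes_with_nul().len(),
        });
        match v {
            Some(s) => iov.push(libc::iovec {
                iov_base: s.as_ptr() as *mut libc::c_void,
                iov_len: s.as_bytes_with_nul().len(),
            }),
            None => iov.push(libc::iovec {
                iov_base: std::ptr::null_mut(),
                iov_len: 0,
            }),
        }
    }

    let jid = unsafe { libc::jail_set(iov.as_mut_ptr(), iov.len() as u32, libc::JAIL_CREATE) };
    if jid < 0 { Err(std::io::Error::last_os_error()) } else { Ok(jid) }
}
```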
> I'll take a look at the source a bit later, and if I can contribute anything, then I'd be happy to do so (though my time is a bit short right now so it'll take a while).
That'd be super awesome! I am currently refactoring / implementing many things so they can change a lot, but the core architecture should stay about the same.
> Oh, and volume mounting is another thing. Since jail.conf supports mount.fstab, it should be simple enough to mount NFSv4 paths into the jail, along with jailed ZFS datasets.
Currently xc does not use jail.conf at all. Mounting is done ad-hoc by the daemon and is implemented in an interesting way...
For context, xc is actually intended to be multi-user friendly; in fact, if you change the ownership of the main socket you can use xc as an unprivileged user. The challenge is how to make it safe to use in a multi-user environment.
Currently, in terms of copying files into the container, xcd requires the client process to first open the file and pass the fd to the daemon, so that if the user can't open the file in the first place, there is no way to exploit this to steal the file's contents by creating a container and copying the file into it.
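The client side of that is essentially SCM_RIGHTS fd passing over the control socket. A rough sketch, assuming the nix crate with its 0.26-style API (exact signatures vary between nix versions):

```rust
use nix::sys::socket::{sendmsg, ControlMessage, MsgFlags, UnixAddr};
use std::io::IoSlice;
use std::os::unix::io::{AsRawFd, RawFd};
use std::os::unix::net::UnixStream;

// Client side: open the file with our own credentials and hand the descriptor
// to the daemon, so the daemon never opens paths on the caller's behalf.
fn send_file_to_daemon(sock: &UnixStream, path: &str) -> nix::Result<()> {
    let file = std::fs::File::open(path).expect("caller must be able to open the file itself");
    let fds: [RawFd; 1] = [file.as_raw_fd()];
    let cmsg = [ControlMessage::ScmRights(&fds)];
    // One dummy payload byte; the real protocol would carry a request here.
    let iov = [IoSlice::new(b"x")];
    sendmsg::<UnixAddr>(sock.as_raw_fd(), &iov, &cmsg, MsgFlags::empty(), None)?;
    Ok(())
}
```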
Mounting, however, is done quite differently. By default we support mounting by path, but the daemon checks the client credential and determines whether a user with that uid or gid can actually rwx the directory they intend to mount. This comes with the issue that a user with the same uid/gid could come from another jail. The right way to implement it is to have the client send a dirfd along with the path to the daemon, so it can verify that the directory at the path is indeed the same inode.
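That verification boils down to comparing the (st_dev, st_ino) pair of the fd the client actually opened against the path it claims to refer to. A sketch using the nix crate's stat wrappers:

```rust
use nix::sys::stat::{fstat, stat};
use std::os::unix::io::RawFd;

// Verify that the directory the client opened (dirfd) is the same filesystem
// object as the path it asked the daemon to mount, by comparing device and
// inode numbers. This defeats a same-uid user in another jail pointing the
// daemon at a path they could not actually open.
fn same_inode(dirfd: RawFd, path: &str) -> nix::Result<bool> {
    let from_fd = fstat(dirfd)?;
    let from_path = stat(path)?;
    Ok(from_fd.st_dev == from_path.st_dev && from_fd.st_ino == from_path.st_ino)
}
```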
For these reasons, directly mounting NFS into a container by the user is currently unsupported; doing it safely requires us to implement managed volumes first (similar to docker volume), in which case the access control checks can be untied from the OS primitives and rely solely on ACL / RBAC rules stored somewhere. (It's not like xc has great security right now, as the ACL/RBAC part is still unimplemented along with rctl, but I want xc to be fairly usable and match expectations first.)
I personally have a lot of interest in managed volumes, as I am myself using xc to build FreeBSD (see this repo), but it is currently not the top priority yet.
> - I took a quick look at the patches to ifconfig and route, glad to see they're finally going to get support for jails. I was using jexec for that purpose and wasn't a big fan of having to do that (plus it allows for thinner jails since the host executable can be used). I did a quick search for netgraph but didn't see anything, are you using epair or some other alternative? If not using netgraph, is there any interest in using it?
> For managed network with VNET jails I am currently using epair instead of netgraph. Implementing netgraph support instead of vnet is not hard and I am certainly interested, but I just don't have the time to implement it. The same applies to ipfw (in fact ipfw seems to be a better choice than pf for a lot of the things I'm interested in doing).
I haven't used ipfw in 15+ years now, so that's interesting. I'll have to take a look at it again compared to pf, which has been my default choice for a long time now.
> - How are OCI manifests (both index and image) being handled in terms of FreeBSD-specific configuration? I assume the index just lists FreeBSD, which shouldn't be an issue (I don't think). Are runtime-specific settings being stored in the free-form image manifest?
> I'm currently using my own image config format, which allows xc to do quite a few cool things, for example determining the environment variables needed for a container even before the container is started, and volume hints, so that the image can hint to the user the best way to create volumes (such as ZFS properties).
Ah yeah, that's the approach I'm taking too. I'd like to directly support OCI, but the design seems fairly Linux-centric. I was just going to add the extra configuration as annotations under a FreeBSD-specific namespace (e.g. org.freebsd.oci). I also looked at possibly using buildah to build and publish images and then using one of the existing crates for fetching, verifying, and loading the image. Ideally I want to take advantage of ZFS incremental snapshots such that the process would practically look like this for my usage:
After step 8, the ZFS dataset can then be cloned (or send-recv if preferred instead), mounted, and used as the root for the jail container. The OCI specification expects the usage of tarballs though (and I think also expects white-out files). ZFS is just so much nicer here, IMO. I think I did see an initiative to revise/extend the OCI spec to natively utilize ZFS instead of assuming tarballed file systems. The only downside is that ZFS doesn't seem to have a tested and stable user API. There are a couple of community options, but the alternative would be to generate native bindings, which isn't too terrible.
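For concreteness, the clone step could be as simple as shelling out to zfs(8); the dataset names below are placeholders:

```rust
use std::process::Command;

// Sketch: materialize a container root from an image snapshot by cloning it
// to a new dataset with the desired mountpoint.
fn clone_image_as_root(image_snap: &str, container_ds: &str, mountpoint: &str) -> std::io::Result<()> {
    // e.g. image_snap = "zroot/images/freebsd@final", container_ds = "zroot/containers/myjail"
    let mp = format!("mountpoint={mountpoint}");
    Command::new("zfs")
        .args(["clone", "-o", mp.as_str(), image_snap, container_ds])
        .status()?;
    Ok(())
}
```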
> In terms of runtime specification, there isn't an equivalent to the OCI runtime spec; the daemon just takes the image config and creates the containers with jail(2). In fact, xc does not use jail.conf at all.
Yup, my approach is almost the same. My preference is to utilize native libs and calls. I currently use Nomad as my orchestrator, and so I'm writing it as a task driver for it.
> I'll take a look at the source a bit later, and if I can contribute anything, then I'd be happy to do so (though my time is a bit short right now so it'll take a while).
> That'd be super awesome! I am currently refactoring / implementing many things so they can change a lot, but the core architecture should stay about the same.
Sounds good!
> > For managed network with VNET jails I am currently using epair instead of netgraph. Implementing netgraph support instead of vnet is not hard and I am certainly interested, but I just don't have the time to implement it. The same applies to ipfw (in fact ipfw seems to be a better choice than pf for a lot of the things I'm interested in doing).
>
> I haven't used ipfw in 15+ years now, so that's interesting. I'll have to take a look at it again compared to pf, which has been my default choice for a long time now.
pf is great for many reasons, but ipfw allows matching on jid and self, which makes it very interesting; additionally, it supports npt66.
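For example, a rule can be scoped to a single jail. A rough sketch of emitting such a rule (the rule number and jid are arbitrary, and this just shells out to ipfw(8)):

```rust
use std::process::Command;

// Sketch: add an ipfw rule that only matches packets belonging to jail `jid`
// (ipfw rules can carry a `jail <jid>` match option).
fn allow_jail_traffic(jid: i32) -> std::io::Result<()> {
    let jid = jid.to_string();
    Command::new("ipfw")
        .args(["add", "1000", "allow", "ip", "from", "any", "to", "any", "jail", jid.as_str()])
        .status()?;
    Ok(())
}
```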
> Ah yeah, that's the approach I'm taking too. I'd like to directly support OCI, but the design seems fairly Linux-centric. I was just going to add the extra configuration as annotations under a FreeBSD-specific namespace (e.g. org.freebsd.oci). I also looked at possibly using buildah to build and publish images and then using one of the existing crates for fetching, verifying, and loading the image.
That said, xc does support OCI config directly too; it just does a conversion to the native image config format. I am considering the necessity of doing the reverse (such that we can push a "normal" OCI config up), but if there isn't any other OCI-compatible FreeBSD container implementation anyway, I'm not sure it's worth it.
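To give an idea of the conversion direction (illustrative only, not xc's actual code): the Env / Entrypoint / Cmd fields of a standard OCI image config map fairly directly onto a native config, e.g.:

```rust
use serde::Deserialize;

// Minimal slice of the OCI image config (the "config" object of the image spec).
#[derive(Deserialize)]
struct OciImageConfig {
    #[serde(rename = "Env", default)]
    env: Vec<String>, // entries like "PATH=/usr/bin:/bin"
    #[serde(rename = "Entrypoint", default)]
    entrypoint: Vec<String>,
    #[serde(rename = "Cmd", default)]
    cmd: Vec<String>,
}

#[derive(Deserialize)]
struct OciImage {
    config: OciImageConfig,
}

// Illustrative only: turn "K=V" strings into (key, value) pairs that a
// native config format could store explicitly.
fn split_env(image: &OciImage) -> Vec<(String, String)> {
    image
        .config
        .env
        .iter()
        .filter_map(|kv| kv.split_once('=').map(|(k, v)| (k.to_string(), v.to_string())))
        .collect()
}
```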
> Ideally I want to take advantage of ZFS incremental snapshots such that the process would practically look like this for my usage:
The procedure you described is almost identical to the current xc one (except that only the final "product" is preserved, as the number of datasets in zfs list was an eyesore to me). What I am going to switch to is tagging (zfs snapshot) each intermediate layer after it is extracted, and only doing zfs clone when branches are encountered.
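In other words, roughly the following (a sketch; dataset and snapshot names are placeholders):

```rust
use std::process::Command;

// Sketch of the layering plan: extract each layer into the same dataset and
// snapshot it, so intermediate layers cost no extra datasets.
fn tag_layer(dataset: &str, layer_digest: &str) -> std::io::Result<()> {
    // e.g. "zroot/xc/build@sha256-abcd..."
    let snap = format!("{dataset}@{layer_digest}");
    Command::new("zfs").args(["snapshot", snap.as_str()]).status()?;
    Ok(())
}

// Only taken when two images diverge after a shared prefix of layers.
fn branch_layer(snapshot: &str, new_dataset: &str) -> std::io::Result<()> {
    Command::new("zfs").args(["clone", snapshot, new_dataset]).status()?;
    Ok(())
}
```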
ZFS send/recv can be less than ideal for the container use case though, because of how it works: even if the results of a ZFS send/recv are identical at the object level, send/recv operates at the block level, so you may get different stream representations, and there is no way to check whether two streams are the same without actually receiving them, which makes it not an ideal choice for layers.
If you are interested, you can check the ocitar crate in this project, which acts as a proxy for bsdtar that injects / processes whiteout files in a stream.
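This isn't ocitar's actual code, but the OCI whiteout convention it has to handle looks like this when scanning a layer tarball: an entry named .wh.<name> means "delete <name>" and .wh..wh..opq marks an opaque directory. A sketch using the tar crate:

```rust
use std::fs::File;
use tar::Archive;

// Sketch: walk a layer tarball and report OCI whiteout entries.
fn list_whiteouts(layer_tar: &str) -> std::io::Result<()> {
    let mut archive = Archive::new(File::open(layer_tar)?);
    for entry in archive.entries()? {
        let entry = entry?;
        let path = entry.path()?.into_owned();
        if let Some(name) = path.file_name().and_then(|n| n.to_str()) {
            if name == ".wh..wh..opq" {
                // Hide everything under this directory from the lower layers.
                println!("opaque dir: {:?}", path.parent());
            } else if let Some(target) = name.strip_prefix(".wh.") {
                // Remove `target` from the lower layers.
                println!("whiteout: delete {:?}", path.with_file_name(target));
            }
        }
    }
    Ok(())
}
```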
In terms of task drivers, my current plan is to implement a CRI layer for Kubernetes, since xc is already capable of doing most of the things CRI requires. Of course a Nomad driver should be way easier, as in that case porting kubelet is not required.
Hi, just ran into this project. Looks good! I'm actually in the process of implementing something similar, though not nearly as far along as your project is.
I'm going to look at the code next week sometime, but I figured I'd start off with some questions/comments in the meantime.
I'll take a look at the source a bit later, and if I can contribute anything, then I'd be happy to do so (though my time is a bit short right now so it'll take a while).