kata-containers / runtime

Kata Containers version 1.x runtime (for version 2.x see https://github.com/kata-containers/kata-containers).

https://katacontainers.io/

Apache License 2.0

Being able to invoke qemu through libvirt #2227

Closed c3d closed 3 years ago

c3d commented 5 years ago

Which feature do you think can be improved?

On Fedora, qemu is normally never invoked directly, but rather through libvirt. This improves security, notably by allowing qemu to run as a non-root user.

How can it be improved?

Instead of starting qemu directly, the VM could be started through libvirt. This presumably requires creating an XML file to pass around the required configuration (not saying that this is easy…)

grahamwhaley commented 5 years ago

I think a contribution enabling libvirt as one of the supported hypervisor interfaces for kata would be most welcome. Historically (aka, a loooong time ago), fore-runners to kata avoided libvirt as the time/space/complexity overhead outweighed the benefits (things back then were probably more focussed on size and speed than security). As an option though, if having libvirt as an option added security benefits, I think that could be well accepted.

/cc @gnawux @egernst @sameo for any thoughts.

aagit commented 4 years ago

Hello everyone,

Note: I'm speaking only for myself in this message.

In my view it'd be preferable to focus on adding support in kata-runtime for vfio device assignment to the guest with qemu running without privileges, or for whatever else is needed (such as microvm hotplug, with paravirtualized hotplug support for virtio-mmio yet to be developed), depending on whatever use case requiring vfio, hotplug and device assignment emerges and needs to be supported in the future.

Just dropping root with virtiofs and without PCI device assignment is a trivial change to kata-runtime: upstream qemu already supports it with the -runas option.

https://github.com/qemu/qemu/blob/c1e667d2598b9b3ce62b8e89ed22dd38dfe9f57f/os-posix.c#L187
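As a rough illustration of how small that change could be, here is a minimal sketch in C of a launcher assembling a command line that leaves the privilege drop to QEMU's `-runas` option. The helper name `build_runas_argv`, the binary path and the user name are hypothetical and only serve the example; kata-runtime itself is written in Go and builds its command line differently.

```c
#include <stddef.h>

/*
 * Fill `argv` with a QEMU command line that asks QEMU to drop root
 * on its own via `-runas`. Returns the number of arguments written
 * (not counting the NULL terminator), or -1 if `cap` is too small.
 * The function name and its parameters are illustrative assumptions.
 */
int build_runas_argv(const char *qemu_path, const char *user,
                     const char *argv[], size_t cap)
{
    const char *args[] = { qemu_path, "-runas", user };
    size_t n = sizeof(args) / sizeof(args[0]);

    if (cap < n + 1)
        return -1;
    for (size_t i = 0; i < n; i++)
        argv[i] = args[i];
    argv[n] = NULL;  /* exec-style vectors are NULL-terminated */
    return (int)n;
}
```

A real launcher would append the machine, memory and device options before handing the vector to `execv()`; only the `-runas` pair is relevant to the point made above.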

If libvirtd can somehow handle unprivileged vfio device assignment, so can kata-runtime without libvirtd, it's just a matter of implementing it.

The ideal objective of avoiding code duplication by adding libvirt support to kata-runtime sounds like an oxymoron: sure, you won't have to add the code that does unprivileged vfio support to kata-runtime anymore, but it'll backfire by creating instant duplication at the selinux-svirt/namespaces/cgroups/seccomp level, which are all things that are already commonly handled by CRI-O for all other containers.

Overall, the most important objective worth investing in is that kata-runtime and qemu are always upgraded atomically and are effectively monolithic, so that there will be zero userland stable API that needs to be maintained with backwards compatibility (i.e. kata-runtime needs to speak a single version of the qemu command line at any given time) in between the enduser/customer deploying a container and the host kernel syscall ABI.

bonzini commented 4 years ago

it'll backfire by creating instant duplication at the selinux-svirt/namespaces/cgroups/seccomp level, which are all things that are already commonly handled by CRI-O for all other containers.

Are you thinking of libcontainer/runc rather than CRI-O? If you mean that there would be duplicated code between the two runtime classes then yeah, there would be some duplication between libcontainer and libvirt.

In my opinion the advantage of libvirt is that it can be considered a black box as far as Kata is concerned, especially with the new daemon-less operation. The fact that the cgroups code is duplicated between libcontainer and libvirt is irrelevant as far as Kata is concerned. If it's something that either libcontainer or libvirt folks want to unify it's great, but libvirt strictly gives Kata fewer things to worry about than go-vmm (and of course go-vmm is basically part of Kata while the libvirt-go and libvirt-xml-go bindings are maintained by someone else).

aagit commented 4 years ago

Are you thinking of libcontainer/runc rather than CRI-O?

It doesn't really matter where the duplication materializes, that's a minor implementation detail.

The point is that all software when it's containerized, uses the standard support to apply selinux/namespaces/cgroups/seccomp. qemu is not magic, it's just a plain normal process like every other and it needs to reuse the container runtime isolation like every other process.

The only exception is when qemu (or any other app for that matter) needs to apply a stricter isolation (like a stricter seccomp filter or an empty-chroot-like fs-namespace view) that is only applicable during its runtime (like after forking a child that has a larger attack surface), but this isn't what we're talking about here: what libvirt applies is still a fixed, coarse kind of isolation, nothing fine-grained that depends on the actual runtime and can only be applied at runtime, so there's no justification to make qemu a special case and to add a "special security wrapper" for commoditized isolation security features already done by the container runtime.

Upstream shouldn't focus on short term issues, we need to focus on what is more secure overall: if anything is missing then it should be applied to the standard container runtime, not kept in libvirt for the only benefit of qemu. If qemu can't live with the standard runtime security isolation and it needs to carry its own security wrapper, then imagine all other bare metal containers that don't even have the virt isolation on top of the bare metal isolation, how can all containers live without their own libvirt?

Furthermore I don't see the support for PR_SET_NO_NEW_PRIVS in libvirt, while there is in firejail/crun etc... I mean, if anything, libvirt is already falling behind the standard, and it's more likely that stuff would need to be added to libvirt than moved from libvirt to crun.

bonzini commented 4 years ago

Upstream shouldn't focus on short term issues, we need to focus on what is more secure overall

Define "short term issues" and "more secure overall". Possibly with actual examples of privilege escalation that are not prevented by libvirt.

Furthermore I don't see the support for PR_SET_NO_NEW_PRIVS in libvirt, while there is in firejail/crun etc... I mean, if anything, libvirt is already falling behind the standard, and it's more likely that stuff would need to be added to libvirt than moved from libvirt to crun.

The seccomp policy forbids completely the spawning of child processes, and...

With no_new_privs set to 1, execve(2) promises not to grant privileges to do anything
that could not have been done without the execve(2) call

... you can't get privileges from execve if you cannot execve.
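To make that argument concrete, here is a minimal sketch in C of a seccomp filter that refuses `execve(2)`, checked inside a forked child so the caller is left untouched. It uses a raw BPF filter through `prctl(2)` rather than libseccomp, which QEMU uses, so it is not QEMU's actual policy. The helper names `deny_execve` and `execve_is_blocked` are hypothetical. For brevity the filter skips the architecture check and does not cover `execveat(2)`, both of which a real policy would need.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Install a filter that makes execve(2) fail with EPERM. */
static int deny_execve(void)
{
    struct sock_filter filter[] = {
        /* Load the syscall number from the seccomp data. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* execve: fall through to the ERRNO return; anything else: skip it. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_execve, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* An unprivileged process must set no_new_privs before it may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Returns 1 if a filtered child saw execve() fail with EPERM, 0 if not, -1 on error. */
int execve_is_blocked(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char *const argv[] = { "true", NULL };
        char *const envp[] = { NULL };
        if (deny_execve() != 0)
            _exit(2);
        execve("/bin/true", argv, envp);
        /* Only reached if execve() failed; 42 marks the expected EPERM. */
        _exit(errno == EPERM ? 42 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 42;
}
```

With such a filter in place a set-uid binary can never be started, which is why the no_new_privs promise adds little once exec is forbidden. That holds only while the filter is actually installed.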

aagit commented 4 years ago

Define "short term issues" and "more secure overall". Possibly with actual examples of privilege escalation that are not prevented by libvirt.

"short term issues": I meant being able to apply the current libvirt security isolation without having to port it to the container runtime first (i.e. svirt, namespaces, cgroup). It may have an effect in the short term, perhaps making it possible to achieve the standard libvirt security practice quicker (I'm not even sure about that: it may still take more time to add libvirt support to kata-runtime than to port the security isolation to the container runtime, but I just couldn't think of other possible benefits, so I said "short term issues" meaning "short term concerns").

"more secure overall": I wasn't talking about qemu, I am talking about everything: podman, on-prem cloud, public clouds running k8s. I already justified it, quote: "if anything is missing then it should be applied to the standard container runtime, not kept in libvirt for the only benefit of qemu".

The seccomp policy forbids completely the spawning of child processes, and... ... you can't get privileges from execve if you cannot execve.

The seccomp policy can be opted-out in qemu: there is no justification for libvirt not to add PR_SET_NO_NEW_PRIVS at the very least if the seccomp policy isn't runtime enabled in qemu. It's less of an issue for kata because we could apply it behind libvirt in the container runtime and we'd also apply the seccomp policy, but here I was purely giving an example of stuff that needs to move from crun to libvirt, not the other way around. To explain: not only would it be possible to do the same security isolation libvirt does through the container runtime without having to write code, it would also be possible to do more.

And as said above, if code has to be written, the end result will be "more secure overall" (if ignoring "short term issues").

bonzini commented 4 years ago

The seccomp policy can be opted-out in qemu

When Libvirt invokes QEMU it knows that the seccomp policy will be there, because it uses -sandbox. Am I misunderstanding what you mean?

it would also be possible to do more.

What is "more"? Adding no_new_privs is not "more", it's security theater if QEMU is anyway disallowed from spawning child processes. (Besides, QEMU already knows how to set no_new_privs when it applies the seccomp filter. Libvirt doesn't ask because it's pointless).

aagit commented 4 years ago

When Libvirt invokes QEMU it knows that the seccomp policy will be there, because it uses -sandbox. Am I misunderstanding what you mean?

First of all, this has gone off on a tangent that is off-topic for kata: we're discussing whether it's worth adding PR_SET_NO_NEW_PRIVS knowledge to libvirt.

Even if I were entirely wrong about this, it would change nothing about everything else that was discussed and is actually related to kata.

So first of all qemu can be built without CONFIG_SECCOMP:

#ifdef CONFIG_SECCOMP

int parse_sandbox(void *opaque, QemuOpts *opts, Error **errp)

and even when it's built in, it can be disabled at runtime and it looks like libvirt allows to opt-out of seccomp:

{ "sandbox", "enable", QEMU_CAPS_SECCOMP_SANDBOX },
{ "sandbox", "elevateprivileges", QEMU_CAPS_SECCOMP_BLACKLIST },

[..]

if (cfg->seccompSandbox == 0) {
    if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_SECCOMP_SANDBOX))
        virCommandAddArgList(cmd, "-sandbox", "off", NULL);
    return 0;
}
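The two cases in that snippet can be sketched as a small formatter for the value passed to `-sandbox`. This is a hedged illustration, not libvirt's code: the function name `format_sandbox_arg` is hypothetical, and the sub-option names (`obsolete`, `elevateprivileges`, `spawn`, `resourcecontrol`) are the ones QEMU documents for `-sandbox`, all set to `deny` here on the assumption that the launcher wants the strictest setting.

```c
#include <stdio.h>

/*
 * Write the value for QEMU's -sandbox option into `buf`.
 * enabled == 0 gives "off" (the opt-out case); anything else gives
 * "on" with every documented sub-option set to "deny".
 * Returns what snprintf() returns: the length that was needed, so a
 * result >= `cap` means the output was truncated.
 */
int format_sandbox_arg(int enabled, char *buf, size_t cap)
{
    if (!enabled)
        return snprintf(buf, cap, "off");
    return snprintf(buf, cap,
                    "on,obsolete=deny,elevateprivileges=deny,"
                    "spawn=deny,resourcecontrol=deny");
}
```

The `spawn=deny` and `elevateprivileges=deny` settings are the ones that matter for this discussion, since they cover fork/exec and the set*uid family respectively.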

What is "more"? Adding no_new_privs is not "more", it's security theater if QEMU is anyway disallowed from spawning child processes. (Besides, QEMU already knows how to set no_new_privs when it applies the seccomp filter. Libvirt doesn't ask because it's pointless).

fork and exec aren't disallowed unless seccomp is enabled, which isn't guaranteed as shown above.

So to be more clear, I should replace the words "there is no justification for libvirt not to add PR_SET_NO_NEW_PRIVS" with "there is no justification for libvirt not to add PR_SET_NO_NEW_PRIVS when it executes qemu with the sandbox off or when it executes any of the dozen other hypervisors that may have a userland and may not have a sandbox".
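What is being asked for here amounts to one `prctl(2)` call between `fork()` and `exec()`. A minimal sketch in C, with a hypothetical helper name `spawn_no_new_privs` and no claim about how libvirt structures its own process launching:

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork, set no_new_privs in the child, then exec `argv`.
 * Returns the child's exit status, or -1 on error or abnormal exit.
 */
int spawn_no_new_privs(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* From here on execve() cannot grant privileges: set-uid and
         * set-gid bits and file capabilities are ignored. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
            _exit(126);
        execvp(argv[0], argv);
        _exit(127);  /* exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The flag is independent of whether a seccomp filter is later installed by the executed program, which is the case the rephrased sentence is concerned with.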

The only reason qemu supports elevateprivileges at all (which is the only option that would set PR_SET_NO_NEW_PRIVS if equal to "children") is to allow bridge helpers or other scripts to run, but those would never need to run with libvirt, they're entirely mutually exclusive. When libvirt is on there is never a justification for qemu to ever run any suid at all.

In other words libvirt simply needs to add a PR_SET_NO_NEW_PRIVS, and if something breaks, it means they will have found a major bug, not that PR_SET_NO_NEW_PRIVS should be made conditional. It's making PR_SET_NO_NEW_PRIVS conditional on whatever low-level qemu detail that is security theater.

Qemu has a good reason not to issue PR_SET_NO_NEW_PRIVS unconditionally: so that, if it's run without libvirt, it can still run a suid helper to set up the network. To solve that qemu should remain in control like the sshd root server does, and it should set up the network in the parent while the child qemu does seccomp and the other security isolation unconditionally.

The fact that seccomp is not unconditionally set by either libvirt or qemu, the fact that PR_SET_NO_NEW_PRIVS is not unconditional in qemu either, and the fact that libvirt even pretends to support a dozen other hypervisors that may not support seccomp or PR_SET_NO_NEW_PRIVS, make me wonder if the lack of PR_SET_NO_NEW_PRIVS in libvirt should be fixed, at least if qemu is the hypervisor invoked.

bonzini commented 4 years ago

fork and exec aren't disallowed unless seccomp is enabled, which isn't guaranteed as shown above.

Right, if the user explicitly chooses an insecure setup with sandboxing disabled it's probably for a reason, why would Libvirt enable some sandboxing in that case? Chances are that it would break something.

Note that --security-opt no-new-privileges is opt-in for docker/podman, while the whole of sandboxing (seccomp) is opt-out for Libvirt. Showing how Libvirt will just do the right thing.

libvirt even pretends to support a dozen other hypervisors that may not support seccomp or PR_SET_NO_NEW_PRIVS,

I suggest that you tone down your messages and study Libvirt a bit more, because Libvirt won't even exec anything for most non-QEMU hypervisors so it doesn't have anything to sandbox. I also don't see how those other hypervisors are relevant to Kata.

To solve that qemu should remain in control

This is also irrelevant for either Libvirt or Kata as you say, but anyway patches are welcome on qemu-devel@nongnu.org if you believe that would be an improvement.

aagit commented 4 years ago

Right, if the user explicitly chooses an insecure setup with sandboxing disabled it's probably for a reason, why would Libvirt enable some sandboxing in that case? Chances are that it would break something.

I covered that in my previous post: "if something breaks, it means they will have found a major bug".

Note that --security-opt no-new-privileges is opt-in for docker/podman, while the whole of sandboxing (seccomp) is opt-out for Libvirt. Showing how Libvirt will just do the right thing.

So it's perfectly correct that --security-opt no-new-privileges is an opt-in for docker/podman. docker/podman have to run everything under the sun including /bin/su itself, there's no way it can unconditionally set PR_SET_NO_NEW_PRIVS.

That is not the case with libvirt which has been specifically designed to never require any privilege from qemu.

libvirt will never run qemu as root, as such it also should not allow qemu to gain privilege from any suid. No matter if the sandbox is enabled or not.

libvirt even pretends to support a dozen other hypervisors that may not support seccomp or PR_SET_NO_NEW_PRIVS,

I suggest that you tone down your messages and study Libvirt a bit more, because Libvirt won't even exec anything for most non-QEMU hypervisors so it doesn't have anything to sandbox. I also don't see how those other hypervisors are relevant to Kata.

I know qemu will benefit, and I supposed others would benefit too (see the "may not support .."); if I'm wrong that's fine, and it only means it'll only benefit qemu.

About the other hypervisors, they are relevant for kata. I don't think we should include any dependency on any project that pretends to support additional hypervisors and additional qemu versions.

This is also why I said above, quote: "kata-runtime needs to speak a single version of the qemu command line at any given time".

This is also irrelevant for either Libvirt or Kata as you say, but anyway patches are welcome on qemu-devel@nongnu.org if you believe that would be an improvement.

I never intended to talk about things that are irrelevant to kata, I just gave an example of a security feature that we can turn on by default in the container runtime and that is missing in libvirt if libvirt is configured to use qemu and to start qemu without the sandbox. You then disagreed that it's useful and justified the lack of its enforcement at all times on the libvirt side by comparing libvirt to docker/podman.

bonzini commented 4 years ago

libvirt will never run qemu as root, as such it also should not allow qemu to gain privilege from any suid. No matter if the sandbox is enabled or not.

"Opt out" means literally that you are renouncing something. Opting out of sandboxing disables all of it, including PR_SET_NO_NEW_PRIVS. Why does it matter? It's a non-default setting that should not be used in production, basically a debugging tool. Instead, when the next security feature will come (and, for good reasons, it will be opt in for libcontainer), you can just count on libvirt to apply it and, if it doesn't, you can contribute a patch to libvirt to do so, just like you would contribute a patch to kata-runtime that turns on the appropriate libcontainer knob.

I don't think we should include any dependency on any project that pretends to support additional hypervisors and additional qemu versions.

Again, please justify your assertion. Why would one care if Libvirt can drive QEMU, ESX or anything else? Libvirt is modular, if you don't want to waste disk space on ESX support you just don't install the .so for ESX. And of course Kata won't talk to any hypervisor other than QEMU if the Libvirt bridge hardcodes QEMU as the hypervisor to connect to.

aagit commented 4 years ago

"Opt out" means literally that you are renouncing something. Opting out of sandboxing disables all of it, including PR_SET_NO_NEW_PRIVS. Why does it matter? It's a non-default setting that should not be used in production, basically a debugging tool.

Why exactly would you need not to set PR_SET_NO_NEW_PRIVS during any libvirt+qemu debugging? Do you have debug apps running as suid in your system? Could you mention which one and in which realistic scenario the lack of PR_SET_NO_NEW_PRIVS in libvirt provides a debug aid to either libvirt or qemu?

Can you confirm that the lack of the prctl(PR_SET_NO_NEW_PRIVS) is considered a feature in libvirt?

Again, please justify your assertion.

How can I possibly justify a fact? Is it a fact that reducing dependencies for a project is a win if those dependencies provide no benefit?

It's not me asking kata-runtime upstream to add a dependency on a project that supports a multitude of other hypervisors besides qemu that would never be used by kata-runtime (and that doesn't support cloud hypervisor).

In my view it's such request that should be justified with a long term benefit to kata-runtime.

Why would one care if Libvirt can drive QEMU, ESX or anything else? Libvirt is modular, if you don't want to waste disk space on ESX support you just don't install the .so for ESX. And of course Kata won't talk to any hypervisor other than QEMU if the Libvirt bridge hardcodes QEMU as the hypervisor to connect to.

I care if a project pretends to support hypervisors that are useless to kata-runtime, because no amount of overhead and source complexity is justified, no matter how small (even if it only shows up in the git grep results or on the hard disk of the devs), unless it provides a long term benefit to kata-runtime.

The fact "you can" add the dependency doesn't imply "we should". The fact "you can" alone is not good enough justification.

Earlier you also mentioned "especially with the new daemon-less operation": you omitted that is still in the works and it's not a done thing usable in production in the short term.

Looking at the short term: I'd be surprised if it's less work to make libvirt run daemon-less than to add the vfio unprivileged support to kata-runtime on top of the -runas option to drop root.

Looking at the long term: if we can reduce the dependencies with zero feature loss it's a win (are you going to ask me again to justify a fact?). The end result will have the effect of reducing the code duplication with the container runtime by reusing the standard security isolation practices of applying part of the cgroup/namespaces/selinux/seccomp/PR_SET_NO_NEW_PRIVS/dropcap etc. through the container schema (the more fine-grained seccomp filter will still have to be applied by qemu on itself).

About the "the advantage of libvirt is that it can be considered a black box as far as Kata is concerned": if the common security isolation practices of the container runtime are secure enough for bare metal containers that lack the virt isolation (where they act as first line of defence), we'd have a problem if they are not enough for qemu (where they act only as second line of defense). The "black box" would hide the problem for qemu, but what about all the other containers out there that can't run under the "black box" then? Shouldn't we worry about fixing the first line of defense for the bare metal containers, before worrying about the second line of defence for the kata-containers?

All the above assumes kata-runtime upstream will then accept patches to add unprivileged vfio or anything else missing that would be currently available if adding the "black box" to the equation.

bonzini commented 4 years ago

Could you mention which one and in which realistic scenario the lack of PR_SET_NO_NEW_PRIVS in libvirt provides a debug aid to either libvirt or qemu?

For example if you are implementing a feature that requires a privileged helper and you haven't yet implemented the code to launch the helper from Libvirt.

Can you confirm that the lack of the prctl(PR_SET_NO_NEW_PRIVS) is considered a feature in libvirt?

It's a feature if you opt out of sandboxing. It's irrelevant if you don't.

About the "the advantage of libvirt is that it can be considered a black box as far as Kata is concerned": if the common security isolation practices of the container runtime are secure enough for bare metal containers that lack the virt isolation (where they act as first line of defence), we'd have a problem if they are not enough for qemu (where they act only as second line of defense).

You already have answered this yourself, didn't you? As you said, podman/docker/libcontainer have to run every kind of container, therefore many security features cannot be active by default. You have to care about these manually if you use libcontainer. Libcontainer is not a black box in that sense. Libvirt is more specific and therefore it can take care of all the security knobs.

Looking at the long term: if we can reduce the dependencies with zero feature loss it's a win (are you going to ask me again to justify a fact?)

There would be a maintenance cost associated with the extra code in Kata. Of course it's not possible to know the answer without seeing the code for both approaches, but people can use their judgment to decide whether a dependency is good or bad. Otherwise we'd be writing everything in assembly to avoid the libc dependency (that's of course an exaggeration).

All the above assumes kata-runtime upstream will then accept patches to add unprivileged vfio or anything else missing that would be currently available if adding the "black box" to the equation.

I am sure Kata would accept these patches if they were proposed. It's probably not so easy, the relevant issues have been open for almost 2 years, and there are people not using Kata in multi-tenant scenarios because of this missing feature (using instead a custom runtime, I even know of one that has patched Libvirt with Firecracker support!).

aagit commented 4 years ago

Could you mention which one and in which realistic scenario the lack of PR_SET_NO_NEW_PRIVS in libvirt provides a debug aid to either libvirt or qemu?

For example if you are implementing a feature that requires a privileged helper and you haven't yet implemented the code to launch the helper from Libvirt.

Can you confirm that the lack of the prctl(PR_SET_NO_NEW_PRIVS) is considered a feature in libvirt?

It's a feature if you opt out of sandboxing. It's irrelevant if you don't.

Libvirt has privileges, it cannot possibly need suid helpers, that is the whole point of leaving the privilege to libvirt, so nothing else shall be trusted, isn't it?

In other words libvirt should also run under PR_SET_NO_NEW_PRIVS at all times, not just qemu.

If libvirt ever needs suids to function, it has a much bigger problem than lack of PR_SET_NO_NEW_PRIVS (i.e. the debug "feature").

You already have answered this yourself, didn't you? As you said, podman/docker/libcontainer have to run every kind of container, therefore many security features cannot be active by default. You have to care about these manually if you use libcontainer. Libcontainer is not a black box in that sense. Libvirt is more specific and therefore it can take care of all the security knobs.

libvirt is non-manual, non-libvirt is manual?

It's actually software in both cases, there's no such thing as "non-manual" here, there is no AI involvement in the software development in both cases, ideally it would be written by the same developers too.

There's no reason why it can't be "ported" to the new model. What is fixed in stone and static for the whole runtime of qemu can be offloaded to the container schema; you just make sure through software that it gets applied correctly. The rest is done by kata-runtime and qemu (the fine-grained seccomp filter, for example). The idea is purely to reuse the container runtime facility, and to put the software in the compose.

Whatever shall work securely for all containers, I'm not sure why it can't work for qemu where it's actually a second line of defense.

The idea that you can't trust any software except a very specific black box has no technical justifications in the long term, it's a non technical argument as far as I'm concerned.

There would be a maintenance cost associated with the extra code in Kata. Of course it's not possible to know the answer without seeing the code for both approaches, but people can use their judgment to decide whether a dependency is good or bad. Otherwise we'd be writing everything in assembly to avoid the libc dependency (that's of course an exaggeration).

Let's assume for a second libvirt was actually a lib.

Assume you need a single function from such a lib.

However that single function is also available in the standard libopenssl (i.e. container runtime), but you need to change the way things work to use it. Well you need in fact to change the code a bit anyway, even to call such function from the other lib.

Do you pick the de facto standard libopenssl that every other container already uses, or do you go with the lib that isn't installed by default and that can only be used by a single app (qemu)?

Now in reality, libvirt currently is not a lib, but a daemon, so until the daemon-less libvirt is available, no comparison with any lib, glibc included, can hold here.

I am sure Kata would accept these patches if they were proposed.

I hope so too, I specified it for completeness.

It's probably not so easy, the relevant issues have been open for almost 2 years, and there are people not using Kata in multi-tenant scenarios because of this missing feature (using instead a custom runtime, I even know of one that has patched Libvirt with Firecracker support!).

It's "probably not so easy", I agree, but you didn't answer the part where I specified that the daemonless libvirt is not ready yet and that it also requires work. So the question is: is libvirt daemon-less "probably easy" and with a lower time to market than taking the "probably not so easy" approach?

NOTE: if the "probably not so easy" approach fails because implementing unprivileged vfio in kata turns out to be too complex, then of course libvirt can and should remain the plan B. What I'm objecting to is starting with the libvirt integration without looking at any other option that may provide long term advantages (and in my expectation also short term advantages).

bonzini commented 4 years ago

In other words libvirt should also run under PR_SET_NO_NEW_PRIVS at all times, not just qemu.

If you can get code execution in libvirt, you get root:root and pretty much all capabilities too. It would be a no-op and deceiving for Libvirt to set PR_SET_NO_NEW_PRIVS on itself

The idea that you can't trust any software except a very specific black box has no technical justifications in the long term, it's a non technical argument as far as I'm concerned.

That's absolutely not what I said, but I don't think it's necessary to discuss this further.

Now in reality, libvirt currently is not a lib, but a daemon, so until the daemon-less libvirt is available

It will be available (though experimental) in Libvirt 6.1.0. Release candidate can already be downloaded.

What I'm objecting to is starting with the libvirt integration without looking at any other option that may provide long term advantages

I think everybody agrees that choosing a dependency over another is a tradeoff. I suggest respectfully that you avoid making blind assertions and wait until code is available. In the meantime, you could work on a proof of concept of your suggested approach so that it's possible to compare their relative advantages.

aagit commented 4 years ago

In other words libvirt should also run under PR_SET_NO_NEW_PRIVS at all times, not just qemu.

If you can get code execution in libvirt, you get root:root and pretty much all capabilities too. It would be a no-op and deceiving for Libvirt to set PR_SET_NO_NEW_PRIVS on itself

It is not a noop: it makes a difference to all children of libvirt. Instead of forking and setting it in the qemu child, libvirt should set it in the parent.
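The inheritance this relies on is documented kernel behaviour: once set, no_new_privs cannot be cleared, it is inherited across `fork(2)` and it is preserved across `execve(2)`. A minimal sketch in C that checks the fork part, using a throwaway "launcher" process standing in for the parent so that the caller's own flag is left alone (the function names are hypothetical):

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for `pid` and return its exit status, or -1 on error. */
static int wait_status(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/*
 * Returns 1 if a process forked after its parent set no_new_privs
 * sees the flag set, 0 if it does not, -1 on error.
 */
int child_inherits_no_new_privs(void)
{
    pid_t launcher = fork();
    if (launcher < 0)
        return -1;
    if (launcher == 0) {
        /* Stand-in for the launcher: set the flag once, up front. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
            _exit(100);
        pid_t child = fork();
        if (child < 0)
            _exit(101);
        if (child == 0) {
            /* Stand-in for the hypervisor: report the inherited flag. */
            _exit(prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0) == 1 ? 1 : 0);
        }
        int st = wait_status(child);
        _exit(st < 0 ? 102 : st);
    }
    int st = wait_status(launcher);
    return (st == 0 || st == 1) ? st : -1;
}
```

Because the flag is one-way, a launcher that sets it on itself can never again start a helper that depends on a set-uid bit or file capabilities, which is the trade-off under discussion.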

By doing so, it will also make sure that your "debug" example, where supposedly somebody was debugging a helper running a suid from qemu in order to later move it to libvirt, will work neither in qemu nor in libvirt, because it'd be a design mistake (again, not a noop).

Even if it were a noop, I don't get what you mean by deceiving: if anything, it will prevent somebody from making a design mistake; even if it had no other effect, it's a good hint of what libvirt should keep not doing.

The idea that you can't trust any software except a very specific black box has no technical justifications in the long term, it's a non technical argument as far as I'm concerned.

That's absolutely not what I said, but I don't think it's necessary to discuss this further.

What you said is "advantage of libvirt is that it can be considered a black box as far as Kata is concerned, especially with the new daemon-less operation" and then "Libcontainer is not a black box in that sense. Libvirt is more specific and therefore it can take care of all the security knobs.".

There is no technical argument here that you provided other than we should trust you that a black box should be better "than taking care of many security features manually", which implies black box isn't manual, non-black-box is manual, which again is not a technical argument.

So ultimately all I can conclude is that you trust the black box more than doing things "manually" for non technical reasons, simply because there's no such thing as a black box or a manual process here, it's just different software.

I think everybody agrees that choosing a dependency over another is a tradeoff. I suggest respectfully that you avoid making blind assertions and wait until code is available. In the meantime, you could work on a proof of concept of your suggested approach so that it's possible to compare their relative advantages.

It's the second time you've asked me to send patches. In case it's not obvious: if I had the time to work on kvm userland, I obviously wouldn't be writing a message here, I would already have sent a patchset instead. If nobody else does it I can still try to find the time to do it myself, but for now my message is only to the kata-runtime upstream developers that have the libvirt integration enhancement in "need-review" mode. They will have to make a decision eventually, and I think it's only fair that they hear another point of view in case I may have pointed out anything that wasn't already obvious to them.

bonzini commented 4 years ago

So ultimately all I can conclude is that you trust the black box more than doing things "manually" for non technical reasons, simply because there's no such thing as a black box or a manual process here, it's just different software.

I think the meaning is obvious. "It's a black box" means "the kata developers can expect Libvirt to set up sandboxing and not care about how that is done" like OpenStack, oVirt, Boxes, virt-manager, Cockpit developers are already doing; while with libcontainer you would have to invoke the APIs manually. That's not bad per se, but it's an aspect that should be taken into account.

I think everybody here is experienced enough to understand that it's just different software, on the other hand ignoring code reuse is more than a bit disingenuous. Is it a "technical reason" or not? I guess that depends on the definition you have of "technical reason".

for now my message is only to the kata-runtime upstream developers that have the libvirt integration enhancement in "need-review" mode.

Ok, thanks for making this clear.