Scope of OCI Runtime Certification with software higher in the stack

alban commented 6 years ago

The OCI runtime validation suite is able to test runtimes that expose the OCI Runtime interface to users (operations create, start, etc. and the config.json file). It should be possible to test software like runc, bwrap-oci, crun, railcar.

What about software that use a OCI runtime internally but don't directly expose the OCI Bundle to users, such as containerd, Docker, CRI-O? Should they be able to be certified in some ways?

I am not sure it would make much sense but I have not seen discussion whether it should be in scope of the certification or not.

Technically, if the software does not follow the OCI Runtime Command Line Interface, the validation suite cannot test it. Exposing such a CLI interface in containerd or CRI-O does not make sense to me, since their goal is to expose a gRPC interface which is semantically different to the OCI Runtime CLI.

/cc @caniszczyk

caniszczyk commented 6 years ago

@alban I would like to see containerd, docker, cri-o to be certified, preferably via some testing method. Do you have any ideas how we can accomplish this or we do push the runtimes to expose this functionality?

Another option is to get a dump of their dependencies and ensure they use a unmodified release of runc (that they sign an agreement too) but that's not as useful as actually exercising the operations, similar to what we do with folks that use Kubernetes in products.

caniszczyk commented 6 years ago

Hey @opencontainers/tob, what's your thoughts here?

thecloudtaylor commented 6 years ago

Interesting question... you could imagine a test version of runc that ensures that the interfaces are called in accordance with the spec and a test requirement that the runtime provide a log proving that they ran against that (in effect a unit test/mock runc). The challenge I could see with that is obviously building/maintaining it.

vbatts commented 6 years ago

We had talked about this in the past, and thought there to be definite benefit. Like to have a container that runs and returns a report. There was generally lack of consensus on the approach, and on whether it's worth the pursuit while the code/criteria was changing. Now that v1.0 is out, I think this is fine conversation.

estesp commented 6 years ago

All the "higher order" runtimes listed by @caniszczyk directly call runc (and in the case of Docker, do that through containerd), and expect (or deliver) a runc binary on the system.

[I'm temporarily ignoring the extra fact that this runc binary is also pluggable with any "OCI compliant" runtime (I'm using the term compliant loosely given the repo we are in)]. 😇

Other than validating that this installed runc meets the OCI spec, I'm not sure how else you would claim OCI compliance. Containerd can be called via API to be handed an OCI bundle (raw spec + mounted filesystem), such that you could almost do direct interactions with runc, but don't think that would be possible via Docker and I assume no via cri-o, based on the focus on being a CRI-compliant runtime.

Other than that rambling, I agree there is value with being able to claim higher order runtimes have "OCI compliance" as the expectation is that it, at some point, it will be a detriment to any container-executing product/project not being visibly seen as supporting OCI specs.

wking commented 6 years ago

On Wed, Apr 04, 2018 at 04:19:39PM +0000, Taylor Brown wrote:

… you could imagine a test version of runc that ensures that the interfaces are called in accordance with the spec and a test requirement that the runtime provide a log proving that they ran against that (in effect a unit test/mock runc).

This is an interesting angle, effectively testing caller compliance with the OCI Runtime Command Line Interface 1. Note that CRI-O, at least, would not currently pass such a test because it require runc-specific extensions (kubernetes-incubator/cri-o#1260). I assume it would be up to the applicant to generate a test suite to drive their engine through whatever subset of the CLI interface was required for compliance.

A parallel approach more like an integration test would be driving these higher-level engines through the existing runtime-tools validation tests. You could do that either by writing wrappers to translate from the OCI Runtime Command Line Interface to the higher-level APIs. Or you could do it by specifying (and implementing runtime-tools drivers for) additional testing APIs 2.

alban commented 6 years ago

A parallel approach more like an integration test would be driving these higher-level engines through the existing runtime-tools validation tests. You could do that either by writing wrappers to translate from the OCI Runtime Command Line Interface to the higher-level APIs. Or you could do it by specifying (and implementing runtime-tools drivers for) additional testing APIs 2.

That seems difficult to me:

the higher-level engine might utilize only a subset of OCI runtime features. For example, the CRI gRPC interface does not expose the feature to reuse an existing mount namespace from a specified path.
the higher-level engine might use operations only together: Create and Start. In that case, a test that only need to perform the "Create" operation might not have a translation in the higher-level engine API.

wking commented 6 years ago

the higher-level engine might utilize only a subset of OCI runtime features.

The OCI Runtime Command Line Interface does not currently have anything I can point to that MUSTs support for all the specified commands. That was my intention though, so I'll file a follow-up PR there and for this discussion let's assume it does. I'm fine marking engines as compliant if they only need a subset of that API. I'd like to mark engines as non-compliant if they need any extentions beyond that API. That way, consumers would know they could use any runtime compliant with runtime-spec and the command line API with any higher-level engine compliant with the command line API.

For example, the CRI gRPC interface does not expose the feature to reuse an existing mount namespace from a specified path.

That's fine, because mount namespaces are not part of the command line API, they're part of the runtime-spec. For command line API compliance around container config JSON, you'd just have to verify that, if create had been called, that call was with a valid signature, that the caller gracefully handled both zero and non-zero exit codes, and that the caller didn't choke on console socket activity when it asked for --console-socket.

the higher-level engine might use operations only together: Create and Start.

That's fine.

wking commented 6 years ago

The OCI Runtime Command Line Interface does not currently have anything I can point to that MUSTs support for all the specified commands. That was my intention though, so I'll file a follow-up PR there…

Filed as opencontainers/runtime-tools#615.

alban commented 6 years ago

A parallel approach more like an integration test would be driving these higher-level engines through the existing runtime-tools validation tests. You could do that either by writing wrappers to translate from the OCI Runtime Command Line Interface to the higher-level APIs.

I explored this approach both in containerd (https://github.com/opencontainers/runtime-tools/issues/653) and with the CRI interface (https://github.com/opencontainers/runtime-tools/issues/657) but I find this quite contrived and not so meaningful because the interfaces don't exactly match.

The other approaches mentioned in here:

caller compliance: it would not test the same thing, so I'd not conflate it together with the same certification name as for runc.
- dependencies dumping: although it is not a testing method, it is the simplest.

dongsupark commented 6 years ago

To achieve runtime certification for runc, following the dependencies dumping approach above, there could be multiple options. We will list them below to get some feedback about which one would be appropriate.

It is also known that some 3rd-party vendors would want to be able to apply own runc patches for security reasons.

1) pick a hash from a specific git commit in the upstream runc tree, and use it for runtime certification.

Pros: This is one of the most reliable ways for certification, as the specific commit will be permanent and used by every runtime.
Cons: It is not possible to allow 3rd-parties to apply their own runc patches, because then the downstream tree will diverge from the upstream tree. From the certification's perspective, you will not be able to know whether the downstream tree is a legitimate one or a manipulated one.

1.a) Alternatively, an approach of calculating hash of a runc binary file is doable, but basically it belongs to the category 1 as well.

2) maintain another runc tree dedicated to the certification

Provide a separate runc tree that contains all known security patches to 3rd-party vendors.
Pros: It becomes simple to do certification. Just check if the given runc version includes a commit in the dedicated runc tree. It's also possible to support 3rd-party patches.
Cons: Obviously it's additional maintenance burden for the whole community.

3) Simply check for a semantic version of runc instead of a specific commit
Pros: It's very simple to do certification.
Cons: It will not be suitable for every container runtime, because some container runtimes install runc from a specific upstream commit, instead of a released version.

There are two variants, depending on whether it's possible to accept 3rd-party patches.

3.a) Allow 3rd-party patches
Pros: vendors are able to apply own patches
Cons: certification should be done only for a specific runc release. Runtime should be assumed to be certificated, even if they applied own patches between releases. Kubernetes certification is done like this.

3.b) Disallow 3rd-party patches
Pros: certification can be done for most cases, not only releases but also a certain commit.
Cons: it's not possible to apply vendor-specific patches.

caniszczyk commented 6 years ago

Hey @vbatts and other @opencontainers/runc-maintainers, thoughts here

crosbymichael commented 6 years ago

Don't certify runc commits. Who cares if vendors add commits or change things. As long as X binary passes certification with Y criteria, who cares what code is used. Same thing for certifying high level software. First you should ask if they want to, next you should create a "test harness" for that software. I think we are all learning that we cannot force ppl to use a CLI to be OCI compliant, people use JSON over http, GRPC, etc and probably already have full compliance today, the API just does not fit into our view of what a compliant runtime is right now.

dongsupark commented 6 years ago

Thanks for the reply. Then I suppose the only valid option is 3.b), "Simply check for a semantic version of runc, and disallow 3rd-party patches".

crosbymichael commented 6 years ago

@dongsupark That is my opinion and others should weigh in, but I don't think, disallowing 3rd party patches is acceptable. There are security fixes etc that distros will push. We are certifying the API not the code. If I have some patches in my personal runc for my own uses but it passes 100% of certification, how is it not valid?

cyphar commented 6 years ago

I think this topic has gone out-of-hand -- why are we discussing certification of runc releases (or commits) when it comes to discussing whether a higher-level user of an OCI runtime is compliant? Personally I think this should be done by calling higher-level users of an OCI runtime "OCI compatible" if you can swap out OCI-compliant runtimes from underneath them -- and their status of "OCI compatible" will be revoked if you can prove that they don't operate properly if you swap out one runtime with another. Simple as that.

(As an aside, I also agree with @crosbymichael -- though that shouldn't be a surprise given that I work on a distribution. You would need to define what a "third-party patch" is and whether it makes sense to deny certification based on a documentation or trivial build issue patch -- which are patches that we've had to carry in the past. It's just silly to restrict one of the main benefits of the free software ecosystem -- the right to fork and carry your own patches -- because then runc becomes an effectively proprietary black box that you cannot modify without revoking your certification.)

dongsupark commented 6 years ago

@cyphar Thanks for your suggestion!

So we have come up with a different approach following your suggestion, as described below. Please have a look, and give us feedback. Thanks.

The OCI runtime compliance tests

Certification for compliance with the OCI runtime spec can be done in two different ways:

Using the runtime-tools automatic tests if your container runtime directly follow the interface from the spec (e.g. runc).
Testing higher order runtimes manually by replacing the OCI component by one that has been validating with point 1 (usually runc). The manual tests are described below.

Certification for the OCI runtime compliance is usually done manually mainly because each container runtime or each high-level container manager has its own interface, so there cannot be a single common layer that can be used for the certification.

The OCI runtime compliance test is basically done by following these steps:

Replace the OCI component of your container runtime with runc, the only runtime that is OCI runtime compliant.
Create your container runtime.
Start your container runtime.
Verify that your container runtime is conformant to the OCI container runtime-spec.
Store the results into a single text file into results.txt.
Kill your container runtime.
Delete your container runtime.

It's completely up to an individual runtime, how the verification is to be done. It can be done by either running an automated tool, or a manual verification.

Let’s take cri-o as an example:

an example of running runtime-specific tests

Let’s assume that one runtime example cri-o supports a possibility of running runtime validation tests on its own.

Run its daemon crio on one terminal with a specific runtime /usr/bin/runc.

$ sudo ./crio --runtime=/usr/bin/runc
...

On the other terminal, run ordinary integration tests for cri-o.

$ pwd
/home/user/go/src/github.com/kubernetes-incubator/cri-o
$ sudo make localintegration RUNTIME=/usr/bin/runc 2>&1 | tee results.txt
...
ok 1
ok 2
ok 3
...

Then you are able to store the results into results.txt.

Uploading the runtime compliance results

Prepare a PR to https://github.com/cncf/oci-conformance/ (The repo is TBD). Its description could be like this, when the runtime name is runtime_a.

OCI conformance results for runtime_a/v1.0

Contents of the PR

* README.md: a human-readable description of how to reproduce the results
* PRODUCT.yaml: basic information about the certified runtime
* results.txt: result of the OCI runtime compliance verification

PRODUCT.yaml can have the following attributes:

name: name of the container runtime
vendor: name of company or organization
version: semantic version of the runtime
description: one sentence description of the corresponding container runtime
website_url: URL of the project (optional)

crosbymichael commented 6 years ago

I think we said this multiple times, why not just make a test harness and let high level runtimes implement it, and that interface can be plugged into your test code?

jdolitsky commented 2 years ago

Going to close as outdated. Please re-open if more discussion necessary.

opencontainers / oci-conformance