opencontainers / image-spec

OCI Image Format
https://www.opencontainers.org/
Apache License 2.0

Outline the transportable objects #23

Closed: vbatts closed this issue 8 years ago

vbatts commented 8 years ago

An image may be transported either as a single tar archive of tar archives and JSON documents, or as a manifest JSON document that points to other objects that need to be fetched.

UserStory:

See also https://groups.google.com/a/opencontainers.org/d/msg/dev/VKpZNs-qYoI/RzPhj71ODgAJ
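For the second transport form (a manifest that points at objects still to be fetched), the first job of a client is to work out which referenced objects it does not yet have. The Python sketch below assumes the manifest lists its blobs as descriptors under a `config` key and a `layers` list, each with a `digest` field (the shape the OCI manifest later settled on); `referenced_digests` and `missing_objects` are hypothetical helper names.

```python
import json


def referenced_digests(manifest_json: str) -> list[str]:
    """Return every digest a manifest points at: config first, then layers.

    Assumes a "config" descriptor plus a "layers" list of descriptors,
    each carrying a "digest" field.
    """
    manifest = json.loads(manifest_json)
    descriptors = [manifest["config"], *manifest.get("layers", [])]
    return [d["digest"] for d in descriptors]


def missing_objects(manifest_json: str, local_digests: set[str]) -> list[str]:
    """Digests that still have to be fetched before the image is complete."""
    return [d for d in referenced_digests(manifest_json) if d not in local_digests]
```

With the single-tar form, the same check should come back empty once the archive is unpacked, which makes it a cheap completeness test for either transport.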

jonboulle commented 8 years ago

Is this essentially the enhancement/replacement of serialization.md?

vbatts commented 8 years ago

@jonboulle sorry, this was more of a placeholder thought. The serialization doc does have tar archive info, though I was thinking more of the manifest. This may be a non-topic if the layers referenced in a manifest are handled independently, but the docs need to be cleaned up to show the transportable states of the image.

vbatts commented 8 years ago

Updated the title and comment. Hopefully that helps a bit more.

vbatts commented 8 years ago

Once you've gotten all the objects on disk specified in a manifest, there will be a validation and translation to the runtime-spec bundle, or similar. This needs to be outlined as well.
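The validation half of that step can be as small as re-hashing each fetched object and comparing it with the descriptor that referenced it. This is a minimal sketch assuming descriptors carry a `digest` of the form `sha256:<hex>` and an optional `size` in bytes; `verify_blob` is a hypothetical name, and other digest algorithms are simply rejected.

```python
import hashlib


def verify_blob(data: bytes, descriptor: dict) -> None:
    """Raise ValueError unless `data` matches the descriptor's digest and size.

    Only "sha256:<hex>" digests are handled in this sketch.
    """
    algorithm, _, expected_hex = descriptor["digest"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm!r}")
    # Checking the size first is cheap and catches truncated downloads early.
    if "size" in descriptor and len(data) != descriptor["size"]:
        raise ValueError(f"size mismatch: got {len(data)}, want {descriptor['size']}")
    actual_hex = hashlib.sha256(data).hexdigest()
    if actual_hex != expected_hex:
        raise ValueError(f"digest mismatch: got sha256:{actual_hex}")
```

Translation to a runtime-spec bundle would only start once every referenced object passes this check.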

philips commented 8 years ago

I think there are two things here.

Spec out a directory layout for manifests and blobs that mirrors how the image manifests work. My suggestion is that the manifest is the primary object in the system so the other "things" are adjacent.

$ cd my-oci-image/
$ find .
.
./blobs
./blobs/sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270
./blobs/sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f
./manifests
./manifests/v1.0
./manifests/v1.0-debug

Build a tool that consumes this directory and creates a runnable OCI bundle.

ocitool runtime-bundle file:///path/to/my-oci-image/manifests/v1.0

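The rootfs-assembly part of such a `runtime-bundle` step might look roughly like the Python sketch below. It assumes the directory layout above (the manifest JSON stored at `manifests/<name>`, each blob at `blobs/<digest>`), that the manifest lists its layers as descriptors under a `layers` key, and that each layer is a possibly compressed tar archive. `build_rootfs` is a hypothetical name; whiteout handling and generation of the runtime `config.json` are deliberately left out.

```python
import json
import tarfile
from pathlib import Path


def build_rootfs(image_dir: Path, manifest_name: str, bundle_dir: Path) -> Path:
    """Apply a manifest's layers, in order, onto bundle_dir/rootfs."""
    manifest = json.loads((image_dir / "manifests" / manifest_name).read_text())
    rootfs = bundle_dir / "rootfs"
    rootfs.mkdir(parents=True, exist_ok=True)
    for layer in manifest["layers"]:
        blob = image_dir / "blobs" / layer["digest"]
        # "r:*" lets tarfile detect gzip/bz2/xz compression transparently.
        with tarfile.open(blob, mode="r:*") as archive:
            if hasattr(tarfile, "data_filter"):
                # Newer Pythons: reject absolute paths, "..", device files, etc.
                archive.extractall(rootfs, filter="data")
            else:
                archive.extractall(rootfs)
    return rootfs
```

Because layers are extracted in manifest order, a file in a later layer overwrites the same path from an earlier one.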
wking commented 8 years ago

On Wed, Apr 20, 2016 at 03:20:44PM -0700, Brandon Philips wrote:

$ find .
.
./blobs
./blobs/sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270
./blobs/sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f
./manifests
./manifests/v1.0
./manifests/v1.0-debug

I'd expect the format should also include signatures and the name ↔ format://manifest-hash being asserted by those signatures, because the “I just want to attach one thing” workflow [1] shouldn't require additional network activity to collect those names and signatures.

And the single-archive-tar use-case needs a HEAD reference or some such to pick the default manifest.

So how about:

./VERSION
./HEAD
./objects/sha256/5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270
./objects/sha256/e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f
./objects/sha256/ca6e91e8f982603d5ee741db911fdd8814185f4da52df3dba9e90cf67aed9c9a
./objects/sha256/fada3467c61b55140ce331f4f515fe61d1d1b99951893aee822c7c3c394061f6
./objects/sha256/07dec7175b024a30f22b1be7853480472b9b406afc593568b6f19dd7bf08f507
./objects/sha256/19f58930b0fffd0dd3e4ed7a4a4059f45fed9a759f69f8544a45ac0dfd954b36
./objects/sha256/6ce50f12c5a18c8c755d05c9f760ae0d0f71f60885b8937b94a1da52fc9b2898
./objects/sha256/d1ee7338f8ebdcaf78acbd63e640cfd9e1a45a29d23c63656d3158c350c3325d
./manifests/v8.0
./manifests/v8.0-debug

where:

And folks who wanted could hit a signature registry to check for additional signatures (or revocations) for sha256/07de… etc.

Taking this to its logical conclusion, you might also want to include blobs for, and references to, public keys [3], signing algorithms [4], validity schemes [5], etc. It just depends on how much of the verification workflow you want to be able to perform based on the file contents vs. alternative communication channels, and how much you like linked data ;).

philips commented 8 years ago

VERSION doesn't seem necessary. The manifests describe themselves.

philips commented 8 years ago

I am sort of "meh" on creating some optimization that doesn't force a user to specify exactly what they want. HEAD and "latest" create all sorts of weird races and UX.

wking commented 8 years ago

On Wed, Apr 20, 2016 at 04:34:24PM -0700, Brandon Philips: “VERSION doesn't seem necessary. The manifests describe themselves.”

But they don't describe the format of the structure holding the manifests. For example, a later version of that structure might decide to follow Git in fanning out the blob store.
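To make the VERSION argument concrete: the same digest can live at different paths depending on the layout version, so a reader has to know the version before it can find anything. In the Python sketch below both version strings are hypothetical: "1" is the flat `objects/<algorithm>/<hex>` store from the listing earlier in this thread, and "2" is an invented Git-style fan-out that splits off the first two hex characters.

```python
from pathlib import PurePosixPath


def object_path(layout_version: str, digest: str) -> PurePosixPath:
    """Map a digest such as "sha256:<hex>" to its path for a given ./VERSION."""
    algorithm, sep, hex_digest = digest.partition(":")
    if not sep or not hex_digest:
        raise ValueError(f"malformed digest: {digest!r}")
    if layout_version == "1":
        # Flat store: objects/sha256/<full hex>
        return PurePosixPath("objects", algorithm, hex_digest)
    if layout_version == "2":
        # Hypothetical Git-style fan-out: objects/sha256/<2 hex>/<remaining hex>
        return PurePosixPath("objects", algorithm, hex_digest[:2], hex_digest[2:])
    raise ValueError(f"unknown layout version: {layout_version!r}")
```

Without a VERSION file, a reader would have to probe both shapes, which is the ambiguity the marker avoids.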

wking commented 8 years ago

On Wed, Apr 20, 2016 at 04:35:39PM -0700, Brandon Philips wrote: “I am sort of "meh" on creating some optimization around forcing a user to specify exactly what they want to render. HEAD and "latest" create all sorts of weird races and UX.”

It lets you run (for the tarball case [1]):

$ ocitool runtime-bundle file:///path/to/my-oci-image.tar.gz
$ ocitool runtime-bundle https://example.com/path/to/my-oci-image.tar.gz

or (for the unpacked directory case):

$ ocitool runtime-bundle file:///path/to/my-oci-image/

instead of:

$ ocitool runtime-bundle file:///path/to/my-oci-image.tar.gz v1.0

If the image publisher has a particular manifest in mind as a reasonable default (maybe the image file only contains a single manifest?), the extra ‘v1.0’ is unnecessary noise. Folks could still use it if they didn't like the default choice:

$ ocitool runtime-bundle file:///path/to/my-oci-image.tar.gz v1.0-debug
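The defaulting behaviour described here can be sketched in a few lines of Python. The assumption, not settled in this thread, is that `./HEAD` simply holds the name of one entry under `./manifests/` (for example `v8.0`) plus an optional trailing newline; `resolve_manifest` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def resolve_manifest(image_dir: Path, name: Optional[str] = None) -> bytes:
    """Return the requested manifest's bytes, falling back to ./HEAD.

    An explicit name always wins; HEAD is consulted only when none is given.
    """
    if name is None:
        head = image_dir / "HEAD"
        if not head.exists():
            raise LookupError("no manifest name given and no HEAD to default to")
        name = head.read_text().strip()
    # Keep names confined to ./manifests/.
    if "/" in name or name in ("", ".", ".."):
        raise ValueError(f"invalid manifest name: {name!r}")
    return (image_dir / "manifests" / name).read_bytes()
```

Since an explicit argument bypasses HEAD entirely, the `v1.0-debug` invocation above behaves the same whether or not the publisher set a default.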

kamalmarhubi commented 8 years ago

Clarification: is v1.0 and v1.0-debug an example set of manifests where the latter is a debug build of the same artifact?

wking commented 8 years ago

On Wed, Apr 20, 2016 at 08:22:56PM -0700, Kamal Marhubi wrote: “Clarification: is v1.0 and v1.0-debug an example set of manifests where the latter is a debug build of the same artifact?”

I don't think it really matters what the difference between the names is. It just matters that you can put multiple named manifests into the file. I'm not sure that's a requirement for the initial use case [1], but I think it's useful to channel git-bundle [2] (which also allows multiple references in a single bundle).

kamalmarhubi commented 8 years ago

I mean is that the intention of the running example? Otherwise I'm confused.

philips commented 8 years ago

@kamalmarhubi sure, just something where you could imagine a different set of default args, environments, or some additional objects added to the layer. I could have said v1.0 and v2.0 as well.

philips commented 8 years ago

@vbatts What do you think of my rough sketch? Does this address what you had in mind?

vbatts commented 8 years ago

re: https://github.com/opencontainers/image-spec/issues/23#issuecomment-212636147

It is logical, though very different from say the current docker save/load format. Does this rule out the possibility of a config per rootfs? (even if only for historical audit). Is the ./manifests/v1.0-debug an alternate manifest that would include filesystems with debug symbols and tooling?

An aside: like the mime-type compatibility table we've mentioned, this makes me think we should make the docs all read very clearly on expected behaviour per mime-type (media-type). They already do a little, but that clarity will come as we vet everything out.

philips commented 8 years ago

@vbatts where is the docker save/load format documented?

philips commented 8 years ago

On Fri, Apr 22, 2016 at 8:47 AM Vincent Batts notifications@github.com wrote:

Does this rule out the possibility of a config per rootfs? (even if only for historical audit).

I don't believe so. The configs are CAS, right? https://github.com/opencontainers/image-spec/blob/master/manifest.md#example-image-manifest

Is the ./manifests/v1.0-debug an alternate manifest that would include filesystems with debug symbols and tooling?

Sure, I am just pointing out that someone will likely want a "fat" bundle that contains 1 or more manifests and some potentially shared blobs.
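A "fat" bundle pays off exactly when manifests share blobs, since each shared blob is stored once. The Python sketch below, with the hypothetical name `blob_sharing`, takes a mapping of manifest name to the digests it references and reports every blob the bundle must contain and which of those are shared; the digests in the usage are placeholders.

```python
from collections import Counter


def blob_sharing(manifests: dict[str, list[str]]) -> tuple[set[str], set[str]]:
    """Return (all blobs to store, blobs referenced by more than one manifest)."""
    counts: Counter = Counter()
    for digests in manifests.values():
        # set() so a manifest listing a blob twice is counted once.
        counts.update(set(digests))
    every = set(counts)
    shared = {digest for digest, n in counts.items() if n > 1}
    return every, shared
```

For `v1.0` referencing a base and an app layer and `v1.0-debug` adding a debug layer on top, the bundle holds three blobs, two of them shared.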

An aside: like the mime-type compatibility table we've mentioned, this makes me think we should make the docs all read very clearly on expected behaviour per mime-type (media-type). It is already a little bit, but will be the clarity as we vet everything out.

Not following, file a separate issue?

vbatts commented 8 years ago

@philips save/load is not documented to my knowledge.

I don't believe so. The configs are CAS, right?

That was my thought. If they're present but not used, then "undefined behaviour" I suppose.

Sure, I am just pointing out that someone will likely want a "fat" bundle that contains 1 or more manifests and some potentially shared blobs.

Right on.

Not following, file a separate issue?

Just a brainstorm. I'll mull on it further.

vbatts commented 8 years ago

@stevvooe in your opinion, if there were ever a docker save --format=oci.v1 ... flag, would you see that being feasible with https://github.com/opencontainers/image-spec/issues/23#issuecomment-212636147 ? Wouldn't mind a weigh-in from @aaronlehmann

vbatts commented 8 years ago

@philips so sorry. save/load is here https://github.com/opencontainers/image-spec/blob/master/serialization.md#combined-image-json--filesystem-changeset-format

stevvooe commented 8 years ago

@vbatts docker save with a format parameter doesn't seem wildly out of place. I would like to see what @aaronlehmann would have to say.

It may be a good idea to decouple the save format from the serialization specification (or slice it differently).

The only difference I would make is an extra layer of indirection in the save format for the tag types:

./objects/sha256/5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270
./objects/sha256/e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f
./refs/v8.0
./refs/v8.0-debug

We then give the "refs" a schema like this:

{
  "name": "v8.0",
  "target": descriptor
}

We have toyed around with adding something similar to this, calling it a "Tag" object. I am not yet convinced this is a great idea, but we should give this road a very hard look.
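A sketch of how such a ref could be written and followed, in Python, assuming the `objects/sha256/<hex>` and `refs/<name>` layout above and a descriptor with `mediaType`, `digest` and `size` fields. The media type string is the one the OCI spec later adopted for manifests and is only a placeholder here; `write_ref` and `read_ref` are hypothetical names.

```python
import hashlib
import json
from pathlib import Path

# Placeholder media type (the value OCI later used for image manifests).
MANIFEST_MEDIA_TYPE = "application/vnd.oci.image.manifest.v1+json"


def write_ref(root: Path, name: str, manifest: bytes) -> dict:
    """Store the manifest in objects/ and write refs/<name> pointing at it."""
    hex_digest = hashlib.sha256(manifest).hexdigest()
    obj = root / "objects" / "sha256" / hex_digest
    obj.parent.mkdir(parents=True, exist_ok=True)
    obj.write_bytes(manifest)
    ref = {
        "name": name,
        "target": {
            "mediaType": MANIFEST_MEDIA_TYPE,
            "digest": "sha256:" + hex_digest,
            "size": len(manifest),
        },
    }
    (root / "refs").mkdir(parents=True, exist_ok=True)
    (root / "refs" / name).write_text(json.dumps(ref))
    return ref


def read_ref(root: Path, name: str) -> bytes:
    """Follow refs/<name> and check the target bytes against its descriptor."""
    target = json.loads((root / "refs" / name).read_text())["target"]
    algorithm, _, hex_digest = target["digest"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm!r}")
    data = (root / "objects" / algorithm / hex_digest).read_bytes()
    if len(data) != target["size"] or hashlib.sha256(data).hexdigest() != hex_digest:
        raise ValueError(f"object does not match descriptor for ref {name!r}")
    return data
```

The payoff of the indirection is that the ref carries a media type and size, so a reader knows what it is about to load before opening the object.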

wking commented 8 years ago

On Wed, Apr 27, 2016 at 04:17:40PM -0700, Stephen Day wrote:

We then give the "refs" a schema like this:

{
  "name": "v8.0",
  "target": descriptor
}

We have toyed around with adding something similar to this, calling it a "Tag" object. I am not yet convinced this is a great idea, but we should give this road a very hard look.

This is just pushing the manifest content into CAS, which I think is a great idea. In [1], I called those “metafile blobs” and suggested following TUF and allowing an opaque ‘custom’ field in case the signer wants to attach additional information (e.g. an expiration timestamp). Then I suggest pushing the metafile blobs into CAS as well, so the non-CAS refs are “CAS object {hash} is signed by {signature hashes}” to support object → signature lookup. Keeping manifests, “metafile blobs” (your tags), and signatures in CAS allows you to point someone at signature sha256:6ce5…, and they can see that you're talking about Debian 8.0 as asserted by Alice on 2016-04-27.
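The object → signature lookup amounts to inverting "signature S signs object O" into "object O is signed by S1, S2, …". The Python sketch below assumes something has already determined, for each signature blob, the digest it signs (how that is read out of a signature is left open here); `signatures_by_object` is a hypothetical name and the digests in the usage are placeholders.

```python
from collections import defaultdict


def signatures_by_object(signatures: dict[str, str]) -> dict[str, list[str]]:
    """Invert {signature digest: signed object digest} into {object: [signatures]}."""
    index: defaultdict = defaultdict(list)
    for sig_digest, object_digest in signatures.items():
        index[object_digest].append(sig_digest)
    # Sort so the index is deterministic regardless of input order.
    return {obj: sorted(sigs) for obj, sigs in index.items()}
```

Since the index is derived entirely from CAS content, it can be rebuilt at any time and never has to be trusted on its own.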

vbatts commented 8 years ago

@stevvooe so that is not far off from https://github.com/opencontainers/image-spec/issues/23#issuecomment-212636147. @philips what do you think of the manifest JSON document itself living in the blobs/objects path, with a reference pointing to it?

philips commented 8 years ago

@vbatts what do you mean by a reference that points to it? Like a symlink? A symlink seems fine.

I don't really think we should add a layer of indirection with a new JSON type that then points to an object unless there is a really great reason for it.

wking commented 8 years ago

On Mon, May 02, 2016 at 12:24:42AM -0700, Brandon Philips wrote:

@vbatts what do you mean by a reference that points to it? Like a symlink? A symlink seems fine.

I don't really think we should add a layer of indirection with a new JSON type that then points to an object unless there is a really great reason for it.

I'd caution against symlinks and stick with explicit by-hash references to CAS objects. Symlinks will work fine in the tarball context, but are problematic in other contexts (e.g. when pushing to an API). Since this serialized format seems like it's intended to be a tarball of the usual manifest + names (and signatures?), I think it should reuse the (JSON?) objects for names and signatures instead of introducing a new symlink-based approach to naming.

vbatts commented 8 years ago

@philips Sorry for not being clear. I was thinking that the ./manifests/0.8.0 or ./refs/0.8.0 file would just contain sha256:deadbeef.., where that referenced object is the JSON document.
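That lighter variant, where a ref file holds nothing but a digest, could be resolved as in the Python sketch below. It assumes `refs/<name>` contains a single `sha256:<64 hex chars>` line and that objects live at `objects/sha256/<hex>`; `ref_target` is a hypothetical name.

```python
import re
from pathlib import Path

# A ref file is assumed to hold exactly one sha256 digest.
DIGEST_RE = re.compile(r"(sha256):([0-9a-f]{64})")


def ref_target(root: Path, ref_name: str) -> Path:
    """Return the object path named by the digest stored in refs/<ref_name>."""
    text = (root / "refs" / ref_name).read_text().strip()
    match = DIGEST_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"ref {ref_name!r} does not contain a sha256 digest")
    algorithm, hex_digest = match.groups()
    return root / "objects" / algorithm / hex_digest
```

Unlike a JSON ref with a descriptor, a bare digest carries no media type or size, so a reader has to assume (or sniff) that the target is a manifest.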

stevvooe commented 8 years ago

@vbatts @philips @wking Given the confusion between #23 and #38, I'm wondering if this issue should be more oriented towards collecting the system objects into a specification, called out by mediatype.

wking commented 8 years ago

On Tue, May 24, 2016 at 04:21:38PM -0700, Stephen Day wrote:

I'm wondering if this issue should be more oriented towards collecting the system objects into a specification, called out by mediatype.

That sounds useful to me. Are you suggesting specifying things like:

I have suggestions for the first and last of those sketched in [1].

And I'd suggest listing existing MIME types (e.g. application/pgp-signature [2], application/jose+json [3]) that implementations are likely to support. Rolling our own signature MIME types doesn't seem useful.

stevvooe commented 8 years ago

@wking While I wasn't really thinking of signatures here, that is definitely a use case.

The advantage is that no one needs to add a new way to add signatures to the archive format. We simply specify the mediatype and parentage (how it is referenced) and it can be picked up as a content addressable blob in the archive.

Rolling your own mediatypes may be useful, and components that don't know them can simply ignore them. Refs are only followed if the component understands the mediatype:

refs/0.8.0
    /signatures -> {"digest": <sha to signature file>, "mediatype": "application/some-signature-extension"}

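The "only follow what you understand" rule can be sketched as a filter over descriptors. In the Python sketch below, the supported set is an assumption for illustration (it borrows the two existing MIME types mentioned earlier in the thread), and the key is read as `mediatype`, as in the example above, falling back to `mediaType`; `followable` is a hypothetical name.

```python
# Assumed set of media types this particular component knows how to handle.
SUPPORTED = frozenset({"application/pgp-signature", "application/jose+json"})


def followable(descriptors: list[dict], supported: frozenset = SUPPORTED) -> list[str]:
    """Digests of descriptors whose media type this component understands.

    Anything else (e.g. "application/some-signature-extension") is skipped
    rather than treated as an error.
    """
    result = []
    for d in descriptors:
        media_type = d.get("mediatype", d.get("mediaType"))
        if media_type in supported:
            result.append(d["digest"])
    return result
```

Skipping rather than failing is what lets a publisher add new signature types without breaking older readers.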
wking commented 8 years ago

On Tue, May 24, 2016 at 07:38:09PM -0700, Stephen Day wrote:

The advantage is that no one needs to add a new way to add signatures to the archive format. We simply specify the mediatype and parentage (how it is referenced) and it can be picked up as a content addressable blob in the archive.

I think specifying “how it is referenced” is adding signatures to the archive format. But I agree that runtimes won't need to support all possible signature types.

philips commented 8 years ago

Everyone- I put up a PR to define an "image layout": https://github.com/opencontainers/image-spec/pull/94

vbatts commented 8 years ago

This is fixed by #94 IMO