cyphar commented 7 years ago

While trying to implement index.json parsing in umoci I've come to a bit of a worrying realisation. There isn't really any explanation of how cases where org.opencontainers.ref.name is not unique should be handled. While I understand the case where you have multiple application/vnd.oci.image.manifest.v1+json descriptors with different platform entries (though I think that doesn't make sense in light of the fact that application/vnd.oci.image.index.v1+json has platform entries too), it doesn't really make sense to me in the general case to allow org.opencontainers.ref.names to be duplicated in index.json.

So it would be nice if we could make org.opencontainers.ref.name unique. This includes the multiple-platform case I mention above because it starts to get a bit hairy -- and honestly you should just be using application/vnd.oci.image.index.v1+json for that.

Sorry to stir this pot again, I've only just hit this issue trying to fix up the oci/cas implementation in umoci.


qianzhangxa commented 7 years ago

While I understand the case where you have multiple application/vnd.oci.image.manifest.v1+json descriptors with different platform entries (though I think that doesn't make sense in light of the fact that application/vnd.oci.image.index.v1+json has platform entries too).

Do we have platform in application/vnd.oci.image.index.v1+json itself? I think we only have it in application/vnd.oci.image.manifest.v1+json descriptor.

wking commented 7 years ago

On Wed, Feb 22, 2017 at 11:36:15PM -0800, Aleksa Sarai wrote:

While I understand the case where you have multiple application/vnd.oci.image.manifest.v1+json descriptors with different platform entries (though I think that doesn't make sense in light of the fact that application/vnd.oci.image.index.v1+json has platform entries too), it doesn't really make sense to me in the general case to allow org.opencontainers.ref.names to be duplicated in index.json.

In #533, @stevvooe and @vbatts explicitly discussed multiple entries with one name but different platform information, so you can have:

index.json → platform-appropriate-manifest

With unique names, you'd need:

index.json → multi-platform-index → platform-appropriate-manifest

Of course, without clear platform matching (or other metadata-based matching) to break ties, things get a bit awkward. Consumers putting a new platform-agnostic descriptor should have options for “clobber any pre-existing, platform-agnostic refs” or “add a new platform-agnostic ref, even if there is a pre-existing, platform-agnostic ref”. Consumers getting a descriptor can error out with “you asked for $NAME, but there are multiple descriptors matching that name with equivalently appropriate platform information”.
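To make the two "put" options concrete, here is a minimal Go sketch. Everything in it is an assumption for illustration: the pared-down `Descriptor` and `Platform` types, the `PutMode` constants, and the `PutReference` helper are hypothetical and are not taken from the spec or from any existing tool.

```go
package main

import "fmt"

// Platform and Descriptor are pared-down stand-ins for the image-spec types.
type Platform struct{ Architecture, OS string }

type Descriptor struct {
	Digest      string
	Platform    *Platform // nil is treated as platform-agnostic (assumption)
	Annotations map[string]string
}

const refNameKey = "org.opencontainers.ref.name"

// PutMode selects between the two behaviours described above (hypothetical).
type PutMode int

const (
	Clobber PutMode = iota // replace pre-existing platform-agnostic refs with the same name
	Add                    // keep pre-existing refs and add one more
)

// PutReference returns a new manifests list with d stored under name.
func PutReference(manifests []Descriptor, name string, d Descriptor, mode PutMode) []Descriptor {
	// Copy the annotations so the caller's map is not modified.
	ann := map[string]string{refNameKey: name}
	for k, v := range d.Annotations {
		if k != refNameKey {
			ann[k] = v
		}
	}
	d.Annotations = ann

	out := make([]Descriptor, 0, len(manifests)+1)
	for _, old := range manifests {
		sameSlot := old.Annotations[refNameKey] == name && old.Platform == nil
		if mode == Clobber && d.Platform == nil && sameSlot {
			continue // drop the entry being clobbered
		}
		out = append(out, old)
	}
	return append(out, d)
}

func main() {
	existing := []Descriptor{{
		Digest:      "sha256:old",
		Annotations: map[string]string{refNameKey: "some-tag"},
	}}
	added := PutReference(existing, "some-tag", Descriptor{Digest: "sha256:new"}, Add)
	clobbered := PutReference(existing, "some-tag", Descriptor{Digest: "sha256:new"}, Clobber)
	fmt.Println(len(added), len(clobbered))
}
```

With `Add`, a later "get" for the same name sees two equally appropriate descriptors, which is exactly the case where a consumer may have to error out.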

To @qianzhangxa's question, platform is in the index JSON, so you can declare a platform for everything you reference from the index.

stevvooe commented 7 years ago

There should be no validation of annotation data.

Either way, the platform dispatch case is why you would want manifests with different names.

@cyphar I may be misunderstanding something. Could you provide an example of the data structures you think should be invalid? application/vnd.oci.image.manifest.v1+json should not have multiple manifests.

cyphar commented 7 years ago

@stevvooe Can you explain how a tool like umoci should handle the following index.json?

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 7143,
      "digest": "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f",
      "annotations": {
        "org.opencontainers.ref.name": "some-tag"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 8119,
      "digest": "sha256:b3d63d132d21c3ff4c35a061adf23cf43da8ae054247e32faa95494d904a007e",
      "annotations": {
        "org.opencontainers.ref.name": "some-tag"
      }
    }
  ]
}

When a user asks to do something with the some-tag tag?

Even if you say "well, that's not defined by the spec" then how is something like this meant to be handled -- which is something I'm quite concerned about because it starts combining the index.json layer with the ManifestList (now ImageIndex) layer -- combining tagging and multi-platform manifests:

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.index.v1+json",
      "size": 7143,
      "digest": "sha256:0228f90e926ba6b96e4f39cf294b2586d38fbb5a1e385c05cd1ee40ea54fe7fd",
      "annotations": {
        "org.opencontainers.ref.name": "stable-release"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 8119,
      "digest": "sha256:b3d63d132d21c3ff4c35a061adf23cf43da8ae054247e32faa95494d904a007e",
      "platform": {
        "architecture": "ppc64le",
        "os": "linux"
      },
      "annotations": {
        "org.opencontainers.ref.name": "stable-release"
      }
    }
  ]
}

And even if you have multiple tags that are all unique platforms (once you parse the tree) -- you're now violating some of the layer separation -- how can you sanely separate the reference evaluation from the ManifestList/Manifest parsing?

qianzhangxa commented 7 years ago

What if we just remove image index from image layout and only have manifest descriptors in index.json? I think the main use case of image index is multi-platform support, but index.json is essentially an image index, which already gives us the ability to support multi-platform, so I do not think a two-level image index is necessary. Then, in an index.json, each org.opencontainers.ref.name + platform combination must be unique; a manifest descriptor that has no platform can simply be ignored.

cyphar commented 7 years ago

@qianzhangxa Two levels of indirection give you many benefits -- the most obvious of which is that you can have multiple tags that reference the same multi-architecture image. And the multi-architecture image is part of the CAS, which gives you a lot of other benefits.

In my opinion, index.json should be as dumb of a store as is physically possible. While I understand the need for ManifestDescriptors in ManifestList, it doesn't make sense for index.json -- there's no one clear way for how multi-platform images should be implemented.

Not to mention that you've added all of this ambiguity and complicated logic to the reference resolver -- which doesn't make sense from my point of view. It seems to me to be quite a rampant layer violation, and from the umoci side it will make the code much more ugly and frustrating to implement (not to mention that the oci/cas interface and implementation will get pretty ugly too).

The current scheme means that GetReference and PutReference now need to pass a bunch of other information to the CAS store than just the "tag name".

qianzhangxa commented 7 years ago

@cyphar Can you please elaborate a bit more about "multiple tags that reference the same multi-architecture image."? Did you mean the case that two tags (e.g., latest and v2.0) reference the same image index?

cyphar commented 7 years ago

@qianzhangxa Sorry, I misspoke. What I meant (and if you look at my example you'll see what I mean) was

the most obvious of which is that you can have multiple identical tags with the same name that reference different multi-architecture images.

Basically, the problem with the current state of org.opencontainers.ref.name is that there is no single unambiguous way of deciding what manifest is being referenced by a given tag. Which means that as it stands, tagging is effectively broken.

stevvooe commented 7 years ago

@cyphar In general, the proposal to validate the contents of the annotation field is a non-starter. Annotations are functionally meaningless from the point of view of the specification. It is up to the consumer to decide how to present the data. If the tool doesn't reflect the properties of the data structure, then the tool is broken.

umoci can make its own determination about how to handle this case. One approach is to linearize the descriptors that match some-tag with an in-order traversal and then run the standard platform matching algorithm on that ordered set. On duplicates, you take the first entry that matches. Another approach would be to error out on duplicates.

You'll actually find that the algorithm proposed above is both sufficient and deterministic in resolving multi-platform images, even in cases where duplicate or ambiguous tags are present.
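As a rough illustration of the in-order, first-match approach described above, here is a minimal Go sketch. The types are pared-down stand-ins for the image-spec ones, and `matchesPlatform` encodes an assumed rule (exact OS and architecture equality, with a missing platform matching anything); the thread does not define the matching algorithm, so treat that rule as a placeholder.

```go
package main

import "fmt"

// Platform and Descriptor keep only the fields this sketch needs.
type Platform struct {
	Architecture string
	OS           string
}

type Descriptor struct {
	Digest      string
	Platform    *Platform // nil is treated as platform-agnostic (assumption)
	Annotations map[string]string
}

const refNameKey = "org.opencontainers.ref.name"

// matchesPlatform is an assumed rule, not one taken from the spec.
func matchesPlatform(d Descriptor, want Platform) bool {
	return d.Platform == nil || *d.Platform == want
}

// resolveFirst walks the descriptors in index order and returns the first
// one whose ref name and platform both match.
func resolveFirst(manifests []Descriptor, name string, want Platform) (Descriptor, bool) {
	for _, d := range manifests {
		if d.Annotations[refNameKey] == name && matchesPlatform(d, want) {
			return d, true
		}
	}
	return Descriptor{}, false
}

// tagged builds a linux descriptor tagged "some-tag" for the example.
func tagged(digest, arch string) Descriptor {
	return Descriptor{
		Digest:      digest,
		Platform:    &Platform{Architecture: arch, OS: "linux"},
		Annotations: map[string]string{refNameKey: "some-tag"},
	}
}

func main() {
	manifests := []Descriptor{
		tagged("sha256:aaa", "ppc64le"),
		tagged("sha256:bbb", "amd64"),
		tagged("sha256:ccc", "amd64"), // duplicate name and platform; never selected
	}
	d, ok := resolveFirst(manifests, "some-tag", Platform{Architecture: "amd64", OS: "linux"})
	fmt.Println(d.Digest, ok)
}
```

The result is deterministic for a given index.json, but note that the third descriptor is silently shadowed; the alternative of erroring out on duplicates would surface it instead.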

cyphar commented 7 years ago

@stevvooe But we do define what are valid values for annotations. Annotations in the org.opencontainers.* namespace are reserved by us and we have the right to specify how they should be used. Nothing is stopping a third party from making their own tagging system with annotations.

One approach is to linearize the descriptors that match some-tag with an in-order traversal and then run the standard platform matching algorithm on that ordered set.

What is the "standard platform matching algorithm" -- and how am I meant to run said algorithm if umoci is not running on the platform which I'm extracting (do I have to copy the entire contents of /proc/cpuinfo in order for it to be made clear)? The current scheme combines two very separate components of CAS operations -- reference resolution and blob/manifest parsing.

But all of this is beside the point -- why are we allowing multiple tags in this instance when you can just create a single tag which points to an ImageIndex that then shards up the multi-platform stuff? Is there some benefit I'm missing? The current system strikes me as significantly less elegant than if we'd just taken the refs directory and mapped it directly to json (as opposed to overloading the use of ManifestList to become ImageIndex).

You'll actually find that the algorithm proposed above is both sufficient and deterministic in resolving multi-platform images, even in cases where duplicate or ambiguous tags are present.

It might be both sufficient and deterministic, but that doesn't make it sensible when you could just drop this ambiguity in the first place.

Also, the algorithm you propose is not defined in the spec. So how is a user meant to know how their image will be understood by an image-spec implementation?

cyphar commented 7 years ago

And to make the point more clear, the current CAS interface for umoci is this:

// Engine is an interface that provides methods for accessing and modifying an
// OCI image, namely allowing access to reference descriptors and blobs.
type Engine interface {
    // PutReference adds a new reference descriptor blob to the image. This is
    // idempotent; a nil error means that "the descriptor is stored at NAME"
    // without implying "because of this PutReference() call". ErrClobber is
    // returned if there is already a descriptor stored at NAME, but does not
    // match the descriptor requested to be stored.
    PutReference(ctx context.Context, name string, descriptor ispec.Descriptor) (err error)

    // GetReference returns a reference from the image. Returns os.ErrNotExist
    // if the name was not found.
    GetReference(ctx context.Context, name string) (descriptor ispec.Descriptor, err error)

    // DeleteReference removes a reference from the image. This is idempotent;
    // a nil error means "the content is not in the store" without implying
    // "because of this DeleteReference() call".
    DeleteReference(ctx context.Context, name string) (err error)

    // ListReferences returns the set of reference names stored in the image.
    ListReferences(ctx context.Context) (names []string, err error)

    // ... other methods omitted for brevity ...
}

How do you propose that I change the ListReferences, PutReference, DeleteReference and GetReference interfaces to facilitate this ambiguity and platform matching code? I have no problem with switching away from Descriptors to ManifestDescriptors (though I also don't agree with the need for that change either, but let's just side-step that issue for now), but the required changes to now pass around some opaque concept of "what platform I'm using" doesn't make sense to me.

qianzhangxa commented 7 years ago

the most obvious of which is that you can have multiple identical tags with the same name that reference different multi-architecture images.

@cyphar But why is this a benefit? Won't it cause ambiguity? E.g., if two identical tags in index.json reference two different image indexes, then when the runtime pulls the image based on that tag, which one should it pull?

cyphar commented 7 years ago

@qianzhangxa It isn't a benefit, my point is that the current system allows this and it shouldn't. That's what this issue is about, removing this ambiguity.

qianzhangxa commented 7 years ago

@cyphar I agree. I guess I misunderstood your previous comment.

Two levels of indirection give you many benefits -- the most obvious of which is that you can have multiple tags that reference the same multi-architecture image.

cyphar commented 7 years ago

@qianzhangxa I misspoke in that comment, and I corrected it here. Sorry for any confusion.

stevvooe commented 7 years ago

@cyphar

How do you propose that I change the ListReferences, PutReference, DeleteReference and GetReference interfaces to facilitate this ambiguity and platform matching code?

From what I said:

umoci can make its own determination about how to handle this case.

In general, it looks like the interface doesn't represent the underlying data structure. For your interface, it expects unique references. Upon encountering duplicates, it should error out. If you want it to be more flexible, return multiple descriptors for each name and don't error out. This is your choice as an application developer.
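The two options mentioned here can be sketched side by side. This is a minimal, hypothetical Go example: `ListMatches`, `GetUnique`, and `ErrAmbiguous` are names invented for illustration and are not part of umoci's actual API; `os.ErrNotExist` is reused only because the interface quoted earlier documents it for missing names.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Descriptor is a pared-down stand-in for the image-spec descriptor.
type Descriptor struct {
	Digest      string
	Annotations map[string]string
}

const refNameKey = "org.opencontainers.ref.name"

// ErrAmbiguous is a hypothetical error used by the strict variant.
var ErrAmbiguous = errors.New("reference name matches multiple descriptors")

// ListMatches is the flexible variant: it returns every descriptor tagged
// with name, in index order, and leaves the choice to the caller.
func ListMatches(manifests []Descriptor, name string) []Descriptor {
	var out []Descriptor
	for _, d := range manifests {
		if d.Annotations[refNameKey] == name {
			out = append(out, d)
		}
	}
	return out
}

// GetUnique is the strict variant: it errors out on duplicates.
func GetUnique(manifests []Descriptor, name string) (Descriptor, error) {
	matches := ListMatches(manifests, name)
	switch len(matches) {
	case 0:
		return Descriptor{}, os.ErrNotExist
	case 1:
		return matches[0], nil
	default:
		return Descriptor{}, ErrAmbiguous
	}
}

// newTagged builds a descriptor carrying the given ref name.
func newTagged(digest, name string) Descriptor {
	return Descriptor{Digest: digest, Annotations: map[string]string{refNameKey: name}}
}

func main() {
	manifests := []Descriptor{
		newTagged("sha256:e692", "some-tag"),
		newTagged("sha256:b3d6", "some-tag"),
	}
	fmt.Println(len(ListMatches(manifests, "some-tag")))
	_, err := GetUnique(manifests, "some-tag")
	fmt.Println(err)
}
```

Both variants are valid readings of the same data; the compatibility question raised later in the thread is what happens when two tools pick different ones.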

What is the "standard platform matching algorithm"[?]

It is somewhat implied by the given fields. Look at the platform. If you can process it or run it, do so. They are properties of the targeted thing that try to provide meaning for that thing without traversing further.

To be honest, I find this so straightforward that I am not understanding the confusion.

the required changes to now pass around some opaque concept of "what platform I'm using" doesn't make sense to me.

Where are you getting this requirement from? Collect the references and re-process them. Or, use the visitor pattern and pass in a function that does what you need.

But we do define what are valid values for annotations. Annotations in the org.opencontainers.* namespace are reserved by us and we have the right to specify how they should be used. Nothing is stopping a third party from making their own tagging system with annotations.

Avoid confounding the definition of the contents of annotations with the relationships of annotations. The relationships for the fields of a given type belong with the type, not with a field of the type, so annotations' relationships cannot be defined by the annotation itself without creating serious issues.

It might be both sufficient and deterministic, but that doesn't make it sensible when you could just drop this ambiguity in the first place.

I think the problem here is that you have constructed an interface based on your expectation of how the data is structured. Now that you have studied the data structure, that interface seems to be insufficient (or, perhaps, may impose requirements). In general, I think this approach to a monolithic tag style interface is broken. You need to approach this more like a tree of references that you visit, collect and process.
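The "visit, collect and process" idea can be sketched as a small tree walk. This is only an illustration under stated assumptions: `IndexStore` is a hypothetical in-memory stand-in for reading index blobs from the CAS, and `Walk`, `VisitFunc`, and `CollectManifests` are invented names.

```go
package main

import "fmt"

const (
	mediaTypeIndex    = "application/vnd.oci.image.index.v1+json"
	mediaTypeManifest = "application/vnd.oci.image.manifest.v1+json"
)

// Descriptor is a pared-down stand-in for the image-spec descriptor.
type Descriptor struct {
	MediaType string
	Digest    string
}

// IndexStore is a hypothetical stand-in for the blob store: it maps the
// digest of an index blob to the descriptors that index lists.
type IndexStore map[string][]Descriptor

// VisitFunc is called for every descriptor reached; returning false prunes
// the walk below that descriptor.
type VisitFunc func(d Descriptor, depth int) bool

// Walk does a depth-first, in-order traversal starting from entries.
// It assumes the store has no cycles, which holds for content-addressed blobs.
func Walk(store IndexStore, entries []Descriptor, depth int, visit VisitFunc) {
	for _, d := range entries {
		if !visit(d, depth) {
			continue
		}
		if d.MediaType == mediaTypeIndex {
			Walk(store, store[d.Digest], depth+1, visit)
		}
	}
}

// CollectManifests gathers every manifest descriptor reachable from entries.
func CollectManifests(store IndexStore, entries []Descriptor) []Descriptor {
	var found []Descriptor
	Walk(store, entries, 0, func(d Descriptor, _ int) bool {
		if d.MediaType == mediaTypeManifest {
			found = append(found, d)
		}
		return true
	})
	return found
}

func main() {
	store := IndexStore{
		"sha256:idx": {
			{MediaType: mediaTypeManifest, Digest: "sha256:m1"},
			{MediaType: mediaTypeManifest, Digest: "sha256:m2"},
		},
	}
	top := []Descriptor{
		{MediaType: mediaTypeIndex, Digest: "sha256:idx"},
		{MediaType: mediaTypeManifest, Digest: "sha256:m3"},
	}
	for _, d := range CollectManifests(store, top) {
		fmt.Println(d.Digest)
	}
}
```

A caller that cares about tags or platforms would put that filtering into the visitor function, which is where the tool-specific policy ends up living.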

I have no problem with switching away from Descriptors to ManifestDescriptors

This type split is indeed unfortunate. There were never meant to be two descriptor types (it would be like having 16-bit and 64-bit pointers in the same hardware), and it complicates things. We could move the platform description to annotations to address this.

cyphar commented 7 years ago

@stevvooe

Where are you getting this requirement from? Collect the references and re-process them. Or, use the visitor pattern and pass in a function that does what you need.

My problem with this is that now you're effectively punting reference resolution to the caller of the CAS, because of this ambiguity and mixing of different restrictions of whether it is valid to return a particular descriptor (such as the platform). Which all means now that different implementations will have incompatible implementations of reference resolution and the original "feature" of having this free-for-all is not going to work out.

Specifications should not intentionally be introducing ambiguity like this, because it makes implementing something to match the specification difficult. "The algorithm is trivial" doesn't really justify the fact that currently there is no unambiguous standard-compliant way of parsing org.opencontainers.ref.name (without just asking the user which entry in the manifests array they want).

In general, I think this approach to a monolithic tag style interface is broken. You need to approach this more like a tree of references that you visit, collect and process.

As above, this effectively means that it is simply not possible for a user of a generic OCI library to be able to know what org.opencontainers.ref.name annotations will do or how they will be parsed. They have to implement all of the reference resolution themselves, because the library can't know what platform or whatever other decisions the caller wants to make.

So now a user has to go find out how skopeo, umoci, docker and whatever else implement the handling of references. They don't have any guarantees from the spec how the entrypoint (tag) to the image manifests is going to be parsed, and as a developer they have no information on how cases like the ones I've outlined above should be implemented either.

qianzhangxa commented 7 years ago

I agree with @cyphar. I am currently working on OCI image support in Mesos, where I need to parse and pull OCI images, and I am not sure what the best/standard way is to handle duplicated org.opencontainers.ref.name values in index.json. I think we need to make this clear in the spec.

stevvooe commented 7 years ago

@cyphar Calling this "punting" or "incompatible" is absolutely ridiculous. You are making up issues where there are none. It is not a free-for-all and the incompatibilities you are speaking to are based on assumption over fact. As I have said, the ability to have the same tag pointing to multiple things is a feature, not a bug.

Please avoid the histrionics and actually try the approach I suggested.

There are literally two choices here:

1. Either return multiple descriptors.
2. Error out on duplicates.

Have a little imagination and figure it out.

wking commented 7 years ago

On Tue, Feb 28, 2017 at 01:35:10PM -0800, Stephen Day wrote:

There are literally two choices here:

1. Either return multiple descriptors.
2. Error out on duplicates.

Have a little imagination and figure it out.

While this is fine for an API, @cyphar made a good point about compat issues that you are not addressing here. Say @cyphar's imagination takes him down (1) for his index API and umoci, but the Docker devs' imagination takes them down (2). Now an innocent user builds an image with umoci that has multiple descriptors with the same org.opencontainers.ref.name and equivalent platform information in their index JSON. They feed that OCI-compliant image into Docker, and the OCI-compliant Docker ingestor chokes and dies on the repeated name. Whose fault is the breakage? The spec is not clear.

If multiple descriptors with the same name/platform are allowed, I think the spec should either:

a. Unambiguously require some MUST level support for them.
b. Steer users away from them with a SHOULD, on the grounds that OCI-compliant handlers are not required to support them.

My current feeling is that (1) and (b) are the conservative courses, but that isn't a healthy ecosystem for promoting the “same tag pointing to multiple things” feature. If you want this feature to be portable (and I don't have an opinion on that myself), I'd recommend the spec do something along the lines of (a). If you don't want the feature to be portable, then dropping the feature (like #582) makes the most sense to me.

crosbymichael commented 7 years ago

How is this even a concern of the spec? If ref name was important to the spec it wouldn't be in annotations. I thought that, across the board, annotations are opaque to the specs (runtime and image), so why would you even encode anything in there for general consumption?

What's the point in having a type safe spec and scheme if you are just going to add things in a generic object?

cyphar commented 7 years ago

@stevvooe I still don't see why this dereference walk:

ImageIndex[with same tag pointing to multiple manifests] -> Manifest -> ...

Is a good feature when this would be possible if ref.name was unique (and was possible pre-index.json):

ImageIndex[unique tag pointing to index] -> ImageIndex -> Manifest -> ...

Maybe you can explain to me what feature you get out of the first walk that you don't get in the second one? I'm sure there was a good reason for this, but

Please avoid the histrionics and actually try the approach I suggested.

I already have a branch in umoci (not pushed yet) with the approach you suggested. I have tried it. I don't agree with it, and that's why I'm trying to have a meaningful discussion on the issue.

Sure, I can implement whatever reference resolution algorithm I like and just force users to deal with it. But without also reading how skopeo does it, how docker is going to do it, and so on then I will be creating an incompatible UX. So we're going to have to come to some agreement between implementations anyway.

To be clear, the tools that I've seen (skopeo, umoci and the current state of docker/docker#26369) don't implement platform handling at all (effectively ImageManifest was something that we didn't touch) -- because we (@runc0m, myself and others) weren't entirely sure how things are meant to work in cases where you want to operate on an image that is different from the platform you're running on.

This concern of ours now extends into reference resolution, with the additional kick that reference resolution also has to implement some form of platform handling in order for things to be done automatically -- or we have to (as you suggested) make some parts of the UX require the user to clarify what tag they want.

So again, the issue I have with your approach is that umoci (and the other tools I've mentioned) have to now either decide to just not implement this edge-case of the spec (which makes them non-compliant) or they have to come up with an out-of-spec way of requesting the user to clarify what tag they really meant.

It is not a free-for-all

I don't know what your definition of a "free-for-all" is, but in my book the following line from the spec indicates to me that it is indeed a free-for-all:

No semantic restriction is given for the "org.opencontainers.ref.name" annotation of descriptors.

cyphar commented 7 years ago

@crosbymichael

How is this even a concern of the spec? If ref name was important to the spec it wouldn't be in annotations.

While that's all well and good, users need to be able to reference things inside an OCI image. It's just silly to say that users are on their own if they want to be able to talk about what thing inside an OCI image they are referring to.

What's the point in having a type safe spec and scheme if you are just going to add things in a generic object?

We have restrictions on other annotations (since they're JSON strings but the content is meant to be a different type). I'm not sure I understand the argument that we are not allowed to validate or otherwise touch annotations that we define in the spec.

crosbymichael commented 7 years ago

https://github.com/opencontainers/image-spec/blob/master/annotations.md

That makes it clear to me that they are arbitrary fields and values and it's up to the consumers to handle the keys/values appropriately.

If there is a possibility that users can have duplicate values then just have the consuming code handle that and don't make a naive assumption that the values of the field will be unique. It only states that keys are unique. Starting to add all these rules around "arbitrary" values is a bad position to be in.

erikh commented 7 years ago

Maybe this seems like a dumb idea and I hope it's ok if I jump in, but what's the reason that the name couldn't be made first-class?

stevvooe commented 7 years ago

Maybe this seems like a dumb idea and I hope it's ok if I jump in, but what's the reason that the name couldn't be made first-class?

This was a mistake made in Docker schema1 and we'd like to avoid repeating history.

This really isn't as complicated as @cyphar is making it out to be. The problem is that his code doesn't match the properties of the data structure. This problem is going to come up whether or not we make the ref.name unique. The correct solution is to make our programs model the problem appropriately.

stevvooe commented 7 years ago

Per discussion on the OCI call, we are going to follow this up with further implementation notes detailing the degrees of freedom in the data structure and implications of UX design on compatibility between tools.

@cyphar Please open an issue, assigning it to me, for what you are looking for and we'll close this when we have agreed on the body of work.

lucab commented 7 years ago

Partially related, as per -rc5 (after https://github.com/opencontainers/image-spec/pull/561) the following are all valid refs:

"" (empty string in the value field, also different from a nil value or missing ref.name annotation)
../
<marquee>
$(foo)
' --
\n
:
an arbitrarily-long string

If an "implementation notes" doc is in the works, it may be worth adding some words covering the processing and compatibility side of using such things as an index.

AkihiroSuda commented 7 years ago

As of rc6, the correct annotation key is "org.opencontainers.image.ref.name"! https://github.com/opencontainers/image-spec/blob/v1.0.0-rc6/image-layout.md Could someone please fix the issue title so that people won't copy-paste the old key by mistake? :sweat_smile:

vbatts commented 7 years ago

On this issue, I cannot think of a reason why one would want to have duplicate ref.names, but I am nowhere close to convinced that they should be strictly unique.

Some of @lucab concerns have been addressed in https://github.com/opencontainers/image-spec/pull/671

I'm inclined to close this issue for now.

cyphar commented 7 years ago

but I am nowhere close to convinced that they should be strictly unique.

When talking to @stevvooe in person, basically the main issue is that the restriction will make generation/modification of images more complicated. My counter-point to that is that not doing it makes consumption of images more complicated (and I find the latter issue to be more annoying because it requires more co-ordination between implementations than the former case).

I'm inclined to close this issue for now.

We still haven't got any normative language in the spec for how consumers should handle references. Even a paragraph along the lines of "when dereferencing things we recommend you do a pruned walk from the root to find all Manifests and then handle it that way" would help.

stevvooe commented 7 years ago

"when dereferencing things we recommend you do a pruned walk from the root to find all Manifests and then handle it that way"

This is no longer the case. The validity of references in the image layout is restricted to index.json.

This really is a UX problem. In the same way that this particular annotation is not unique, other annotations may not be unique as well. Even if we make this one unique, you'll still have to provide ways to handle duplicates of other kinds of annotations. This is the fundamental problem with annotations. Any attribute used to select a particular component may not result in a unique selection.

Ultimately, a manifest index provides a list of descriptors. The UX needs to provide a way to display and select them.

Should we put something to that effect in the specification?

cyphar commented 7 years ago

This is no longer the case. The validity of references in the image layout is restricted to index.json.

Sorry, I meant "pruned from the index.json". Since ImageIndex can point to ImageIndex now you still end up having to do a full walk (though I don't think we have actually described what the semantics are if you have an ImageIndex with an entry that has a platform field which points to an ImageIndex that contradicts that field -- do you not see the lower level ImageIndex or should you do a full walk).

Should we put something to that effect in the specification?

Yes please.

qianzhangxa commented 7 years ago

I am currently working on OCI image support in Mesos. Basically, what I am doing is a full walk (see the patch for details):

Go through each descriptor in index.json.
- Ignore any descriptor that has no annotation with the key org.opencontainers.image.ref.name, or whose value is not what the user requested (the image tag). Also ignore any descriptor whose media type is neither application/vnd.oci.image.index.v1+json nor application/vnd.oci.image.manifest.v1+json.
- If more than one image index or more than one image manifest matches the user's request, return an error.
- If no image index or image manifest matches the user's request, return an error.
At this point, we should have a matched image index and/or a matched image manifest.
- If there is a matched image index, go through each image manifest in it.
  - If more than one image manifest matches the user's request, return an error.
  - If there is only one matched image manifest but there is also a matched image manifest found in index.json, return an error.
  - If there is no matched image manifest and no matched image manifest was found in index.json either, return an error.
  - At this point, there must be only one matched image manifest (either from index.json or from the matched image index), so use it.
- If there is a matched image manifest, just use it.

I think this is what I can do with the current state of the spec.
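The steps above can be sketched in Go as follows. This is not the Mesos patch: the types are pared-down stand-ins, `nested` is a hypothetical map standing in for reading an image index blob by digest, and "matching the user's request" inside the image index is assumed here to mean an exact platform match.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	mediaTypeIndex    = "application/vnd.oci.image.index.v1+json"
	mediaTypeManifest = "application/vnd.oci.image.manifest.v1+json"
	refNameKey        = "org.opencontainers.image.ref.name"
)

type Platform struct{ Architecture, OS string }

type Descriptor struct {
	MediaType   string
	Digest      string
	Platform    *Platform
	Annotations map[string]string
}

// Hypothetical errors for this sketch.
var (
	ErrNotFound  = errors.New("no matching image index or manifest")
	ErrAmbiguous = errors.New("more than one match")
)

// Resolve follows the listed steps. nested maps an index digest to the
// descriptors listed in that index (a stand-in for a blob read).
func Resolve(top []Descriptor, nested map[string][]Descriptor, tag string, want Platform) (Descriptor, error) {
	var indexes, manifests []Descriptor
	for _, d := range top {
		if d.Annotations[refNameKey] != tag {
			continue // wrong or missing ref name
		}
		switch d.MediaType { // other media types are ignored
		case mediaTypeIndex:
			indexes = append(indexes, d)
		case mediaTypeManifest:
			manifests = append(manifests, d)
		}
	}
	if len(indexes) > 1 || len(manifests) > 1 {
		return Descriptor{}, ErrAmbiguous
	}
	if len(indexes) == 0 && len(manifests) == 0 {
		return Descriptor{}, ErrNotFound
	}
	if len(indexes) == 1 {
		var hits []Descriptor
		for _, d := range nested[indexes[0].Digest] {
			if d.MediaType == mediaTypeManifest && d.Platform != nil && *d.Platform == want {
				hits = append(hits, d)
			}
		}
		switch {
		case len(hits) > 1, len(hits) == 1 && len(manifests) == 1:
			return Descriptor{}, ErrAmbiguous
		case len(hits) == 0 && len(manifests) == 0:
			return Descriptor{}, ErrNotFound
		case len(hits) == 1:
			return hits[0], nil
		}
	}
	// Exactly one manifest was matched directly in index.json.
	return manifests[0], nil
}

func main() {
	tag := map[string]string{refNameKey: "stable-release"}
	top := []Descriptor{{MediaType: mediaTypeIndex, Digest: "sha256:idx", Annotations: tag}}
	nested := map[string][]Descriptor{
		"sha256:idx": {
			{MediaType: mediaTypeManifest, Digest: "sha256:amd64", Platform: &Platform{"amd64", "linux"}},
			{MediaType: mediaTypeManifest, Digest: "sha256:ppc", Platform: &Platform{"ppc64le", "linux"}},
		},
	}
	d, err := Resolve(top, nested, "stable-release", Platform{"amd64", "linux"})
	fmt.Println(d.Digest, err)
}
```

Under these assumptions, the earlier `stable-release` example (an index and a `ppc64le` manifest sharing one tag) resolves to an error whenever the tagged index also contains a `ppc64le` manifest.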

stevvooe commented 7 years ago

@qianzhangxa I am a little confused as to why you have code that involves a remote URI and unpacks an image layout. That doesn't seem like a very efficient use case.

cyphar commented 7 years ago

@stevvooe

I am a little confused as to why you have code that involves a remote URI and unpacks an image layout. That doesn't seem like a very efficient use case.

I assume it's because distribution is still not a solved problem in OCI land and it's a use case people need. Fetching indexes remotely is quite useful if you have enough metadata to figure out how to fetch individual blobs by digest. Personally I've started kicking around some ideas (based on ACI discovery) in https://github.com/cyphar/parcel.

cyphar commented 7 years ago

@qianzhangxa

Yeah, that algorithm looks about right. In umoci I'm going to try to make it possible for users to select from the matches (and the API is being changed to support returning lists of matched descriptors), but erroring out is totally fine (because not every tool can support multiple index options).

stevvooe commented 7 years ago

@cyphar I'm not really sure why anyone would want to depart from the existing registry model for centralized image storage, other than to create confusion and divide the market. OCI images map directly to the protocol with little effort. I could see some changes around how naming/tagging is handled but that is about it.

The problem with parcel (and other wku or dns-based approaches) is that it requires the user to run a service that has well-known urls, ending with many of the same complaints that we have today. Such approaches tend to favor service providers and those who sell registry software or solutions over those running small-scale infrastructure.

Most of the success I have had in decoupling the protocol has been around opening up client-side configuration. Specifically, through namespace-configuration matching. This places image distribution in the hands of the users and operators, where it really belongs. Yes, this includes both delegating authority and location but it can all be done in the client.

I understand the goal of parcel, but the impetus starts with outright falsehoods:

The docker:// protocol and schema are not truly state-less HTTP, and therefore cannot be implemented by a "dumb" CDN. By necessity a stateful application must be run by a distributor, which is not always reasonable or possible. It also makes caching harder to implement for something like Varnish.

Untrue. You can distribute images with a dumb, static-file registry. Cases where this is not true should be considered bugs. We do this in production and it is fairly straightforward to set up.

The docker:// protocol is the only "official" way of distributing such images, which makes other methods of distribution (saving an image and then distributing it via FTP, BitTorrent, etc) out-of-band and not supported. While this extension does not require that all such methods be supported, it elevates their usefulness by making them much more supportable.

Again, not entirely true and this is more of a product of the implementation than anything else. In fact, the protocol can support distributing blobs through bittorrent or other p2p means. I have demonstrated this with both bittorrent and a hand-rolled p2p protocol. The only reason that we haven't supported this is because docker does not store the artifacts unchanged. This changes with containerd 1.0 and will be easier to implement.

Image "naming" and distribution are linked, tying the orthogonal issues of identity and source-of-files. This further complicates the jobs of CDNs, requiring them to provide DNS round-robin style distribution rather than GNU/Linux distribution "mirroring".

Again, this is more of a matter of implementation than a limitation of the protocol. By deferring to a client-side mapping of configuration to namespaces or authorities, this problem goes away.

While I understand that it is easy to draw conclusions based on existing implementations, it would be good to understand the actual problems before putting forward a proposal. Most of the problems are limitations of the existing implementation rather than the protocol itself. Adding a new protocol to the mix will be unlikely to help that situation.

cyphar commented 7 years ago

@stevvooe This is not the right place to have this discussion; I was just mentioning what might be a reason for pulling ImageIndexes. If you want to have the discussion about ideas like parcel, maybe we should have it somewhere else (and after 1.0.0). Since when has playing around with extensions to OCI been a bad thing? Advocating such a mentality is incredibly harmful to the wider community.

I'm not really sure why anyone would want to depart from the existing registry model for centralized image storage, other than to create confusion and divide the market.

This is also known as "choice". parcel also allows you to map to a Docker registry by having a single static file in /.well-known/cyphar.opencontainers.parcel.v0 (with a docker:// template). Arguing that expanding protocol support is going to "divide the market", while ignoring that centralized image storage creates a monopoly is FUD IMO.

it would be good to understand the actual problems before putting forward a proposal.

It would also be good to not denigrate people for working on an idea as "not understanding the actual problems". Parcel is just an idea I've been working on, it's not a proposal (at least, not yet). If you disagree with the introduction, PRs are welcome...

stevvooe commented 7 years ago

@cyphar I think you misread my sentence and probably most of my response and I apologize for that. What I said was that for centralized image storage, the registry is a very good approach. This does not mean that you can't supplement it or distribute images in another way. In no way was I advocating for anything that would limit choice, other than ensuring that the choices available are good ones.

Let me rephrase the key part of my response:

The actual way forward here is a client-side configuration system that puts the choice in the hands of users and operators, rather than software vendors and service providers.

I want true choice. This means the ability to control naming authority and distribution, separately and local to the implementation. This means to be free of the desires of software vendors and service providers while not increasing the number of services you have to run to get things to work (ie wku or dns).

I am not sure how this position can be taken to be "supporting a monopoly". It is way less centralized than anything I've seen proposed.

Furthermore, no one is saying or has said that you can't play around with extensions to the OCI format. However, I do take issue with the propagation of FUD about the image distribution protocol. I would hope that debunking such FUD with actual facts would not be considered "denigrating".

cyphar commented 7 years ago

However, I do take issue with the propagation of FUD about the [Docker] image distribution protocol. I would hope that debunking such FUD with actual facts would not be considered "denigrating".

I am going to take a look at what you said and revise the introduction as necessary (I don't necessarily agree with everything you said, but I can ask you for clarification out-of-band). However, claiming that I "don't understand the actual problems" because you feel I misrepresented a project you work on is not debunking anything.

As for the rest of your message, we can discuss this another time / somewhere else. The tl;dr is that I think there are tradeoffs that need to be made and concerns I have that aren't solved (and are exacerbated) by having purely client-side configuration.

crosbymichael commented 7 years ago

Please keep the discussions technical and try not to mistake misunderstandings for anything with ill intent.

vbatts commented 7 years ago

ohman, y'all.

...

@qianzhangxa that looks about right.

opencontainers / image-spec

index.json: make org.opencontainers.image.ref.name unique? #581