xiekeyang / oci-discovery

Contains the OCI Ref-engine Discovery specification and related specifications as an extension to the image specification.


Proposals of Policies of OCI Image Discovery #1

Closed xiekeyang closed 7 years ago

xiekeyang commented 7 years ago

[From Aleksa]: Explanation of the policy-based well-known URL for discovery

From: Aleksa Sarai [cyphar@cyphar.com] Sent: Wednesday, August 30, 2017 19:33 To: xiekeyang Subject: Re: RE: Discovery of OCI image

The content of my previous email was rough, and I'm improving it. Now I have some questions about parcel that I want to discuss with you.

About the discovery URI and the distribution URI: are they the same in your document?

No, they're not. They serve different purposes.

"Discovery URI" is something like opensuse.org/leap. "Distribution URI" is more like https://download.opensuse.org/some/project/some-manifest.json.

However, as I mentioned previously, I have a much nicer version of the document that uses "Template Descriptors". It might be possible for the Distribution URI to be removed without losing the functionality.

2.

In the document you chose this:

https:///.well-known/cyphar.opencontainers.parcel.v0.json

but not https://coreos.com/.well-known/abd-index/com.coreos.etcd

What are the advantages of cyphar.opencontainers.parcel.v0.json? You drop the URI path; is that OK and still resolvable?

Parcel doesn't store any of the mappings or lists of images in /.well-known/. There's only one JSON document that tells the client what the policy is for the domain, for any arbitrary image. That can then redirect the client to other JSON objects (the distribution objects) which then tell a user how to download the particular image. If the client cannot find a distribution object by following the JSON policy, then it breaks.

The reason why I didn't go with the same thing your proposal does is that your proposal makes the assumption that the publishing/distribution system is able to modify /.well-known/oci-index/... on the opensuse.org domain (for example). Parcel doesn't have this requirement: all of the redirection and templating is done on the client, which means that there's no need to modify the /.well-known/ directory unless you're changing the image policy.

I will try to finish my rework of the parcel proposal next week.

-- Aleksa Sarai (cyphar) www.cyphar.com

xiekeyang commented 7 years ago

[From Trevor]: A proposal to use CAS and ref engines to describe discovered objects

From: W. Trevor King [mailto:wking@tremily.us] Date: September 1, 2017 6:42 To: xiekeyang CC: Aleksa Sarai Subject: Discovery of OCI image

On Thu, Aug 31, 2017 at 10:45:25AM +0000, xiekeyang wrote:

As to your review suggestion, I discussed it with Aleksa…

It's probably time to move this into a public repo or PR somewhere. Developing a spec via email is going to get confusing ;).

On Wed, Aug 30, 2017 at 19:33, Aleksa Sarai wrote:

"Discovery URI" is something like opensuse.org/leap.

"Distribution URI" is more like

https://download.opensuse.org/some/project/some-manifest.json.

However, as I mentioned previously, I have a much nicer version of the document that uses "Template Descriptors". It might be possible for the Distribution URI to be removed without losing the functionality.

I think a discovery/distribution distinction is important to separate mutable references from CAS blobs. Aleksa has some motivation in [1], and the opening point in #11 is about delegating image hosting [2]. The discovery resource is easy to host locally where clients can find it, and then you can point at any distribution resource you like (your own and/or several third parties). Distribution is via CAS and Merkle links, which is much easier to cache and verify, even over untrusted channels and from untrusted sources. Having some trust in the discovery provider helps avoid getting sent down a malicious Merkle tree (although with something like [3] you could protect against that too). So there are maybe three services in play:

1. The CAS engine, which will eventually serve the image blobs [4].
2. A ref engine, which resolves the name into an initial descriptor. For example, an index.json [5] and index [6] parser.
3. A way to find 1 and 2 starting from the user-supplied name.

Say example.com wants to delegate both 1 and 2 to a third party. Aleksa's discovery and distribution services [7,8] are something like that, although they don't let you delegate to the Docker registry API [9], etc. He assumes that at some point you'll be fetching OCI index JSON (indexuris), and that the blobs will be available via a blob URI (bloburis) [10]. And his distribution object delegates both the ref engine and CAS engine at the same time.

I'd rather update our descriptor format to allow for CAS engine hinting. We currently have ‘urls’ [11], but that's a direct location for that blob. I want ‘casEngines’ or some such, which identifies CAS engines which likely hold that blob and its descendants. Something like:

  {
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "size": 7143,
    "digest": "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f",
    "platform": {
      "architecture": "amd64",
      "os": "linux"
    },
    "annotations": {
      "org.opencontainers.image.ref.name": "coreos.com/etcd:1.0.0"
    },
    "casEngines": [
      {
        "protocol": "docker",
        "uri": "https://quay.io/coreos/etcd:1.0.0"
      },
      {
        "protocol": "oci-template-v1",
        "uri": "https://cas.coreos.com/{algorithm}/{encoded}"
      },
      {
        "protocol": "oci-template-v1",
        "uri": "https://docker.com/cas/{algorithm}/{encoded:2}/{encoded}"
      }
    ]
  }

That doesn't assume anything about the CAS protocol; it just gives you enough information to fetch sha256:e69… and its descendants from quay.io if you understand the Docker registry protocol [9], or from coreos.com or docker.com if you understand the oci-template-v1 protocol [4]. You can also extend the casEngines entries if you need to supply additional data (e.g. auth credentials).

RFC 6570 doesn't seem to have a way to remove the leading two characters [12] (as git's e6/92418e… object layout does), so the template forms are not flexible enough to handle sharding cleanly [13].
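To make the sharding limitation concrete, here is a small Go sketch that expands only the subset of RFC 6570 these templates use: simple `{var}` expansion plus the `{var:N}` prefix modifier. It is not a full RFC 6570 implementation, and `expand` and `pctEncode` are hypothetical helpers. The prefix modifier can produce the `e6` directory, but nothing in this subset (or, as noted above, in RFC 6570 generally) drops those two characters from the final segment.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// exprRE matches simple RFC 6570 expressions such as {name} and {name:3}.
var exprRE = regexp.MustCompile(`\{([A-Za-z0-9_]+)(?::([0-9]+))?\}`)

// pctEncode applies RFC 6570 simple-string encoding: every byte outside the
// unreserved set (ALPHA / DIGIT / "-" / "." / "_" / "~") is percent-encoded.
func pctEncode(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		c := s[i]
		if ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z') || ('0' <= c && c <= '9') ||
			c == '-' || c == '.' || c == '_' || c == '~' {
			b.WriteByte(c)
		} else {
			fmt.Fprintf(&b, "%%%02X", c)
		}
	}
	return b.String()
}

// expand substitutes {var} and the {var:N} prefix modifier (first N characters).
// Undefined variables expand to the empty string, as in RFC 6570.
func expand(tmpl string, vars map[string]string) string {
	return exprRE.ReplaceAllStringFunc(tmpl, func(m string) string {
		parts := exprRE.FindStringSubmatch(m)
		val := vars[parts[1]]
		if parts[2] != "" {
			n, _ := strconv.Atoi(parts[2])
			if r := []rune(val); n < len(r) {
				val = string(r[:n])
			}
		}
		return pctEncode(val)
	})
}

func main() {
	digest := "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f"
	algorithm, encoded, _ := strings.Cut(digest, ":")
	vars := map[string]string{"algorithm": algorithm, "encoded": encoded}
	fmt.Println(expand("https://docker.com/cas/{algorithm}/{encoded:2}/{encoded}", vars))
}
```

The result keeps the full encoded digest in the last segment (`.../sha256/e6/e692418e…`), which is the "prefix but no suffix slicing" behaviour being discussed.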

With casEngines, anyone providing a descriptor can help with discovering CAS engines.

Next you need a ref engine protocol. You can do something clever here, but I'm not really interested in the specifics. Minting a new protocol for “fetch an index [6] over HTTPS” lets Aleksa's indexuris be more generic, making room for other ref engine protocols in the future. Let's call that protocol oci-index-template-v1 and use URI templates again. There are many other ways you could handle this; just give those alternatives their own protocol name.

Going back to my service enumeration, the only thing we still need is a way to get from 3 (the name) to 2 (a ref engine, or set of ref engines). Let's call this ref-engine discovery. One approach to this would be to declare a well-known URI (https://{authority}/.well-known/oci/ref-engines) with content like:

  {
    "refEngines": [
      {
        "protocol": "oci-index-template-v1",
        "uri": "https://{authority}/ref/{name}"
      },
      {
        "protocol": "oci-index-template-v1",
        "uri": "https://oci.example.com/ref/{name}"
      }
    ],
    …
  }
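The client side of that ref-engine discovery step could look roughly like this Go sketch. The well-known path, the `refEngines` shape, and the `oci-index-template-v1` protocol name come from the proposal above; escaping `{name}` with `url.PathEscape` and the plain string substitution are assumptions of this sketch, not anything the proposal specifies, and `wellKnownURL` and `candidates` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// RefEngine is one entry of the proposed "refEngines" array.
type RefEngine struct {
	Protocol string `json:"protocol"`
	URI      string `json:"uri"`
}

// discoveryDoc is the proposed well-known document; other members are ignored.
type discoveryDoc struct {
	RefEngines []RefEngine `json:"refEngines"`
}

// wellKnownURL builds the proposed discovery location for an authority.
func wellKnownURL(authority string) string {
	return (&url.URL{Scheme: "https", Host: authority, Path: "/.well-known/oci/ref-engines"}).String()
}

// candidates returns usable ref-engine URIs with {authority} and {name} filled
// in, skipping entries whose protocol this client does not implement.
func candidates(body []byte, authority, name string) ([]string, error) {
	var doc discoveryDoc
	if err := json.Unmarshal(body, &doc); err != nil {
		return nil, err
	}
	r := strings.NewReplacer("{authority}", authority, "{name}", url.PathEscape(name))
	var out []string
	for _, e := range doc.RefEngines {
		if e.Protocol != "oci-index-template-v1" {
			continue
		}
		out = append(out, r.Replace(e.URI))
	}
	return out, nil
}

func main() {
	body := []byte(`{"refEngines": [
	  {"protocol": "oci-index-template-v1", "uri": "https://{authority}/ref/{name}"},
	  {"protocol": "oci-index-template-v1", "uri": "https://oci.example.com/ref/{name}"}
	]}`)
	fmt.Println(wellKnownURL("example.com"))
	uris, err := candidates(body, "example.com", "app")
	if err != nil {
		panic(err)
	}
	for _, u := range uris {
		fmt.Println(u)
	}
}
```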

then anyone that understood oci-index-template-v1 could resolve references to descriptors via those resources, check that descriptor's casEngines (if set) to locate the blobs, and connect to that CAS engine to retrieve the blobs.

And as Aleksa does with the discovery metadata 7, you can define a default ref-engine discovery value for folks who don't want to bother providing their own, although in that case they'd have to provide their own ref engine wherever the default ref-engine discovery content pointed. And folks who wanted to defer everything to a third party could point at their ref engine and either have that third party also run the CAS engine or delegate that to additional parties.

The reason why I didn't go with the same thing your proposal does is that your proposal makes the assumption that the publishing/distribution system is able to modify /.well-known/oci-index/... on the opensuse.org domain (for example).

Parcel doesn't have this requirement: all of the redirection and templating is done on the client, which means that there's no need to modify the /.well-known/ directory unless you're changing the image policy.

This is a good point, and a good reason to not host the ref engine itself under .well-known (which is what I'd been suggesting in my earlier discussion with Keyang).

Cheers, Trevor

Although I don't mean that layout format in particular, just anything that supports a content-addressable get like [9].

xiekeyang commented 7 years ago

[From Aleksa]: Options: the discovery meta policy SHOULD be lightweight, as a ref engine may be insufficient

An image-spec PR until we have:

a) An implementation, with a spec.
b) Evidence that the implementation is sane (in other words, that it is usable).

I also recommend referencing the AppC image spec when discussing discovery and distribution.

"Discovery URI" is something like opensuse.org/leap.

"Distribution URI" is more like

https://download.opensuse.org/some/project/some-manifest.json.

However, as I mentioned previously, I have a much nicer version of the document that uses "Template Descriptors". It might be possible for the Distribution URI to be removed without losing the functionality.

I think a discovery/distribution distinction is important to separate mutable references from CAS blobs. Aleksa has some motivation in [1], and the opening point in #11 is about delegating image hosting [2]. The discovery resource is easy to host locally where clients can find it, and then you can point at any distribution resource you like (your own and/or several third parties). Distribution is via CAS and Merkle links, which is much easier to cache and verify, even over untrusted channels and from untrusted sources. Having some trust in the discovery provider helps avoid getting sent down a malicious Merkle tree (although with something like [3] you could protect against that too). So there are maybe three services in play:

1. The CAS engine, which will eventually serve the image blobs [4].

2. A ref engine, which resolves the name into an initial descriptor. For example, an index.json [5] and index [6] parser.

3. A way to find 1 and 2 starting from the user-supplied name.

Say example.com wants to delegate both 1 and 2 to a third party. Aleksa's discovery and distribution services [7,8] are something like that, although they don't let you delegate to the Docker registry API [9], etc. He assumes that at some point you'll be fetching OCI index JSON (indexuris), and that the blobs will be available via a blob URI (bloburis) [10]. And his distribution object delegates both the ref engine and CAS engine at the same time.

I'm working on improving that. Delegating to a Docker Registry is definitely something that I intend to make sure is usable.

This is probably going to be part of my template descriptor improvements, where I'll change the "distribution object" to be able to have different backends (similar to what you're proposing, but rather than being part of descriptors it's all on the upper "distribution" layer).

I'd rather update our descriptor format to allow for CAS engine hinting. We currently have ‘urls’ [11], but that's a direct location for that blob. I want ‘casEngines’ or some such, which identifies CAS engines which likely hold that blob and its descendants. Something like:

I don't agree, to be honest. There are a couple of reasons for this:

a) It feels like we're overloading what a descriptor is used for. Is it used during storage, or is it used as part of distribution? If both, does that mean you are going to store casEngines inside the CAS? Or is this a modified descriptor you only provide when you send it?
b) It requires the thing that is generating the image to either be aware of where it's going to be hosted (bad) or it requires you to generate modified versions of the image descriptors after it's been created (possibly bad or disastrous, depending on whether the casEngines is expected to be stored in the image or just provided as a "fake" descriptor).
c) This all sounds quite backwards to me. You should get the descriptors through distribution, not require the descriptors to define the distribution scheme. Or as in (a) is this just a way to avoid defining a separate construct for blob distribution?

There is a point to be made about having digests and sizes for distribution. I definitely think that the current incarnation of parcel suffers from requiring clients to "do the right thing" a bit too much (my current template descriptor WIP explicitly states that for "opaque" types you have to know the MIME and the digest before you pull it, and that you should verify it).

"casEngines": [
  {
    "protocol": "docker",
    "uri": "https://quay.io/coreos/etcd:1.0.0"
  },
  {
    "protocol": "oci-template-v1",
    "uri": "https://cas.coreos.com/{algorithm}/{encoded}"
  },
  {
    "protocol": "oci-template-v1",
    "uri": "https://docker.com/cas/{algorithm}/{encoded:2}/{encoded}"
  }
]

In parcel I favour using URI schemes over a separate protocol field, but I'll have to think a little more about it. In either case, at first glance you have to generate a lot of derivative objects in order to make it work. One of the main benefits of parcel is that there is no dynamic generation required for any of the "image policy" decisions for both discovery and distribution. That means that specifying where to download certain images from and so on is all implementable statically, purely through HTTP redirects and having static JSON blobs.

RFC 6570 doesn't seem to have a way to remove the leading two characters [12], so the template forms are not flexible enough to handle sharding cleanly [13].

I don't think suffix slicing is required. While git does suffix slicing, other sharding stores (like camlistore from memory) don't. It's a personal taste thing.

With casEngines, anyone providing a descriptor can help with discovering CAS engines.

While I appreciate the "p2p" nature of this, I think a more fediverse style distribution scheme better describes how people want to download their software.

Next you need a ref engine protocol. You can do something clever here, but I'm not really interested in the specifics. Minting a new protocol for “fetch an index [6] over HTTPS” lets Aleksa's indexuris be more generic, making room for other ref engine protocols in the future. Let's call that protocol oci-index-template-v1 and use URI templates again. There are many other ways you could handle this; just give those alternatives their own protocol name.

Going back to my service enumeration, the only thing we still need is a way to get from 3 (the name) to 2 (a ref engine, or set of ref engines). Let's call this ref-engine discovery. One approach to this would be to declare a well-known URI (https://{authority}/.well-known/oci/ref-engines) with content like:

  {
    "refEngines": [
      {
        "protocol": "oci-index-template-v1",
        "uri": "https://{authority}/ref/{name}"
      },
      {
        "protocol": "oci-index-template-v1",
        "uri": "https://oci.example.com/ref/{name}"
      }
    ],
    …
  }

I feel like having a separate "refEngine" concept doesn't make much sense. What problem does it solve that you cannot solve by just defining the way you access the CAS (or the registry, if you prefer) through template descriptors? If you require having a "refEngines" thing, then why not just expand that concept to also handle "casEngines" (like parcel does). I appreciate wanting to separate the two, but it feels like modifying descriptors in order to do it is a bit backwards (you should get the descriptors through distribution, not require the descriptors to define the distribution scheme).

The reason why I didn't go with the same thing your proposal does is that your proposal makes the assumption that the publishing/distribution system is able to modify /.well-known/oci-index/... on the opensuse.org domain (for example).

Parcel doesn't have this requirement: all of the redirection and templating is done on the client, which means that there's no need to modify the /.well-known/ directory unless you're changing the image policy.

This is a good point, and a good reason to not host the ref engine itself under .well-known (which is what I'd been suggesting in my earlier discussion with Keyang).

Yeah. It's important to note that any scheme for distribution that doesn't allow us (openSUSE) to distribute the blobs through download.opensuse.org, or delegate to somewhere else (potentially even doing both for different sets of images) is not really solving the problem of distribution sanely IMO.

-- Aleksa Sarai (cyphar) www.cyphar.com

wking commented 7 years ago

On Fri, Sep 01, 2017 at 01:05:19PM +1000, Aleksa Sarai wrote:

I'd rather update our descriptor format to allow for CAS engine hinting. We currently have urls, but that's a direct location for that blob. I want casEngines or some such, which identifies CAS engines which likely hold that blob and its descendants. Something like:

I don't agree, to be honest. There are a couple of reasons for this:

a) It feels like we're overloading what a descriptor is used for.

I don't think casEngines is overloading it any more than urls is already overloading it; they are very similar information.

And since casEngines is just for that blob and its descendants, you can use an oci-template-v1 entry with no template markup (just a regular URI) to get everything we currently get from urls for leaf blobs. That addresses the initial motivation for urls, because layers are always Merkle leaves (they provide no way to link further children).

Is it used during storage, or is it used as part of distribution? If both, does that mean you are going to store casEngines inside the CAS?

I expect it to be mostly used in ref engine responses (more on what I mean by ref engines here). I don't expect many casEngines entries on CAS blobs, but I have no problem with them being there. They could go stale (just like urls), but who cares? It's just a hint.

Or is this a modified descriptor you only provide when you send it?

That works for the root descriptor(s) in the ref engine response, but you can't mutate anything deeper than that without re-hashing the whole Merkle tree (and breaking any signatures on it unless you have the signing key). So deeper mutation would require cooperation between the ref and CAS engines and a local signing key. That might happen, but I don't expect it to be a frequent occurrence. However, a ref engine can dynamically populate casEngines in the root descriptor(s) it returns without touching the rest of the Merkle tree, so you can be a more helpful ref engine by mentioning CAS engines that you expect to hold most of the tree, and not mentioning CAS engines that do not.

b) It requires the thing that is generating the image to either be aware of where it's going to be hosted (bad)…

No, you can leave casEngines off of all the in-CAS descriptors if you like. The only case for populating them there is if you do have a reasonable commitment for long-term hosting at a particular location (as we currently do for base Windows layers).

… or it requires you to generate modified versions of the image descriptors after it's been created (possibly bad or disastrous, depending on whether the casEngines is expected to be stored in the image or just provided as a "fake" descriptor).

The root descriptor(s) returned by the ref engine are outside of CAS. I don't think that makes them “fake”. And I don't see anything bad/disastrous happening if the ref engine adds additional metadata to them. As long as the mediaType, digest, and size are unchanged, you're still referencing the same Merkle tree. If you trust the ref engine, you'll trust those root-descriptor mutations. If you use something like opencontainers/image-spec#176 for in-CAS trust, the ref engine won't be able to alter any of that. Can you provide more detail on your bad/disastrous workflow?
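The "still the same Merkle tree" argument can be written down as a check a client might run when comparing a root descriptor from a ref engine against one it already knows. This is a minimal Go sketch with hypothetical names (`withHints`, `sameTarget`); it only captures the identity comparison described above (mediaType, digest, size) and says nothing about whether the ref engine itself is trustworthy.

```go
package main

import "fmt"

// CASEngine is one proposed casEngines hint.
type CASEngine struct {
	Protocol string `json:"protocol"`
	URI      string `json:"uri"`
}

// Descriptor holds the identifying fields plus the proposed hints.
type Descriptor struct {
	MediaType  string      `json:"mediaType"`
	Digest     string      `json:"digest"`
	Size       int64       `json:"size"`
	CASEngines []CASEngine `json:"casEngines,omitempty"`
}

// withHints returns a copy of a root descriptor with extra CAS engine hints,
// as a ref engine might do before responding. The input is not modified.
func withHints(d Descriptor, hints ...CASEngine) Descriptor {
	out := d
	out.CASEngines = append(append([]CASEngine(nil), d.CASEngines...), hints...)
	return out
}

// sameTarget reports whether two descriptors reference the same Merkle tree:
// mediaType, digest, and size match, whatever hints were added.
func sameTarget(a, b Descriptor) bool {
	return a.MediaType == b.MediaType && a.Digest == b.Digest && a.Size == b.Size
}

func main() {
	root := Descriptor{
		MediaType: "application/vnd.oci.image.manifest.v1+json",
		Digest:    "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f",
		Size:      7143,
	}
	hinted := withHints(root, CASEngine{Protocol: "oci-template-v1", URI: "https://cas.coreos.com/{algorithm}/{encoded}"})
	fmt.Println(sameTarget(root, hinted), len(root.CASEngines), len(hinted.CASEngines))
}
```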

c) This all sounds quite backwards to me. You should get the descriptors through distribution, not require the descriptors to define the distribution scheme. Or as in (a) is this just a way to avoid defining a separate construct for blob distribution?

It separates the name → root descriptor API (served by a ref engine) from the root descriptor → CAS engine connection API (just read casEngines on the root descriptor). That means you can go forth and define any number of ref engine APIs, and have a single library that takes the output of any of them, connects to a suggested CAS engine (or a local one, or whatever), and fetches the blobs while walking the Merkle tree. That also means that the ref-engine discovery service can completely ignore CAS engines, and delegate those suggestions to the ref engines and the root descriptors they provide. So you can say “use ref engine ${FOO} to lookup all my images” in your ref-engine discovery response, and not have to also say “and then fetch the images they tell you about from CAS engine ${BAR}”. But really, it doesn't matter where you get the CAS engine connection information from. If you want to (optionally) include that in the ref-engine discovery service (making it a ref/cas-engine discovery service), that would be fine too. The more CAS engine hints a client gets, the better. It can figure out which one it likes best, and ignore the rest unless it gets a CAS miss from its favorite service.

xiekeyang commented 7 years ago

My earlier comments were a little jumbled, so I have rearranged my questions below:

@wking

I think a discovery/distribution distinction is important to separate mutable references from CAS blobs. Aleksa has some motivation in [1], and the opening point in #11 is about delegating image hosting [2]. The discovery resource is easy to host locally where clients can find it, and then you can point at any distribution resource you like (your own and/or several third parties).

Does that mean the image digest MUST be identical across the different discovered distribution URLs? I don't have much experience publishing products. Is this how software/binary publishing and delegation usually work? (I mean, as opposed to different URLs providing the same software name with different content.)

{ "mediaType": "application/vnd.oci.image.manifest.v1+json", "size": 7143, "digest": "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f", "platform": { "architecture": "amd64", }

How do we get this CAS blob descriptor during the discovery process? Does it need to retrieve the descriptor's blob from all distribution URLs? If so, I think that should be the work of distribution, not discovery.

So deeper mutation would require cooperation between the ref and CAS engines and a local signing key. That might happen, but I don't expect it to be a frequent occurrence.

I'm still a little unclear about the above scenario. Do you mean we actually only need casEngines in most cases?

"casEngines": [
  {
    "protocol": "docker",
    "uri": "https://quay.io/coreos/etcd:1.0.0"
  },
  {...}
]

Can protocol be removed, leaving just a list of plain URIs like:

"urls": [
  "https://quay.io/coreos/etcd:1.0.0",
  "https://cas.coreos.com/{algorithm}/{encoded}",
  "https://docker.com/cas/{algorithm}/{encoded:2}/{encoded}"
]

And about the ABD metadata, I think it is a little incorrect in that one discovered object should have the same mirrors.

@cyphar parcel makes the discovery process easy. But I still feel it is too simple, since it returns only distribution URLs. It is more a feature for discovering distribution URLs than for discovering images. I think consumers (at least me) really want to get basic image information when discovering an image, not only distribution URLs.

wking commented 7 years ago

On Sun, Sep 03, 2017 at 01:50:22PM +0000, xiekeyang wrote:

That works for the root descriptor(s) in the ref engine response, but you can't mutate anything deeper than that without re-hashing the whole Merkle tree (and breaking any signatures on it unless you have the signing key). So deeper mutation would require cooperation between the ref and CAS engines and a local signing key. That might happen, but I don't expect it to be a frequent occurrence. However, a ref engine can dynamically populate casEngines in the root descriptor(s) it returns without touching the rest of the Merkle tree, so you can be a more helpful ref engine by mentioning CAS engines that you expect to hold most of the tree, and not mentioning CAS engines that do not.

@wking It seems your proposal might bring benefits only for mutable images, and makes little sense for immutable images. I think OCI discovery/distribution should deal with immutable things (at least so far). If we consider mutation, we might run into trouble with OCI policies such as signatures, version upgrades... We had better limit the idea to the immutable scope, am I right?

I'm not clear on your “version upgrade” concern. But for signatures I don't see a problem. For example, if your Merkle tree looks like:

  .
  `-- a root descriptor mutated to populate casDescriptors
      `-- b application/vnd.oci.image.signed.blob.v1+json (opencontainers/image-spec#176)
          |-- c application/pgp-signature (or whatever, opencontainers/image-spec#176)
          `-- d application/vnd.oci.image.named.blob.v1+json (opencontainers/image-spec#176)
              `-- e application/vnd.oci.image.manifest.v1+json
                  |-- f application/vnd.oci.image.layer.v1.tar+gzip
                  |-- g application/vnd.oci.image.layer.v1.tar+gzip
                  `-- h application/vnd.oci.image.config.v1+json

you still have a valid signed name assertion in b regardless of how you mutate a. And you can have many root descriptors (a, a', a", a‴, …) all pointing at the same b.

wking commented 7 years ago

On Mon, Sep 04, 2017 at 02:31:33AM -0700, xiekeyang wrote:

"casEngines": [ { "protocol": "docker", "uri": "https://quay.io/coreos/etcd:1.0.0" }, {...} ]


Can `casEngines` be removed, with metadata just like:
```json
{
  "manifests": [
    {
      "urls": [
        "https://quay.io/coreos/etcd:1.0.0",
        "https://cas.coreos.com/{algorithm}/{encoded}",
        "https://docker.com/cas/{algorithm}/{encoded:2}/{encoded}"
      ]
    }
  ]
}
```

No, for two reasons.

1. ‘urls’ is only for “this object” [1], while casEngines is for this object and its descendants [2]. That distinction doesn't matter for Merkle leaves like layer tars, but it certainly matters for the root descriptor in a Merkle tree.
2. urls can be fetched directly, but casEngines may include a more complicated connection protocol or per-media type patterns. For example, with the Docker registry API, you'll be using location-addressed URIs for manifests [3]:

https://quay.io/v2/coreos/etcd/manifests/1.0.0.

and quasi-content-addressed URIs for the descendant layers [4]:

https://quay.io/v2/coreos/etcd/blobs/sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

and for both of those you'll have to go through the Docker registry auth dance. Representing that with a bare https://… URI is not possible.

You could use a single docker://… or docker+https://… URI instead of the {"protocol": "…", "uri": "…"} object I proposed, but an object is easier to extend if you want to provide additional data for a custom protocol [2].
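To show why a bare `https://…` URI can't stand in for a `docker` entry, this Go sketch derives both request shapes from one hint, following the Docker Registry HTTP API V2 layout (`/v2/<name>/manifests/<reference>` and `/v2/<name>/blobs/<digest>`). How a `docker` casEngines URI is split into host, repository, and tag (`parseDockerURI`, including the `latest` default) is an assumption of this sketch, not something the proposal defines, and the registry auth exchange is deliberately left out.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// dockerEngine is a hypothetical parsed form of a "docker" casEngines entry.
type dockerEngine struct {
	Host, Repo, Tag string
}

// parseDockerURI splits e.g. "https://quay.io/coreos/etcd:1.0.0" into
// host, repository, and tag (assumed format; "latest" if no tag is given).
func parseDockerURI(uri string) (dockerEngine, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return dockerEngine{}, err
	}
	path := strings.TrimPrefix(u.Path, "/")
	i := strings.LastIndex(path, ":")
	if i < 0 {
		return dockerEngine{Host: u.Host, Repo: path, Tag: "latest"}, nil
	}
	return dockerEngine{Host: u.Host, Repo: path[:i], Tag: path[i+1:]}, nil
}

// manifestURL is location-addressed: the repository plus a tag (or digest).
func manifestURL(host, repo, reference string) string {
	return fmt.Sprintf("https://%s/v2/%s/manifests/%s", host, repo, reference)
}

// blobURL is quasi-content-addressed: still scoped to the repository.
func blobURL(host, repo, digest string) string {
	return fmt.Sprintf("https://%s/v2/%s/blobs/%s", host, repo, digest)
}

func main() {
	e, err := parseDockerURI("https://quay.io/coreos/etcd:1.0.0")
	if err != nil {
		panic(err)
	}
	fmt.Println(manifestURL(e.Host, e.Repo, e.Tag))
	fmt.Println(blobURL(e.Host, e.Repo, "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"))
}
```

Both URLs also need a registry token in practice, which is extra connection state that fits more naturally in an extensible object than in a single URI string.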

That means the image provider MUST note urls on the image. But they should do so anyway if they allow their image to be discovered.

No, both ‘urls’ and ‘casEngines’ should remain optional. Folks who want to hint at sources can use those properties to do so, but they shouldn't have to. You can always distribute potential source information out of band. For example, maybe you're using images in-house, and can guarantee that your company CAS engine will always have the blobs for your company images. Then there's no reason to set either ‘urls’ or ‘casEngines’ for those images; just configure your clients to pull from the company CAS engine. Setting those properties is just an in-band hint to consumers that don't have that outside information.

wking commented 7 years ago

On Tue, Sep 05, 2017 at 02:22:35AM +0000, xiekeyang wrote:

Discovery should also be able to use for local images.

You can always configure your clients to use a given ref engine for image names that don't match the DNS-named pattern (e.g. ‘coreos.com/etcd’ might be meaningfully resolved via dns, but ‘foo-bar’ probably won't). And you can configure your clients to use particular ref engine even for names that do match the DNS-named pattern (maybe you want to push all ref resolution through a company resolver). Neither of those mean that you can't define a DNS-based resolution protocol; they just mean that that DNS-based resolution protocol is not going to be the only approach. And that's fine.
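The client-side configuration described above might be sketched like this in Go. Everything here is hypothetical: the `Config` shape, the longest-prefix override rule, and the "first path segment contains a dot" test for DNS-named images are assumptions for illustration, not part of any proposal in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// Config is a hypothetical client configuration.
type Config struct {
	Overrides map[string]string // name prefix -> ref engine URI
	Default   string            // ref engine for names that don't look DNS-based
}

// route returns either a configured ref engine URI, or an authority on which
// to run DNS-based ref-engine discovery (exactly one is non-empty when Default is set).
func route(c Config, name string) (engine, authority string) {
	// Longest matching prefix wins, so the result doesn't depend on map order.
	best := ""
	for prefix := range c.Overrides {
		if strings.HasPrefix(name, prefix) && len(prefix) > len(best) {
			best = prefix
		}
	}
	if best != "" {
		return c.Overrides[best], ""
	}
	// Assumed heuristic: "coreos.com/etcd" looks DNS-named, "foo-bar" does not.
	first, _, _ := strings.Cut(name, "/")
	if strings.Contains(first, ".") {
		return "", first
	}
	return c.Default, ""
}

func main() {
	c := Config{
		Overrides: map[string]string{"example.com/": "https://refs.internal.example/ref/{name}"},
		Default:   "https://refs.internal.example/ref/{name}",
	}
	for _, n := range []string{"coreos.com/etcd", "example.com/app", "foo-bar"} {
		engine, authority := route(c, n)
		fmt.Printf("%s -> engine=%q discover=%q\n", n, engine, authority)
	}
}
```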

I still have a question about your suggested template:
{
  "refEngines": [
    {
      "protocol": "oci-index-template-v1",
      "uri": "https://{authority}/ref/{name}"
    },
    {...}
  ]
}
Is the protocol entry necessary?

You could put that into your URI with oci-index-template-v1+https://{authority}/ref/{name} or leave a bare https://{authority}/ref/{name} and do some content negotiation dance to figure out the appropriate protocol, but I like having an object because:

1. It makes the protocol explicit, so clients that don't support that protocol can just skip the entry.
2. It's easier to extend with structured data for protocols that need more connection information [1].

I assume you refer to the abd repository, which uses a pluggable implementation and defines io.abd.https-dns, io.abd.local, io.abd.nfs, etc.

Sure, you could do that sort of thing. I don't see a need for the OCI to specify lots and lots of ref-engine protocols, but I want to support third parties who want to define their own protocols.

Can they be defined as plain strings in a single uris entry, like:
{
  "uris": [
    "https://{authority}/ref/{name}",
    "ftp://ftp.xxx.org/pub/docs",
    "file://a:1234/b/c/d"
  ]
}

See my two reasons above for preferring objects to a single string.

I feel your protocol:uri is key/value-based discovery, while my proposal above is a direct lookup by the client. Which is the better option?

I'm not suggesting a protocol:uri object. That would be:

"refEngines": { "oci-index-template-v1": "https://{authority}/ref/{name}", … }

and it only allows one entry for each protocol [2]. I'm suggesting an array of objects:

{ "refEngines": [ { "protocol": "oci-index-template-v1", "uri": "https://{authority}/ref/{name}" }, … ] }

as far as client handling, that will be very similar to your ‘uris’ array, with the difference being how additional information is handled. And on that point, see my two reasons above for preferring objects to a single string.
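The "one entry per protocol" problem with the keyed form is easy to demonstrate. In this Go sketch, `encoding/json` accepts a duplicated key when decoding into a map and keeps the last value, silently dropping the first ref engine; other JSON parsers may reject or treat duplicates differently, which is part of why duplicate names are best avoided. The array-of-objects form keeps both entries in order. `decodeMap` and `decodeArray` are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RefEngine is one entry of the proposed "refEngines" array.
type RefEngine struct {
	Protocol string `json:"protocol"`
	URI      string `json:"uri"`
}

// decodeMap decodes the protocol-keyed form: duplicate keys collapse.
func decodeMap(b []byte) (map[string]string, error) {
	var m map[string]string
	err := json.Unmarshal(b, &m)
	return m, err
}

// decodeArray decodes the array-of-objects form: every entry survives.
func decodeArray(b []byte) ([]RefEngine, error) {
	var a []RefEngine
	err := json.Unmarshal(b, &a)
	return a, err
}

func main() {
	m, err := decodeMap([]byte(`{
	  "oci-index-template-v1": "https://example.com/ref/{name}",
	  "oci-index-template-v1": "https://oci.example.com/ref/{name}"}`))
	if err != nil {
		panic(err)
	}
	a, err := decodeArray([]byte(`[
	  {"protocol": "oci-index-template-v1", "uri": "https://example.com/ref/{name}"},
	  {"protocol": "oci-index-template-v1", "uri": "https://oci.example.com/ref/{name}"}]`))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(m), len(a)) // the map form lost one ref engine
}
```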

[2] “The names within an object SHOULD be unique.”

wking commented 7 years ago

On Tue, Sep 05, 2017 at 03:58:23AM +0000, xiekeyang wrote:

More questions:
The CAS engine which will eventually serve the image blobs [4].

A ref engine which resolves the name into an initial descriptor. For example, an index.json [5] and index [6] parser.
A way to find 1 and 2 starting from the user-supplied name.
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"size": 7143,
"digest": "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f",
"platform": {
 "architecture": "amd64",
It seems it MUST fetch the manifest blobs.

You don't have to fetch anything. If you don't like the answer you get from a particular service, you can always immediately stop processing. But in order to successfully unpack (or whatever) an OCI image, yes, you'll need to fetch the full Merkle tree.

Where are they stored? Does the discovery server already record them?

The object you quote above is from [1]. The tail of that object is:

"casEngines": [
  {
    "protocol": "docker",
    "uri": "https://quay.io/coreos/etcd:1.0.0"
  },
  {
    "protocol": "oci-template-v1",
    "uri": "https://cas.coreos.com/{algorithm}/{encoded}"
  },
  {
    "protocol": "oci-template-v1",
    "uri": "https://docker.com/cas/{algorithm}/{encoded:2}/{encoded}"
  }
]

That means the ref engine which supplied the answer is suggesting those CAS engines as sources for the referenced manifest. The ref engine itself does not need to store any blobs, although of course, folks are free to write tools that fill multiple roles (e.g. serving as both a ref and CAS engine, or as a complete ref-engine-discovery, ref, and CAS engine. More on ref-engine discovery in 1).
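A client holding that descriptor can turn the oci-template-v1 entries into blob URLs by splitting the digest into {algorithm} and {encoded} and expanding each template. The sketch below assumes those two variable names and RFC 6570-style prefix modifiers like {encoded:2}, as used in the casEngines example above; it is not the normative oci-template-v1 definition, and it simply skips protocols (like docker) it does not understand.

```python
import re

# Matches {var} and the RFC 6570 prefix form {var:N}. This handles only
# that small subset of URI templates; hex digests need no percent-encoding.
_VARIABLE = re.compile(r"\{([A-Za-z0-9_.]+)(?::(\d+))?\}")


def expand(template, values):
    """Substitute {var} and {var:N} (first N characters) in `template`."""
    def substitute(match):
        value = values[match.group(1)]
        prefix = match.group(2)
        return value[: int(prefix)] if prefix else value
    return _VARIABLE.sub(substitute, template)


def blob_uris(digest, cas_engines):
    """Expand every oci-template-v1 entry for one blob digest."""
    algorithm, encoded = digest.split(":", 1)
    values = {"algorithm": algorithm, "encoded": encoded}
    return [
        expand(engine["uri"], values)
        for engine in cas_engines
        if engine.get("protocol") == "oci-template-v1"  # skip e.g. "docker"
    ]


CAS_ENGINES = [
    {"protocol": "docker", "uri": "https://quay.io/coreos/etcd:1.0.0"},
    {"protocol": "oci-template-v1", "uri": "https://cas.coreos.com/{algorithm}/{encoded}"},
    {"protocol": "oci-template-v1", "uri": "https://docker.com/cas/{algorithm}/{encoded:2}/{encoded}"},
]
DIGEST = "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f"

for uri in blob_uris(DIGEST, CAS_ENGINES):
    print(uri)
```

Because the digest is checked after download, the client can try these URLs in any order without trusting any single CAS engine.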

The parcel discovery server records only the policy, while ABD stores metadata like:
{"io.abd.metadata":
[
 {
  "name": "com.coreos.etcd",
  "labels": {
   "version": "1.0.0",
   "arch": "amd64",
   "os": "linux",
   "content-type": "application-binary/aci"
  },
That is only a mutable reference. So it seems impossible for the discovery server to return a CAS blob, right?

I'm not clear on what you mean by “server”. Certainly you could have one Go program that implemented my proposed ref-engine discovery protocol, the oci-index-template-v1 ref protocol, and something compatible with the oci-template-v1 CAS protocol [1]. And you could provide all those services from various locations under https://foo.example.com. Clients don't care how many programs are running to provide the services. And they don't care which hosts they're running on (with the exception of DNS-based ref-engine discovery).

wking commented 7 years ago

On Tue, Sep 05, 2017 at 12:20:39AM -0700, xiekeyang wrote:

So far I still feel the ABD metadata format is closest to what I want. But I'm not sure how to generate and synchronize the metadata: will the discovery server have to fetch the latest info from the distribution URLs frequently?

Right. The service backing https://etcd.coreos.com/.well-known/abd-index/com.coreos.etcd or https://coreos.com/.well-known/abd-index/com.coreos.etcd (for the io.abd.https-dns protocol [1]) will have to be aware of new image name/versions being published so it can populate metadata["io.abd.metadata"][].name and metadata["io.abd.metadata"][].labels.version. It will also have to be aware of suggested hosting locations, so it can populate metadata["io.abd.metadata"][].mirrors.
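The reason the ABD index must be kept current is that clients select images from it directly. A minimal sketch of that client-side selection, using the io.abd.metadata shape quoted above (the function name `matching_entries` and the second 1.1.0 entry are hypothetical, and the mirrors field is left out because its shape is not shown here): if the service has not added an entry for a newly published version, no label query can find it.

```python
# Sample shaped like the io.abd.metadata excerpt above. The 1.1.0 entry is
# hypothetical: it stands for a version the index service must add on publish.
METADATA = {
    "io.abd.metadata": [
        {"name": "com.coreos.etcd",
         "labels": {"version": "1.0.0", "arch": "amd64", "os": "linux",
                    "content-type": "application-binary/aci"}},
        {"name": "com.coreos.etcd",
         "labels": {"version": "1.1.0", "arch": "amd64", "os": "linux",
                    "content-type": "application-binary/aci"}},
    ]
}


def matching_entries(metadata, name, labels):
    """Return entries for `name` whose labels include every requested label."""
    return [
        entry
        for entry in metadata.get("io.abd.metadata", [])
        if entry.get("name") == name
        and all(entry.get("labels", {}).get(key) == value for key, value in labels.items())
    ]


for entry in matching_entries(METADATA, "com.coreos.etcd", {"arch": "amd64"}):
    print(entry["labels"]["version"])
```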

@cyphar's parcel makes the discovery process easy.

@cyphar's discovery objects [2,3]:

$ curl https://{authority}/.well-known/cyphar.opencontainers.parcel.v0.json
{ "parcelVersion": "0.0.0", "disturi": { "template": "/{parcel.version}/{parcel.discovery.name}" } }

are similar to my ref-engine discovery objects [3]:

$ curl https://{authority}/.well-known/oci/ref-engines
{ "refEngines": [ { "protocol": "oci-index-template-v1", "uri": "https://{authority}/ref/{name}" }, { "protocol": "oci-index-template-v1", "uri": "https://oci.example.com/ref/{name}" } ], … }

The main differences being that the refEngines approach allows for:

Multiple entries. You can suggest several places to look for this image, not just one.
Structured protocol information. The next stage of image retrieval is not limited to a single protocol, unlike disturi, which requires the next stage to follow @cyphar's distribution protocol [5].
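Those two properties are what a client loop relies on: walk the suggestions in order, skip protocols it cannot speak, and fall back when an engine fails. A minimal sketch, assuming a client-side registry of protocol handlers (`resolvers`, hypothetical) and a naive {authority}/{name} substitution; the docker entry is skipped only because this toy client has no handler for it, and `fake_index_resolver` stands in for a real HTTP fetch.

```python
def resolve(name, authority, ref_engines, resolvers):
    """Try each ref engine in order; return the first successful result.

    `resolvers` maps a protocol identifier to a callable(engine, uri) that
    returns a descriptor dict, or raises OSError if the engine is unreachable.
    """
    for engine in ref_engines:
        resolver = resolvers.get(engine.get("protocol"))
        if resolver is None:
            continue  # unsupported protocol: move on to the next suggestion
        # Naive expansion of the two variables used in the examples above.
        uri = engine["uri"].replace("{authority}", authority).replace("{name}", name)
        try:
            return resolver(engine, uri)
        except OSError:
            continue  # this engine failed; fall back to the next one
    return None


REF_ENGINES = [
    {"protocol": "docker", "uri": "https://index.docker.io/v2"},
    {"protocol": "oci-index-template-v1", "uri": "https://{authority}/ref/{name}"},
    {"protocol": "oci-index-template-v1", "uri": "https://oci.example.com/ref/{name}"},
]


def fake_index_resolver(engine, uri):
    """Stand-in for a real HTTP fetch: pretend the authority itself is down."""
    if uri.startswith("https://example.com/"):
        raise OSError("connection refused")
    return {"resolvedFrom": uri}


print(resolve("app", "example.com", REF_ENGINES, {"oci-index-template-v1": fake_index_resolver}))
```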

Looking at @cyphar's distribution responses [6]:

{ "parcelVersion": "0.0.0", "indexuris": [ "template": "/{parcel.version}/{parcel.discovery.name}", … ] "bloburis": { "template": "https://docker.com/cas/{parcel.fetch.blob.algorithm}/{parcel.fetch.blob.digest}", "template": "https://docker.com/cas/{parcel.fetch.blob.algorithm}/{parcel.fetch.blob.digest:2}/{parcel.fetch.blob.digest}" } }

The indexuris entry overlaps with the refEngines entry in my ref-engine discovery response but hard-codes the protocol, where similar refEngines entries would specify an explicit oci-index-template-v1 protocol [4].

The bloburis entry overlaps with the casEngines entry in my descriptors (e.g. ref engine responses) but hard-codes the protocol, where similar casEngines entries would specify an explicit oci-template-v1 protocol [4].

But I still feel that returning only distribution URLs is too simple. It seems more like a feature for discovering distribution URLs than for discovering images.

The benefit to having the first stage know nothing about particular images is that there's no need to update it as you publish new images [7]. That's especially useful since the https://{authority}/.well-known/… space may be maintained by folks who are only peripherally associated with the image-pushing devs. If you can hard-code the .well-known response and point somewhere else (at a ref engine for me, or at a distribution URI for @cyphar), then you can put the dynamic bits somewhere more convenient for the image-pushing devs.

cyphar commented 7 years ago

@wking I've said this earlier in the email chain: the current status of parcel is being reworked quite significantly. I have a wip branch that has some of the new "template descriptors" text which solves effectively all of the issues you just highlighted.

The next stage of image retrieval is not limited to a single protocol, unlike disturi which requires the next stage to follow @cyphar's distribution protocol [5].

That's not true. The appendix of parcel's spec explicitly states how you can specify alternative protocols, and supporting other protocols (as I've mentioned several times to you, Trevor) is an explicit design goal of parcel. Please stop repeating this incorrect statement.

Multiple entries. You can suggest several places to look for this image, not just one.

Template descriptors solve this problem, because they use an array. In fact, I might end up making it possible to also specify that certain mediatypes should be retrieved from one source but others from another.

cyphar commented 7 years ago

@xiekeyang I would recommend looking at https://github.com/appc/spec/blob/master/spec/discovery.md (which is the AppC ACI discovery). I'd have to ask @jonboulle, but ABD doesn't appear to be an alternative to (or a newer version of) ACI discovery.

xiekeyang commented 7 years ago

@cyphar

@xiekeyang I would recommend looking at https://github.com/appc/spec/blob/master/spec/discovery.md (which is the AppC ACI discovery). I'd have to ask @jonboulle, but ABD doesn't appear to be an alternative to (or a newer version of) ACI discovery.

Yeah, I've read both ABD and ACI discovery. I just feel ABD's metadata is a better fit for OCI.

the current status of parcel is being reworked quite significantly.

I've read the wip-rework branch of parcel closely. One question:

parcel defines namespaced variables such as parcel.discovery.authority, userAuthority, name... These correspond to definite URL components (fragments, query parameters, and so on). Why do you define them as variables and parse them as a query in explore.go? You might just need to clarify in the spec that:

The URI template MUST include the canonical query parameters for authority, userAuthority, name...
For example:
https://{authority}/.well-known/cyphar.opencontainers.parcel.v0.json?authority=coreos.com&name=etcd

I guess you want implementations to do more than the above?

These variables are namespaced -- implementations MAY extend the following list of variables (and SHOULD also namespace their variables)

This is from parcel. Why MUST they be namespaced? They're already under the cyphar.opencontainers.parcel.v0.json fragment.

xiekeyang commented 7 years ago

This is from parcel. Why MUST they be namespaced? They're already under the cyphar.opencontainers.parcel.v0.json fragment.

Oh, I was wrong. They are arguments for client tools, which the implementation then parses to generate the exact URI. So they CAN only be defined as variables in the spec.

cyphar commented 7 years ago

@xiekeyang The reason for using URI templates, and having them as variables is so that it's possible to offload redirection to clients (making it possible to have a static webserver which might be updated out-of-band frequently). Serving results like this:

https://{authority}/.well-known/cyphar.opencontainers.parcel.v0.json?authority=coreos.com&name=etcd

Would not be reasonably possible with static pages (if I changed authority to something else you'd expect a different result -- how would you achieve that for every possible value of the URL arguments?). Also, that example is not right -- the point of cyphar.opencontainers.parcel.v0.json is to provide the image policy for the domain. The lookup stage for a particular image comes after that.
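The client-side expansion described here can be sketched as follows: the static policy JSON always carries the same template, and only the client's substitution differs per image. This is a minimal sketch assuming simple {var} substitution of dotted, namespaced names and resolution against the policy URL; the function name and the variable values bound below are made up for illustration, and parcel's spec defines the real semantics of each variable.

```python
import re
from urllib.parse import quote, urljoin

# Dotted, namespaced variable names such as {parcel.discovery.name}.
_VARIABLE = re.compile(r"\{([A-Za-z0-9_.]+)\}")


def expand_policy_template(policy_url, template, variables):
    """Expand `template` on the client, then resolve it against the policy URL."""
    path = _VARIABLE.sub(lambda m: quote(variables[m.group(1)], safe=""), template)
    # A template starting with "/" keeps the policy's scheme and host.
    return urljoin(policy_url, path)


POLICY_URL = "https://opensuse.org/.well-known/cyphar.opencontainers.parcel.v0.json"
TEMPLATE = "/{parcel.version}/{parcel.discovery.name}"  # the static disturi template

for image in ("leap", "tumbleweed"):
    variables = {"parcel.version": "0.0.0", "parcel.discovery.name": image}
    print(expand_policy_template(POLICY_URL, TEMPLATE, variables))
```

Every image name yields a different URL, yet the server only ever serves one unchanging file from .well-known.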

Quite importantly too, the .well-known directory must be treated as static IMO. Using it for anything other than describing domain policy is a misunderstanding of the RFC in my view (which I think @wking's original proposal did not correctly do with the oci-index thing).

But yes, you're completely right that I need more examples. The current document is quite confusing and I really do need to fix that (trust me, it's very high on my list of things to do, but I currently have some other higher priority tasks I'm working on).

wking commented 7 years ago

On Tue, Sep 05, 2017 at 11:49:52PM -0700, Aleksa Sarai wrote:

I have a wip branch that has some of the new "template descriptors" text which solves effectively all of the issues you just highlighted.

As you say, that branch is still WIP, and you're planning on making additional changes there as you have time. But at the moment, cyphar/parcel@3b7c84f1 is not clearly tying template descriptors [1] into your existing workflow. It seems to be a new stage between retrieving your discovery object and retrieving your distribution object [2], but whether it's an alternative response from the discovery URI [3] or associated with some new URI is not clear to me. Perhaps as you flesh the idea out it will become more clear to me how it supports delegation to multiple ref engines and provides for structured, extensible protocol objects [4].

The next stage of image retrieval is not limited to a single protocol, unlike disturi which requires the next stage to follow @cyphar's distribution protocol 5.

That's not true. The appendix of parcel's spec explicitly states how you can specify alternative protocols, and supporting other protocols (as I've mentioned several times to you, Trevor) is an explicit design goal of parcel. Please stop repeating this incorrect statement.

The text you have there says http(s) schemes refer to HTTP (over TLS) [5]. It doesn't have anything to distinguish between, for example, the Docker registry API [6] and an oci-template-v1 API [4] when both of them are over HTTPS. Perhaps you intend to coin a docker+https scheme for the Docker registry API, and that would work, but as I pointed out in [7,8], a {"protocol": "…", "uri": "…"} object is easier to extend if you want to provide additional data for a custom protocol.

cyphar commented 7 years ago

Perhaps as you flesh the idea out it will become more clear to me how it supports delegation to multiple ref engines and provides for structured, extensible protocol objects [4].

I'm still trying to decide whether I can remove a stage from the current parcel spec, which is why it hasn't been updated in a while. But effectively my plan is to do the following:

Change the discovery object to have an array of template descriptors rather than disturi, which can point to different distribution objects (or more discovery objects, or another bare template descriptor). This is the level at which I think delegation to schemes like docker:// or ACI should be done.
Change the distribution object to have a single array of template descriptors (I'm still trying to decide if we need two) with the MIME type specifying if that endpoint only supports a particular object type or an arbitrary object type (I might make the MIME entry an array, I haven't decided yet).

I'm wondering whether there might be some nice way of doing further namespacing at the discovery layer (other than just forcing the client to try all of their options). But that's a future improvement I can work on once I have template descriptors in place.

Does that better help explain my current plans?

The text you have there says http(s) schemes refer to HTTP (over TLS) [5]. It doesn't have anything to distinguish between, for example, the Docker registry API [6] and an oci-template-v1 API [4] when both of them are over HTTPS. Perhaps you intend to coin a docker+https scheme for the Docker registry API, and that would work, but as I pointed out in [7,8], a {"protocol": "…", "uri": "…"} object is easier to extend if you want to provide additional data for a custom protocol.

I didn't mention Docker, but in my mind you would specify it with docker://. I'm not sure whether we should have separate objects as you mentioned, or something more like https+oci or w/e. But the fact that Docker uses HTTPS is not really very relevant, in the same way that it's not relevant that both FTP and HTTP use TCP.

wking commented 7 years ago

On Wed, Sep 06, 2017 at 04:39:36PM +0000, Aleksa Sarai wrote:

But effectively my plan is to do the following:

Change the discovery object to have an array of template descriptors rather than disturi…

I think that is an improvement, and it will address my multiple ref engines concern 1.

Change the distribution object to have a single array of template descriptors (I'm still trying to decide if we need two) with the MIME type specifying if that endpoint only supports a particular object type or an arbitrary object type (I might make the MIME entry an array, I haven't decided yet).

Squashing CAS and ref engines together in a single array and then differentiating with protocol strings doesn't seem like a fundamental change. It's just a different structure for similar information. I think ‘protocol’ makes more sense than ‘mediaType’, because media types are about labeling content (e.g. HTTP responses) [2], but labeling a URI as “supports the Docker registry API” is declaring a supported protocol, not labeling content.

Does that better help explain my current plans?

I still think there are some gaps in your plan for an authority that wants to delegate to a Docker registry for both refs and CAS.

The text you have there says http(s) schemes refer to HTTP (over TLS) [5]. It doesn't have anything to distinguish between, for example, the Docker registry API [6] and an oci-template-v1 API [4] when both of them are over HTTPS. Perhaps you intend to coin a docker+https scheme for the Docker registry API, and that would work, but as I pointed out in [7,8], a {"protocol": "…", "uri": "…"} object is easier to extend if you want to provide additional data for a custom protocol.

I didn't mention Docker, but in my mind you would specify it with docker://.

And then assume it is always over HTTPS? Docker's registry API over HTTP is possible, although I expect most cases where you'll have it in a discovery response will want to use HTTPS.

I'm not sure whether we should have separate objects as you mentioned, or something more like https+oci or w/e.

Objects are easier to extend. For example, if you're using Docker's registry API, you may want to specify the location of the auth service (traditionally at auth.docker.io 3). With an object, you can do:

{ "protocol": "docker", "uri": "https://quay.io/coreos/etcd:1.0.0", "authUri": "https://auth.docker.io/token" }

With a single (template) URI entry, you'd have to stuff the auth URI into a query parameter or some such.
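A minimal sketch of why the object form extends cleanly: a parser (the `EngineEntry` class is hypothetical, not defined by either proposal) reads the two keys every client needs and keeps anything else, such as authUri, for protocol-specific handlers. Adding a key never changes how `uri` is parsed, and clients that don't understand the extra keys can ignore them.

```python
import json
from dataclasses import dataclass, field


@dataclass
class EngineEntry:
    protocol: str
    uri: str
    # Protocol-specific keys (e.g. authUri) that generic clients can ignore.
    extra: dict = field(default_factory=dict)

    @classmethod
    def from_json(cls, obj):
        """Split a decoded JSON object into the common keys and the rest."""
        extra = {key: value for key, value in obj.items() if key not in ("protocol", "uri")}
        return cls(protocol=obj["protocol"], uri=obj["uri"], extra=extra)


entry = EngineEntry.from_json(json.loads(
    '{"protocol": "docker", "uri": "https://index.docker.io/v2",'
    ' "authUri": "https://auth.docker.io/token"}'
))
print(entry.protocol, entry.uri, entry.extra.get("authUri"))
```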

cyphar commented 7 years ago

@wking

Squashing CAS and ref engines together in a single array and then differentiating with protocol strings doesn't seem like a fundamental change. It's just a different structure for similar information.

On the other hand, I don't see how your ref engines concept is actually adding needed features. It feels effectively the same as parcel's discovery objects to me, except that it doesn't seem to work with static files.

What do you think is the meaningful distinction between a CAS and a ref engine, now that OCI references don't exist anymore?

I still think there are some gaps in your plan for an authority that wants to delegate to a Docker registry for both refs and CAS.

It's not very clear to me that there aren't similar gaps in your scheme (not to mention it not being possible to use with static files). I can sort-of see how the "ref engine" would look for a Docker mirror, but how would the "mirrors" entries look in the descriptor for a particular object? You could go the layer violation route and just provide the URL directly, but the "official" interface to a Docker registry only provides you manifest granularity.

Objects are easier to extend.

Yeah, fair enough. Though your example doesn't sound quite right to me:

"uri": "https://quay.io/coreos/etcd:1.0.0",

wking commented 7 years ago

On Wed, Sep 06, 2017 at 05:42:05PM +0000, Aleksa Sarai wrote:

Squashing CAS and ref engines together in a single array and then differentiating with protocol strings doesn't seem like a fundamental change. It's just a different structure for similar information.

On the other hand, I don't see how your ref engines concept is actually adding needed features. It feels effectively the same as parcel's discovery objects to me, except that it doesn't seem to work with static files.

What do you think is the meaningful distinction between a CAS and a ref engine, now that OCI references don't exist anymore?

CAS is a very specific thing [1]. Clearly mapping a name to a descriptor (which is what a ref engine does, see [2]) is not something a CAS engine can help you with.

There are many ways to map names to descriptors. One way is via the old image-spec refs directory [3]. Another way is via the current image-spec index.json [4]. Another way is via a service that stores (name, descriptor) tuples in SQL. Another way is via Docker's registry:

$ TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/debian:pull" | jq -r .token)
$ curl -sH "Authorization: Bearer ${TOKEN}" -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' https://index.docker.io/v2/library/debian/manifests/9.1 | jq .
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 1513,
    "digest": "sha256:a20fd0d59cf13f82535ccdda818d70b97ab043856e37a17029e32fc2252b8c56"
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 45142935,
      "digest": "sha256:06b22ddb19134ec8c42aaabd3e2e9f5b378e4e53da4a8960eaaaa86351190af3"
    }
  ]
}

which jumps over an initial descriptor and returns the manifest blob as the Merkle root. And there are many other ways to perform this name → root descriptor/blob resolution besides those four.
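The second cURL above can be expressed as a request builder; this minimal sketch builds the request without sending it. It uses the Docker Registry HTTP API v2 manifest path, /v2/<name>/manifests/<reference>, and the schema 2 Accept header from the command above; the function name `manifest_request` is hypothetical, and the token is assumed to come from the auth.docker.io step.

```python
from urllib.request import Request

MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"


def manifest_request(registry, repository, reference, token):
    """Build (but do not send) a Docker Registry v2 manifest GET request."""
    url = f"{registry.rstrip('/')}/v2/{repository}/manifests/{reference}"
    return Request(url, headers={
        "Authorization": f"Bearer {token}",  # token from auth.docker.io, as above
        "Accept": MANIFEST_V2,  # ask for the schema 2 manifest
    })


req = manifest_request("https://index.docker.io", "library/debian", "9.1", "TOKEN")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the fetch; the response body is the manifest, i.e. the Merkle root with no separate initial descriptor.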

Having ref engines as a concept for that initial root descriptor/blob lookup lets you separate that name-addressed resolution from the content-addressed resolution of a CAS engine.
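That split can be sketched with two tiny in-memory engines. This is an illustration under assumptions, not an API from #2: the class names, the `resolve`/`get` method names, the "etcd:1.0.0" name, and the toy manifest bytes are all hypothetical. The ref engine answers a name with a descriptor; the CAS engine answers a digest with bytes, and the client verifies the bytes against the descriptor so the CAS engine itself need not be trusted.

```python
import hashlib
import json


class DictRefEngine:
    """Name-addressed: maps an image name to a root descriptor."""

    def __init__(self, refs):
        self.refs = refs

    def resolve(self, name):
        return self.refs[name]


class DictCASEngine:
    """Content-addressed: maps a digest to the bytes that hash to it."""

    def __init__(self):
        self.blobs = {}

    def put(self, data):
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        self.blobs[digest] = data
        return digest

    def get(self, digest):
        return self.blobs[digest]


def fetch_root(name, ref_engine, cas_engine):
    """Resolve `name` via the ref engine, then fetch and verify via CAS."""
    descriptor = ref_engine.resolve(name)
    blob = cas_engine.get(descriptor["digest"])
    algorithm, encoded = descriptor["digest"].split(":", 1)
    # The digest, not the CAS engine, vouches for the content.
    if algorithm != "sha256" or hashlib.sha256(blob).hexdigest() != encoded:
        raise ValueError("digest mismatch")
    if len(blob) != descriptor["size"]:
        raise ValueError("size mismatch")
    return blob


CAS = DictCASEngine()
MANIFEST = json.dumps({"schemaVersion": 2}).encode()  # toy manifest bytes
REFS = DictRefEngine({
    "etcd:1.0.0": {
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "digest": CAS.put(MANIFEST),
        "size": len(MANIFEST),
    }
})
print(fetch_root("etcd:1.0.0", REFS, CAS))
```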

I still think there are some gaps in your plan for an authority that wants to delegate to a Docker registry for both refs and CAS.

It's not very clear to me that there aren't similar gaps in your scheme (not to mention it not being possible to use with static files).

Fair enough. I think the way to figure out is to poke as many holes as you can, and I'll rework my proposal to patch them, or abandon it if the ship sinks under me ;).

On the static-files front, ref engines can have brains (although some protocols like oci-index-template-v1 [5] won't need them). The thing that needs to be static-compatible is the well-known URI. In my proposal, that's for ref-engine discovery [5], and you can have static content like:

{ "refEngines": [ { "protocol": "oci-index-template-v1", "uri": "https://{authority}/ref/{name}" }, { "protocol": "docker", "uri": "https://index.docker.io/v2", "authUri": "https://auth.docker.io/token", "authService": "registry.docker.io" } ] }

if you wanted to defer to both a set of index.json and the Docker registry (as a ref engine; the Docker registry currently implements both ref and CAS engines).

I can sort-of see how the "ref engine" would look for a Docker mirror, but how would the "mirrors" entries look in the descriptor for a particular object?

I'm not in favor of a ‘mirrors’ property. I'd expect a helpful ref engine to set casEngines in the root descriptor/blob it returns [5,6]. And deeper blobs in the Merkle tree could also give casEngines hints, although I'd expect that to be less common because of mutability issues 7.

You could go the layer violation route and just provide the URL directly, but the "official" interface to a Docker registry only provides you manifest granularity.

For things like the manifest I cURLed from Docker above, I'd expect Docker to have set a casEngines entry like:

"casEngines": [ { "protocol": "docker", "uri": "https://index.docker.io/v2", "authUri": "https://auth.docker.io/token", "authService": "registry.docker.io", "repository": "library/debian" }, … // other CAS engines, if Docker wanted to share the load ]

Though your example doesn't sound quite right to me:

"uri": "https://quay.io/coreos/etcd:1.0.0",

Yeah, I hadn't thought that through all the way. Does the casEngines example I gave earlier in this comment look better to you?

cyphar commented 7 years ago

CAS is a very specific thing [1]. Clearly mapping a name to a descriptor (which is what a ref engine does, see [2]) is not something a CAS engine can help you with.

I am aware what a CAS is, the main thrust of that comment was that OCI doesn't have references in the classic sense anymore so your core proposal of a ref-engine that provides mappings doesn't really work for a stock OCI image. And if you reduce it to "fetch the index and get the client to decide what to do" you've now recreated effectively how parcel does things, with some object changes but nothing "fundamental".

Having ref engines as a concept for that initial root descriptor/blob lookup lets you separate that name-addressed resolution from the content-addressed resolution of a CAS engine.

And my point was that I don't see how this is significantly different (or as you said "fundamentally different") to how distribution objects will work. That's effectively the same purpose that parcel's image policy serves, it's just that my scheme for distribution of OCI CAS blobs is described by part of the image policy. Distribution objects are separate from discovery objects (or if you prefer "ref engines", but I much prefer the ACI terminology). I understand that parcel needs some work to better explain how you could use it to delegate to docker.io (a usecase that is obviously important), but I don't understand how your proposal is "fundamentally different".

It feels like we're using different words to describe incredibly similar things.

I'm not in favor of a ‘mirrors’ property. I'd expect a helpful ref engine to set casEngines in the root descriptor/blob it returns [5,6].

Which is what I meant by your proposal not being static-files (or dumb publishing platforms) friendly. If casEngines cannot be set, how does your scheme work with that? parcel handles it by making that part of the "delegated" image policy (delegated in this context meaning through redirection).

wking commented 7 years ago

On Wed, Sep 06, 2017 at 10:44:05PM +0000, Aleksa Sarai wrote:

I am aware what a CAS is, the main thrust of that comment was that OCI doesn't have references in the classic sense anymore so your core proposal of a ref-engine that provides mappings doesn't really work for a stock OCI image. And if you reduce it to "fetch the index and get the client to decide what to do" you've now recreated effectively how parcel does things, with some object changes but nothing "fundamental".

Right. And that's what I do in #2 with index-template.md and its OCI Index Template Protocol. I think the difference vs. parcel's ref discovery is that I have an explicit oci-index-template-v1 protocol identifier (see ref-engine-protocols.md in #2) for that approach, while you're currently putting https://… and other entries in indexuris and requiring them to always return an index object [1].
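A minimal sketch of the client side of that approach, assuming the document fetched from the expanded template is an OCI image index and that the client picks manifests by the org.opencontainers.image.ref.name annotation plus platform. The filtering rule and the function name are illustrations, not the text of index-template.md. The first descriptor reuses the values quoted earlier in this thread; the second uses placeholder values.

```python
REF_NAME = "org.opencontainers.image.ref.name"


def matching_descriptors(index, ref_name, architecture, os_name):
    """Return manifest descriptors in an OCI index matching a name and platform."""
    matches = []
    for descriptor in index.get("manifests", []):
        if descriptor.get("annotations", {}).get(REF_NAME) != ref_name:
            continue
        platform = descriptor.get("platform")
        # Descriptors without a platform are treated as platform-neutral here.
        if platform and (platform.get("architecture") != architecture
                         or platform.get("os") != os_name):
            continue
        matches.append(descriptor)
    return matches


INDEX = {
    "schemaVersion": 2,
    "manifests": [
        {"mediaType": "application/vnd.oci.image.manifest.v1+json",
         "size": 7143,
         "digest": "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f",
         "platform": {"architecture": "amd64", "os": "linux"},
         "annotations": {REF_NAME: "1.0.0"}},
        {"mediaType": "application/vnd.oci.image.manifest.v1+json",
         "size": 1,  # placeholder
         "digest": "sha256:" + "0" * 64,  # placeholder
         "platform": {"architecture": "ppc64le", "os": "linux"},
         "annotations": {REF_NAME: "1.0.0"}},
    ],
}

print(matching_descriptors(INDEX, "1.0.0", "amd64", "linux"))
```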

Having ref engines as a concept for that initial root descriptor/blob lookup lets you separate that name-addressed resolution from the content-addressed resolution of a CAS engine.

And my point was that I don't see how this is significantly different (or as you said "fundamentally different") to how distribution objects will work. That's effectively the same purpose that parcel's image policy serves, it's just that my scheme for distribution of OCI CAS blobs is described by part of the image policy. Distribution objects are separate from discovery objects (or if you prefer "ref engines", but I much prefer the ACI terminology).

Your distribution URIs (which you recover via template expansion) currently give you only one option for ref → Merkle root resolution: you fetch the referenced URI and treat the response as an OCI index [1]. If you replaced your indexuris entries with objects (which you may be open to based on your “Yeah, fair enough” [2]) and added an explicit ‘protocol’ entry, you would get my refEngines property [3]. So they are very close.

I'm returning refEngines from the well-known URI. You're currently using the well-known URI for your discovery objects 4, but if you put an array of templates there instead 5, you'll be very close to having my refEngines behind a well-known URI.

I understand that parcel needs some work to better explain how you could use it to delegate to docker.io (a usecase that is obviously important), but I don't understand how your proposal is "fundamentally different".

My “fundamental change” comment 6 was saying that combining indexuris and bloburis into a single property with entries like 7:

{ "mediaType": "foo", "templates": […], }

didn't seem like a fundamental change. I wasn't claiming that my proposal (now in #2) was fundamentally different from yours. There are differences at the moment, but the gap is not large.

I'm not in favor of a ‘mirrors’ property. I'd expect a helpful ref engine to set casEngines in the root descriptor/blob it returns [5,6].

Which is what I meant by your proposal not being static-files (or dumb publishing platforms) friendly. If casEngines cannot be set, how does your scheme work with that? parcel handles it by making that part of the "delegated" image policy (delegated in this context meaning through redirection).

There's no need to set casEngines in the well-known URI (which is used for ref-engine discovery). The ref engines do need to set it, but:

Some ref engines (e.g. Docker's registry) are not dumb, and can set it on the fly.
Dumb ref engines (e.g. index JSON on a static Nginx server for my oci-index-template-v1 protocol) will need to be updated if they want to serve fresh casEngines data. But that's as late as possible in the image-name → Merkle root process; there's no way around having to update something by that point, and all the alternatives need something updated earlier in the chain. For example, in your current proposal, you'd need to update bloburis 8, which (in your proposal) is one step before the ref engine.

However, say you want to use oci-index-template-v1 for your ref engine, cannot update the index JSON you're pointing at, and want to provide your own casEngines that differ from the referenced index JSON (where they may be missing entirely). In that case, you need to promote casEngines up the chain and set it in your ref-engine discovery response (although I don't discuss this in #2 at the moment).
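Promoting casEngines up the chain raises an ordering question that #2 does not address yet. One plausible client policy, sketched here as an assumption: treat the authority's promoted list in the ref-engine discovery response as authoritative, keep the descriptor's own hints as fallbacks, and drop exact duplicates. The opposite order is equally defensible; the function name and the old-cas.example.com entry are hypothetical.

```python
def cas_engine_candidates(discovery_response, descriptor):
    """Merge casEngines hints: promoted discovery-level list first, then descriptor."""
    seen = set()
    candidates = []
    for source in (discovery_response, descriptor):
        for engine in source.get("casEngines", []):
            key = (engine.get("protocol"), engine.get("uri"))
            if key not in seen:  # drop exact duplicates, keep first position
                seen.add(key)
                candidates.append(engine)
    return candidates


# casEngines promoted into the ref-engine discovery response by the authority.
DISCOVERY = {
    "refEngines": [
        {"protocol": "oci-index-template-v1", "uri": "https://{authority}/ref/{name}"},
    ],
    "casEngines": [
        {"protocol": "oci-template-v1", "uri": "https://cas.coreos.com/{algorithm}/{encoded}"},
        {"protocol": "oci-template-v1", "uri": "https://docker.com/cas/{algorithm}/{encoded:2}/{encoded}"},
    ],
}
# Descriptor from an index JSON the authority cannot update (hints may be stale).
DESCRIPTOR = {
    "digest": "sha256:e692418e4cbaf90ca69d05a66403747baa33ee08806650b51fab815ad7fc331f",
    "casEngines": [
        {"protocol": "oci-template-v1", "uri": "https://cas.coreos.com/{algorithm}/{encoded}"},
        {"protocol": "oci-template-v1", "uri": "https://old-cas.example.com/{algorithm}/{encoded}"},
    ],
}

for engine in cas_engine_candidates(DISCOVERY, DESCRIPTOR):
    print(engine["uri"])
```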

If both your well-known server (serving the ref-engine discovery response) and your oci-index-template-v1 ref engine (serving index JSON) are dumb, you'd need to do one of the following:

a. Add another (optional?) layer to the protocol (like your current discovery URI [4]) whose sole purpose is redirection from a dumb ref-engine discovery service to a ref-engine discovery service with more brains (or at least access to an admin who can update the response).

b. Set casEngines in either the ref-engine discovery response and/or the oci-index-template-v1 ref-engine response and hope they stay current. If you don't change CAS engines too frequently (most people?), this is going to work fine.

c. Don't set casEngines at all, and leave it to consumers to figure it out on their own. This may be the best choice if you have no stable CAS engine you can refer them to.

If we expect a fair number of folks with dumb well-known ref-engine discovery servers, dumb oci-index-template-v1 ref-engine servers, and unstable CAS engine providers, then your (a) is the best choice. But it's not clear to me that the use case is popular enough to be worth addressing yet, and we can always mint a new well-known URI for indirection later (or provide a redirection media type which we'd return from the same well-known ref-engine discovery URI).

xiekeyang commented 7 years ago

@cyphar

Quite importantly too, the .well-known directory must be treated as static IMO. Using it for anything other than describing domain policy is a misunderstanding of the RFC in my view (which I think @wking's original proposal did not correctly do with the oci-index thing).

I partly agree with you, because I see a risk if we force implementations to use oci-index: an existing well-known URI service may already have rules like /.well-known/example.txt and just want to add support for OCI image discovery.

To do that, it would have to break its rules by adding /.well-known/oci-index/oci.txt. The oci-index segment may be opaque to that system, which may be unable to parse it. This would make the data the system records hard to manage.

On the other hand, how would one discovery service distinguish OCI image URIs from other URIs without oci-index or some other sub-path? By relying only on a *.txt or *.json suffix?

xiekeyang commented 7 years ago

I think our different opinions might be reconciled by a trade-off: which approach is more common for well-known URIs, @cyphar's or @wking's?

wking commented 7 years ago

On Thu, Sep 07, 2017 at 10:28:39AM +0000, xiekeyang wrote:

Quite importantly too, the .well-known directory must be treated as static IMO. Using it for anything other than describing domain policy is a misunderstanding of the RFC in my view (which I think @wking's original proposal did not correctly do with the oci-index thing).

I think my ref-engine discovery resources 1 are sufficiently compact to fall under “site-wide policy information and other metadata available directly (if sufficiently concise)”. It's certainly more concise than what ABD is storing under .well-known, which includes a list of images with image-version information 2.

But whatever. If folks feel it's still too big for .well-known, adding indirection like @cyphar's discovery object [3] is certainly possible.

I partly agree with you, because I see a risk if we force implementations to use oci-index: an existing well-known URI service may already have rules like /.well-known/example.txt and just want to add support for OCI image discovery.

To do that, it would have to break its rules by adding /.well-known/oci-index/oci.txt. The oci-index segment may be opaque to that system, which may be unable to parse it. This would make the data the system records hard to manage.

On the other hand, how would one discovery service distinguish OCI image URIs from other URIs without oci-index or some other sub-path? By relying only on a *.txt or *.json suffix?

I'm not quite sure what you're getting at here. One issue is the use of a multi-segment path, since I used to have .well-known/oci/ref-engines in #2. I've just updated #2 to use .well-known/oci-ref-engines 4, which will allow us to register that protocol independently 5.

Or maybe you're concerned about the lack of an extension? I think that's fine, because I expect to be returning application/vnd.oci.ref-engines.v1+json in most cases (in the absence of user extensions and content negotiation) [6], so using an extension-based mechanism on the server to guess media types is not going to work anyway. And using an explicit extension in the well-known URI makes content negotiation for non-JSON responses strange (e.g. asking for application/xml from .well-known/oci-ref-engines.json does not seem intuitive).
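Without an extension, the server has to pick the Content-Type itself rather than guess it from the path. A minimal sketch of that choice for .well-known/oci-ref-engines, assuming the media type named above as the default. The function names are hypothetical, and the Accept parsing is deliberately simplified: it ignores q-values entirely (so even q=0 counts as acceptable), which a real server should not do.

```python
REF_ENGINES_TYPE = "application/vnd.oci.ref-engines.v1+json"


def _media_ranges(accept_header):
    """Yield the media ranges from an Accept header, dropping parameters."""
    for part in accept_header.split(","):
        media_range = part.split(";", 1)[0].strip().lower()
        if media_range:
            yield media_range


def choose_media_type(accept_header, available=(REF_ENGINES_TYPE,)):
    """Pick the Content-Type to serve, or None (i.e. respond 406)."""
    if not accept_header:
        return available[0]  # no Accept header: serve the default type
    ranges = list(_media_ranges(accept_header))
    for media_type in available:
        wildcard = media_type.split("/", 1)[0] + "/*"
        if any(r in (media_type, wildcard, "*/*") for r in ranges):
            return media_type
    return None


for accept in (None, "application/*;q=0.8", "application/xml"):
    print(accept, "->", choose_media_type(accept))
```

Adding an XML representation later would only mean extending `available`; the URI stays the same.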

wking commented 7 years ago

Should we close this issue in favor of more focused parcel and/or oci-discovery issues?