opencontainers / distribution-spec

OCI Distribution Specification
https://opencontainers.org
Apache License 2.0

[RFP] replace catalog API functionality #22

Closed vbatts closed 3 years ago

vbatts commented 6 years ago

A request for richer indexing and searching of container images in a registry. There is the /v2/_catalog endpoint, though it still does not seem clear enough for implementers.

jonjohnsonjr commented 6 years ago

I'd like to remove /v2/_catalog from the spec entirely. It feels like an implementation detail that leaked into the docker spec, and it subverts the registry namespacing by being a global endpoint. My understanding is that it was an "admin API", which doesn't seem like it should be in scope of the distribution API, but a user-facing version of catalog would be great. 👍

There are a couple relevant PRs in docker/distribution of the same vein that would be nice to include as well:

From the proposal, it's explicitly out of scope:

Managing the grouping of image repository names is considered part of distribution policy or content management, which are out of scope. For example, “which image repositories are under library/?” is out of scope for this project.

... but it would be very nice to have, eventually :)

In general, I'd like to get the spec into a minimal, workable state before we start adding any features.

jonjohnsonjr commented 6 years ago

With the caveat that I think all of this is out of scope for this project, I'd like to brain dump my thoughts on it so that we can maybe reach some consensus or have a plan for a proposal. Possibly, all of this could be a completely separate spec/service that many registry operators just happen to host side-by-side with their distribution-spec compliant registry. That said...

Most[citation needed] registries don't implement /v2/_catalog, so it definitely shouldn't be a requirement. While I'd personally like to see this removed from the spec, there are some existing clients that consume this endpoint (I know of only spinnaker, but there are likely others).

At the very least, we should mark this as OPTIONAL. (The rest of the spec would benefit from more formal Requirements Level language, too.)

Regardless of what we do here, it would be nice to have some method of indexing a registry that fits the spec's namespacing model. Being able to index the registry enables some nice projects, e.g. flagstate, grafeas.

I haven't put together a formal proposal for anything yet, but some prior art to get the ball rolling:

Listing Repositories

This + /tags/list/ would enable clients to index at least the tagged images in a registry.

Listing Images

There's currently a /tags/list endpoint for listing tags, but no way to list just manifests; thus no way to discover untagged images.

This + Listing Repositories + /tags/list would enable a client to index the entirety of a registry.

Pubsub

Given a point-in-time view of a registry, it's much more efficient to subscribe to a firehose of events than to constantly poll for changes. Many registries provide this feature. Unfortunately, none of these message formats seems compatible with each other. In an ideal world, we could standardize on some common format for registry events.

/cc @vbatts @dmcgowan

dmcgowan commented 6 years ago

I am +1 to not including the _catalog API at all. If we are going to have an OPTIONAL api, maybe we can have something that is more manageable, such as _list at any level, including /_list (lists first level namespaces if supported), /<repo>/tags/_list, <repo>/manifests/_list, and /<firstnamespacepart>/_list (lists repositories if supported). This is similar to the proposal you linked and we have more flexibility if we are not tied to _catalog.

jzelinskie commented 6 years ago

My collection of thoughts:

samuelkarp commented 6 years ago

Here's what we do for Amazon ECR:

jchesterpivotal commented 5 years ago

Expanding sideways, I have found myself wishing I could look up images based solely on their digest, similarly to how I can look up blobs.

This would be useful when distributing software into private registries. Right now it's often necessary to "relocate" the image by editing any references to that image to point to the private registry (eg, in a Kubernetes pod).

If I can refer solely by digest, I can ship the image without anyone needing to edit references to the image.

bsatlas commented 5 years ago

If we are okay with keeping catalog as an optional endpoint, I think this issue can be closed.

stevvooe commented 5 years ago

I vote for dropping it.

bsatlas commented 5 years ago

I just submitted a PR to remove the catalog completely. Hopefully that helps move the decision along a bit.

https://github.com/opencontainers/distribution-spec/pull/45

bsatlas commented 5 years ago

I closed my PR since no one voted for dropping. This PR can be closed.

jzelinskie commented 5 years ago

I'm not sure why you say that no one voted for dropping it? It looks like a lot of people in this thread agree it should be dropped.

mikebrow commented 5 years ago

I'm fine with dropping the catalog api, so long as there is agreement on replacing it with a more useful list api, such as that which was discussed above.

bsatlas commented 5 years ago

@jzelinskie Sorry, I meant no one replied commented/LGTM/rejected my PR so I closed it. I figured I'd leave it up to you guys to make the change when yall were finished discussing.

bsatlas commented 5 years ago

@jzelinskie Sorry it sounds kind of rude. I just mean that I'm slowing down my contributions to this project and OCI in general because I feel like I'm being kinda annoying making PRs that no one told me to make and asking questions yall have already discussed years ago. I don't want to be "that guy" lol. I'll reopen the PR if there is still interest in fully dropping catalog though.

mikebrow commented 5 years ago

Folks get busy :) Thanks for the commits!

vbatts commented 5 years ago

@atlaskerr oh no Atlas! I think there is a disconnect. While there is history, your participation is good and valid. Sometimes old assumptions need to be challenged. I'm very glad for your PRs and commentary

bsatlas commented 5 years ago

Thanks guys. I guess I'm overreacting. The milestone for rc-1 is Feb 1 and there is so much housekeeping I wanted to get done before then and my anxiety is through the roof haha. I'll keep motivated!

jonjohnsonjr commented 5 years ago

I'll throw out two suggestions and people can pick apart why they hate it or love it. These are small, additive changes that shouldn't be too hard to get registries to adopt, and they fit the existing API model pretty well. If this is interesting to anyone, we could iterate on it in a more collaborative medium.

1. /v2/.../repositories/list

This would mirror /v2/.../tags/list

For listing repositories under library, a client might send this request:

GET https://registry-1.docker.io/v2/library/repositories/list?n=100

Receiving this response:

200 OK
Content-Type: application/json
Link: <https://registry-1.docker.io/v2/library/repositories/list?n=100&last=postfixadmin>; rel="next"
{
  "name": "library",
  "repositories": [
    "adminer",
    "aerospike",
    "alpine",
    "alt",
    "amazoncorretto",
    "amazonlinux",
    "arangodb",
    "backdrop",
    "bash",
    "bonita",
    "buildpack-deps",
    "busybox",
    "cassandra",
    "centos",
    "chronograf",
    "cirros",
    "clearlinux",
    "clefos",
    "clojure",
    "composer",
    "consul",
    "convertigo",
    "couchbase",
    "couchdb",
    "crate",
    "crux",
    "debian",
    "docker",
    "drupal",
    "eclipse-mosquitto",
    "eggdrop",
    "elasticsearch",
    "elixir",
    "erlang",
    "euleros",
    "express-gateway",
    "fedora",
    "flink",
    "fsharp",
    "gazebo",
    "gcc",
    "geonetwork",
    "ghost",
    "golang",
    "gradle",
    "groovy",
    "haproxy",
    "haskell",
    "haxe",
    "hello-seattle",
    "hello-world",
    "hola-mundo",
    "httpd",
    "hylang",
    "ibmjava",
    "influxdb",
    "irssi",
    "jetty",
    "joomla",
    "jruby",
    "julia",
    "kaazing-gateway",
    "kapacitor",
    "kibana",
    "known",
    "kong",
    "lightstreamer",
    "logstash",
    "mageia",
    "mariadb",
    "matomo",
    "maven",
    "mediawiki",
    "memcached",
    "mongo",
    "mongo-express",
    "mono",
    "mysql",
    "nats",
    "nats-streaming",
    "neo4j",
    "neurodebian",
    "nextcloud",
    "nginx",
    "node",
    "notary",
    "nuxeo",
    "odoo",
    "openjdk",
    "open-liberty",
    "opensuse",
    "oraclelinux",
    "orientdb",
    "percona",
    "perl",
    "photon",
    "php",
    "php-zendserver",
    "plone",
    "postfixadmin"
  ]
}

Based on that response, the client would follow up with:

GET https://registry-1.docker.io/v2/library/repositories/list?n=100&last=postfixadmin
200 OK
Content-Type: application/json
{
  "name": "library",
  "repositories": [
    "postgres",
    "pypy",
    "python",
    "rabbitmq",
    "rakudo-star",
    "rapidoid",
    "r-base",
    "redis",
    "redmine",
    "registry",
    "rethinkdb",
    "rocket.chat",
    "ros",
    "ruby",
    "rust",
    "sentry",
    "silverpeas",
    "sl",
    "solr",
    "sonarqube",
    "sourcemage",
    "spiped",
    "storm",
    "swarm",
    "swift",
    "swipl",
    "teamspeak",
    "telegraf",
    "thrift",
    "tomcat",
    "tomee",
    "traefik",
    "ubuntu",
    "vault",
    "websphere-liberty",
    "wordpress",
    "xwiki",
    "yourls",
    "znc",
    "zookeeper"
  ]
}
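
A minimal sketch of how a client might consume the proposed listing, assuming the registry paginates with a `Link: <...>; rel="next"` header as in the example above. The `fetch` callable, the `registry.example` host, and the canned pages are stand-ins for illustration, not part of any spec; relative URLs in the `Link` header are not handled here.

```python
import re
from typing import Callable, Dict, Iterator, Optional, Tuple

# A page is (response headers, decoded JSON body).
Page = Tuple[Dict[str, str], dict]

# Matches the target of a rel="next" link, e.g. <https://...>; rel="next"
LINK_NEXT = re.compile(r'<([^>]+)>\s*;\s*rel="next"')


def next_url(headers: Dict[str, str]) -> Optional[str]:
    """Return the rel="next" target of the Link header, or None on the last page."""
    match = LINK_NEXT.search(headers.get("Link", ""))
    return match.group(1) if match else None


def list_repositories(fetch: Callable[[str], Page], first_url: str) -> Iterator[str]:
    """Yield repository names from every page, following Link headers."""
    url: Optional[str] = first_url
    while url is not None:
        headers, body = fetch(url)
        yield from body.get("repositories", [])
        url = next_url(headers)


# Canned responses standing in for a registry (hypothetical host and data).
BASE = "https://registry.example/v2/library/repositories/list"
PAGES = {
    f"{BASE}?n=2": (
        {"Link": f'<{BASE}?n=2&last=alpine>; rel="next"'},
        {"name": "library", "repositories": ["adminer", "alpine"]},
    ),
    f"{BASE}?n=2&last=alpine": (
        {},  # no Link header: this is the last page
        {"name": "library", "repositories": ["bash"]},
    ),
}

print(list(list_repositories(PAGES.__getitem__, f"{BASE}?n=2")))
```

In a real client, `fetch` would wrap an authenticated HTTP GET; keeping it injectable makes the pagination logic easy to test on its own.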

I believe this is a good replacement for /v2/_catalog, since it works within the repo model. I think a top-level /v2/repositories/list request would make sense for certain registries but not for others. That would return e.g. only public repositories for dockerhub. GCR would probably reject it.

GET https://registry-1.docker.io/v2/repositories/list?n=100
200 OK
Content-Type: application/json
{
  "name": "",
  "repositories": [
    "library"
  ]
}

You can imagine all public repos being returned instead of just the official images, but that seems like something we'd want the registry operators to control since it would depend on the auth and namespace model.

2. returning a list of descriptors somewhere

I commented here that it would be cool if we extended /v2/.../tags/list to include a list of manifest descriptors contained by that repository.

That might be a bad idea for various reasons I hadn't considered, but we could also add a new endpoint, e.g. /v2/.../descriptors/list or something. Tags could be represented by annotations so that the return type would be a valid image index. One cool thing about this is that you'd be able to just "pull" the entire repository without much work. Another benefit is that we can reuse data structures for consuming this endpoint, but we don't have to force registries to recompute the hash on this faux image index, so pushing remains a cheap operation. (Recomputing the digest of the whole repo would be expensive for large repos, especially if we wanted to represent repos recursively with this method...)

For example:

GET https://registry-1.docker.io/v2/library/ubuntu/descriptors/list
{
  "schemaVersion": 2,
  "manifests": [{
    "digest": "sha256:7a47ccc3bbe8a451b500d2b53104868b46d60ee8f5b35a24b41a86077c650210",
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "size": 2035,
    "annotations": {
      "org.opencontainers.image.ref.name": "latest"
    }
  }]
}

Some drawbacks: pagination with this faux image index representation is kind of clunky, though we could do it in a similar way to tags/repositories...
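
To illustrate how a client could reuse image index data structures for such a response, here is a rough sketch that splits the descriptors into tagged and untagged manifests. It assumes the response has the shape shown above, with tags carried in the `org.opencontainers.image.ref.name` annotation; the function name and the sample digests are made up.

```python
from typing import Dict, List, Tuple

# Annotation key used in the example response to carry the tag name.
REF_NAME = "org.opencontainers.image.ref.name"


def split_by_tag(index: dict) -> Tuple[Dict[str, str], List[str]]:
    """Return (tag -> digest, digests of descriptors without a tag annotation)."""
    tags: Dict[str, str] = {}
    untagged: List[str] = []
    for descriptor in index.get("manifests", []):
        tag = (descriptor.get("annotations") or {}).get(REF_NAME)
        if tag is None:
            untagged.append(descriptor["digest"])
        else:
            tags[tag] = descriptor["digest"]
    return tags, untagged


# Hypothetical response body: one tagged and one untagged manifest.
sample = {
    "schemaVersion": 2,
    "manifests": [
        {"digest": "sha256:aaa", "size": 2035,
         "annotations": {REF_NAME: "latest"}},
        {"digest": "sha256:bbb", "size": 1412},
    ],
}

tags, untagged = split_by_tag(sample)
print(tags, untagged)
```

A manifest with several tags would presumably appear once per tag in this representation, which is one more reason pagination could get clunky.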

vsoch commented 5 years ago

For the repositories endpoint, I agree that it would need to be registry specific. Some might prefer to return all public, others that the specific user making the query is allowed to see.

Just to clarify - the repositories endpoint handles being given a particular organization (e.g., library) and then return the repos under it, or no organization (https://registry-1.docker.io/v2/repositories/list?n=100) and then returns the organizations / collections instead? Sounds like a scraper's dream :) What about more weird namespaces like <organization>/<one>/<two> - I remember one registry running into issues with Singularity because they had this more non-traditional pattern.

A concern with this endpoint is that it gives good reason to stress the API - people like myself that like to study software are going to scrape the heck out of it. I would say for larger servers that want to serve the endpoint and not be scraped, they could do something along the lines of what GitHub / StackOverflow does, and provide some BigQuery Table of data.

Would there be a next / previous in the responses (i.e. are they paginated?) Also, would it make sense to return a random order in case people do massive scraping, so we don't all hit poor postgres at the top at the same time?

I missed the descriptors discussion - what is a descriptor, an image manifest with annotations?

Stepping out of details for a second - what are the goals of this endpoint? From a high level, it lets people interested in studying containers (via their manifests) find them more programmatically. What else?

jonjohnsonjr commented 5 years ago

Just to clarify - the repositories endpoint handles being given a particular organization (e.g., library) and then return the repos under it, or no organization (https://registry-1.docker.io/v2/repositories/list?n=100) and then returns the organizations / collections instead? Sounds like a scraper's dream :) What about more weird namespaces like <organization>/<one>/<two> - I remember one registry running into issues with Singularity because they had this more non-traditional pattern.

Yes exactly, since repositories can be nested, you would be able to walk the repositories down to leaves. This is how GCR works today, except that it grafts both of these proposals onto the /tags/list response.
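
As a sketch of what walking nested repositories down to leaves could look like: the `list_children` callable below stands in for a GET on the proposed /v2/<prefix>/repositories/list endpoint, and the tree is invented data. This version only yields leaves; a registry where a repository holds both images and child repositories would need to yield intermediate paths too.

```python
from typing import Callable, Dict, Iterator, List


def walk(list_children: Callable[[str], List[str]], prefix: str = "") -> Iterator[str]:
    """Depth-first walk that yields every repository path with no children."""
    children = list_children(prefix)
    if not children and prefix:
        yield prefix  # nothing nested below this path, so it is a leaf
        return
    for child in children:
        path = f"{prefix}/{child}" if prefix else child
        yield from walk(list_children, path)


# Hypothetical registry layout, keyed by parent path ("" is the top level).
TREE: Dict[str, List[str]] = {
    "": ["library", "org"],
    "library": ["ubuntu", "alpine"],
    "org": ["team"],
    "org/team": ["app"],
}


def children_of(prefix: str) -> List[str]:
    return TREE.get(prefix, [])


print(list(walk(children_of)))
```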

A concern with this endpoint is that it gives good reason to stress the API

In my experience, people are already stressing the API with /v2/_catalog, especially spinnaker :) this would allow a more targeted scraping, e.g. if I only care about library/ubuntu I can scrape that, instead of using /v2/_catalog and going from there.

Would there be a next / previous in the responses (i.e. are they paginated?)

Yes, see the Link header; this is similar to how /tags/list works now. One example in the spec includes a link in the body, which is not consistent with the rest of the spec, but I think the header is the canonical way to handle pagination here.

I missed the descriptors discussion

A descriptor is defined here. tl;dr, it's: digest, mediaType, size, urls, and annotations. These are used to describe content-addressable content so that clients can handle them as something other than an opaque blob. A manifest's layers field is a list of descriptors, and an image index's manifests field is a list of descriptors. Basically all the JSON structures are compositions of various descriptors with extra fields for manifests and image indexes.

what are the goals of this endpoint

Exposing something like this solves two problems:

  1. Discovery of images that aren't tagged (e.g. old images) and their digests.
  2. Getting an index of a repository without making N + 1 requests (list tags + pull every tag).

There's not currently any way to ask the registry "tell me about everything in this repo", which would be solved by using both of these endpoints together.

vbatts commented 5 years ago

On 07/03/19 22:40 +0000, jonjohnsonjr wrote:

1. /v2/.../repositories/list

This would mirror /v2/.../tags/list

I think this is a nice logical extension.

Further, for a provided security token context, a way to list orgs you have access to?

vbatts commented 5 years ago

what are the goals of this endpoint

Exposing something like this solves two problems:

  1. Discovery of images that aren't tagged (e.g. old images) and their digests.
  2. Getting an index of a repository without making N + 1 requests (list tags + pull every tag).

There's not currently any way to ask the registry "tell me about everything in this repo", which would be solved by using both of these endpoints together.

:+1:

jonjohnsonjr commented 5 years ago

Do we define the token handshake anywhere in the spec or is that out of scope for the distribution spec?

Further, for a provided security token context, a way to list orgs you have access to?

Not sure what "orgs" would be, but if it's just the top-level repos, we could reuse this via: https://registry-1.docker.io/v2/repositories/list

For the actual scopes, I would imagine something like:

https://auth.docker.io/token?service=registry.docker.io&scope=repository:*:list

https://auth.docker.io/token?service=registry.docker.io&scope=repository:library:list

https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/ubuntu:list

This might be complicating things more but I'm trying to reuse existing patterns wherever we can.
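
For illustration, the token URLs above could be built like this. The `list` action is only the idea floated in this comment, not an action defined by the existing token scheme, and `token_url` is a made-up helper name.

```python
from urllib.parse import urlencode


def token_url(repository: str, action: str = "list",
              realm: str = "https://auth.docker.io/token",
              service: str = "registry.docker.io") -> str:
    """Build a token request URL with a repository:<name>:<action> scope."""
    scope = f"repository:{repository}:{action}"
    # Keep ":", "/" and "*" unescaped so the URL matches the examples above.
    query = urlencode({"service": service, "scope": scope}, safe=":/*")
    return f"{realm}?{query}"


for repo in ("*", "library", "library/ubuntu"):
    print(token_url(repo))
```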

sajayantony commented 5 years ago

Adding to this list -

Listing & Auth

PubSub

With the Artifact extension we are considering adding the config descriptor type as well to the payload so that consumers can filter events by artifact type. A helm chart update would be a webhook filtered by the helm type.

yuwaMSFT2 commented 5 years ago

Some comments:

  1. _catalog API is actually being used more than many assume. For ACR we provided this API to be compatible with dockerhub; from our data, the usage is significant (but maybe because there are no good alternatives). Meanwhile when we first introduced MCR for Microsoft public images, we blocked the _catalog API; but later we got multiple customer requests to implement it since many depend on it. It's not so easy to just retire this API.

  2. I saw one previous comment to add /_list API at different levels. I feel this is good. It provides a unified way to provide list functionality: {repositoryname}/_list, {repositoryname}/manifests/_list, {repositoryname}/tags/_list. Actually it's just to add a list API for each supported type of entity. Currently the registry has 3 levels of entities: repository/manifests/tags; in the future we may find more. Providing a unified way to do it is important. Also the underscore "_" kind of makes sure it doesn't break any existing scenarios (/repositories/list may not work for some registries, since a repository name can be "repository/list" in certain scenarios)

  3. The spec is only to define the contract. Whether it is performant or not is an implementation concern.

  4. Regarding auth token scope, it depends on if we want to separate the list capability from the pull capability. Current registry implementations mostly assume they require the same capability, which may make sense (user needs to list repository and then do pull).
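
The collision concern in item 2 can be checked with a small sketch. The pattern below is the repository name grammar as I recall it from the distribution spec, so treat it as an assumption rather than a quotation; the helper name is made up. The point is that a path component may not start with an underscore, so `_list` cannot be mistaken for a repository, while `repositories/list` can.

```python
import re

# Assumed name grammar: lowercase alphanumeric components joined by
# ".", "_", "__" or dashes, with components separated by "/".
COMPONENT = r"[a-z0-9]+(?:(?:\.|_|__|-+)[a-z0-9]+)*"
NAME = re.compile(rf"{COMPONENT}(?:/{COMPONENT})*")


def is_valid_repository_name(name: str) -> bool:
    """True if the whole string matches the assumed repository name grammar."""
    return NAME.fullmatch(name) is not None


for candidate in ("library/repositories/list", "repository/list",
                  "library/_list", "_catalog"):
    print(candidate, is_valid_repository_name(candidate))
```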

Actually these questions were raised at the beginning but the general consensus was that these were not in the scope of distribution spec. But entity list/search is an essential part of a practical registry provider. So it sounds good to have it either in distribution spec, or some additional spec (management spec just a wild guess?)

As a data point, ACR implements both the _catalog API (to be compatible with OSS docker registry), and the private set of _list API for each type of entity (repositories/manifests/tags).

jonjohnsonjr commented 5 years ago

For ACR we provided this API to be compatible with dockerhub

Does dockerhub support catalog? Last I checked, it didn't.

the usage is significant (but maybe because there are no good alternatives)

This has been my observation as well.

yuwaMSFT2 commented 5 years ago

@jonjohnsonjr You are right! I must be thinking about something else:) Maybe the search API on Dockerhub:) Updated my comment. Thanks!

bsatlas commented 5 years ago

I'd personally like to see a repository listing format like this:

{
  "repositories": [
    {
      "name": "app-foo",
      "namespace": "namespace-foo",
      "project": "project-foo",
      "package": {
        "documentation": "https://github.com/opencontainers/image-spec",
        "icon": "https://opencontainers/static/icon-small.png",
        "type": "oci-image-v1.0.0"
      },
      "labels": {
        "consumer": {
          "costCenter": "cs-foo",
          "manager": "Tom Ripen",
          "team": "team-foo"
        },
        "provider": {
          "awsAccount": "aws-account-foo",
          "pricingPlan": "PREMIUM",
          "region": "us-east-1"
        }
      }
    },
    {
      "name": "app-bar",
      "namespace": "namespace-foo",
      "project": "project-foo",
      "package": {
        "documentation": "https://github.com/opencontainers/image-spec",
        "icon": "https://opencontainers/static/icon-small.png",
        "type": "docker-image-v2.2.0"
      },
      "labels": {
        "consumer": {
          "costCenter": "cs-foo",
          "manager": "Tom Ripen",
          "team": "team-bar"
        },
        "provider": {
          "awsAccount": "aws-account-foo",
          "pricingPlan": "PREMIUM",
          "region": "us-east-1"
        }
      }
    },
    {
      "name": "app-foo-helm",
      "namespace": "namespace-foo",
      "project": "project-foo",
      "package": {
        "documentation": "https://helm.io/docs/packaging",
        "icon": "https://helm.io/static/icon-small.png",
        "type": "helm-chart-v1.0.0"
      },
      "labels": {
        "consumer": {
          "costCenter": "cs-foo",
          "manager": "Tom Ripen",
          "team": "team-helm"
        },
        "provider": {
          "awsAccount": "aws-account-foo",
          "pricingPlan": "PREMIUM",
          "region": "us-east-1"
        }
      }
    }
  ]
}

Fetching repository names exclusively is rather limiting. Repositories is a great place to include package metadata for custom artifact types.

yuwaMSFT2 commented 5 years ago

@atlaskerr The current spec doesn't provide a way to annotate at the repository level; all the metadata is associated with individual manifests.

bsatlas commented 5 years ago

@yuwaMSFT2 I think that's the problem. I like what you said earlier about an additional management spec:

Actually these questions were raised at the beginning but the general consensus was that these were not in the scope of distribution spec. But entity list/search is an essential part of a practical registry provider. So it sounds good to have it either in distribution spec, or some additional spec (management spec just a wild guess?)

SteveLasker commented 5 years ago

To attempt to summarize the discussion above: There are two APIs I believe we're discussing.

With various requirements, such as only returning the list of repos and tags the user has access to.

Use Cases

Auth requirements

Users should only be able to see the repos and tags they have permission to. These include anonymous and RBAC scenarios. Since each cloud vendor has their own auth flows, I don't think it's reasonable to assume we can achieve a common auth flow. (As much as I wish we could, I'm just being a bit more pragmatic) I'd suggest the spec should simply specify the listing MUST adhere to listing only repos and tags the user has access to. Simply listing a repo the user doesn't have access to can disclose information. As @yuwaMSFT2 mentioned above, each cloud vendor also implements unique roles. Within ACR, we support push, pull and separate listing roles to avoid leaking information.
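
One way to read that MUST, as a sketch: filter by permission before paginating, so neither the page contents nor the `last` marker reveals a repository the caller cannot see. The `can_list` callable is a placeholder for whatever vendor-specific auth check a registry uses.

```python
from typing import Callable, Iterable, List


def visible_page(all_repos: Iterable[str], can_list: Callable[[str], bool],
                 n: int = 100, last: str = "") -> List[str]:
    """Return up to n permitted repository names sorted after `last`."""
    allowed = sorted(repo for repo in all_repos if can_list(repo))
    return [repo for repo in allowed if repo > last][:n]


# Hypothetical data: the caller may not list anything under "secret/".
REPOS = ["team/app", "secret/payroll", "library/ubuntu"]


def not_secret(repo: str) -> bool:
    return not repo.startswith("secret/")


first = visible_page(REPOS, not_secret, n=1)
second = visible_page(REPOS, not_secret, n=1, last=first[-1])
print(first, second)
```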

Repo listing

Tag listing

Performance

One should assume that any supported API will be abused. Whether the implementer decides to cache and support massive requests, or throttle seems like an implementer/vendor specific decision.

Details API

I like where @atlaskerr is going with the meta-data details. I struggle on whether this should be in the initial repo listing, or subsequent individual requests of each repo/tag. Either way, repo/tag listing queries need to support basic filters to be useful.

Query semantics vs. basic filters

Having worked through OData and other query tools, I worry about how much burden we put on the user for the most basic scenarios. I'd hope we could construct a REST based API that had progressive disclosure of the complexity.

A basic repo listing has default behavior of the top 100 repos, alphabetically listed:

/v2/<ns1>/<ns2>/repositories/list

For more complex queries:

/v2/<ns1>/<ns2>/repositories/list?orderBy=createdDate&order=desc

The spec could say certain parameters are required, while registries could support additional parameters providing their unique values:

/v2/<ns1>/<ns2>/repositories/list?orderBy=createdDate&order=desc&registrySpecific=foo
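
A rough sketch of that progressive disclosure on the registry side: with no parameters the caller gets the first 100 names alphabetically, and `orderBy` / `order` only matter when supplied. The record fields and the function name are invented for the example.

```python
from typing import Dict, List


def list_page(repos: List[Dict[str, str]], n: int = 100,
              order_by: str = "name", order: str = "asc") -> List[str]:
    """Return up to n repository names, sorted by one field of the record."""
    if order not in ("asc", "desc"):
        raise ValueError(f"unsupported order: {order!r}")
    ranked = sorted(repos, key=lambda repo: repo[order_by],
                    reverse=(order == "desc"))
    return [repo["name"] for repo in ranked[:n]]


# Hypothetical repository records with ISO dates, which sort as strings.
REPOS = [
    {"name": "b", "createdDate": "2019-03-01"},
    {"name": "a", "createdDate": "2019-01-01"},
    {"name": "c", "createdDate": "2019-02-01"},
]

print(list_page(REPOS))                                             # defaults only
print(list_page(REPOS, n=2, order_by="createdDate", order="desc"))  # newest first
```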

Eventing and Listing

Having an eventing API is important to complete the scenario. The vulnerability scanners currently implement time based scheduling to attempt to keep up to date. However, each has asked for an eventing API to keep current. While this is likely another portion of the spec, having them keep in sync to provide a common experience would be helpful. It would also alleviate undue stress on one api vs. another to cover the scenarios as it allows tooling to use each for their value.

CLI

Another logical extension is a common CLI for registries. While we're discussing common REST APIs across registries, one of the big benefits is using a common CLI across registries. We all benefit from docker pull across each. Having a foo repo list command would be an interesting project. Starting with a common REST api could incubate some interesting innovations.

Next Steps

Do we have enough captured here to start a draft multi-page spec that we could put in a sub folder of distribution?

vsoch commented 5 years ago

Is it just me, or does this smell a little bit like GitHub or GitLab API endpoints? For example, listing repositories:

# Github
GET /users/:username/repos

# Proposal
GET /v2/<ns1>/<ns2>/repositories/list

The main difference is just the use of "repos" vs "repositories" and the "list" is implied in the first. The <ns1>/<ns2> maps to the users/:username.

It's similar to how (some / all of?) Docker's APIs were integrated into the image spec, no? Or more simply, wouldn't it be really powerful if we developed a spec for these additional endpoints so that already existing version control APIs would already be compliant? In the context of Github, this would mean that GitHub pages could serve a static registry and deliver the same interactions as with a container registry. If we add content types, then with a "doc" or "license" sort of type, this would link cleanly to the files in the repo.

yuwaMSFT2 commented 5 years ago

The first (/repos) is a more common RESTful style API. While the second one (/repositories/list) may not work in certain cases depending on how other APIs are designed (what if there is an existing repository called list?)

I would vote for the first:)

jonjohnsonjr commented 5 years ago

@SteveLasker that seems like an astonishing amount of scope creep for a summary 😉

My main goal here is to drop /v2/_catalog and propose a palatable replacement for it that enables other systems to index the registry and provide search capabilities. As it stands, the registry is somewhat opaque in that you cannot list images that are not tagged. I'm not sure if that was intentional in its design.

If we can get:

  1. a registry that is fully indexable via the registry API, and
  2. a standard event payload (something something cloud events?),

then we could build most of what you're proposing around that, generically. I'm hesitant to add a ton of requirements to the registry spec because that will basically guarantee that most registries won't ever fully implement it.

Some deployment tools look for the "newest" tag, and don't want to use a :latest tag metaphor. Being able to query the list of tags, in a given order, with a top capability allows them to get the newest tag for a given repo.

That's horrifying.

Paging and top as customers that automate builds can have thousands of tags

Pagination for tag listing is already in the spec. Something like top seems useful, but I don't know if we want to include it in the spec.

Having an eventing API is important to complete the scenario

Agreed. Ideally, well-behaved clients would do a full-resync once and listen for registry events to keep their index up to date. This is similar to how kubernetes informers behave.
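
A toy version of that resync-then-listen pattern, assuming an event payload with `action`, `repository`, `tag` and `digest` fields. No such format is standardized in this thread, so the shape below is purely hypothetical.

```python
import copy
from typing import Dict

# repository -> tag -> digest
Index = Dict[str, Dict[str, str]]


def resync(snapshot: Index) -> Index:
    """Start from a full point-in-time listing of the registry."""
    return copy.deepcopy(snapshot)


def apply_event(index: Index, event: Dict[str, str]) -> None:
    """Update the local index in place from a single registry event."""
    tags = index.setdefault(event["repository"], {})
    if event["action"] == "push":
        tags[event["tag"]] = event["digest"]
    elif event["action"] == "delete":
        tags.pop(event["tag"], None)  # deleting an unknown tag is a no-op
    else:
        raise ValueError(f"unknown action: {event['action']!r}")


index = resync({"library/ubuntu": {"latest": "sha256:aaa"}})
apply_event(index, {"action": "push", "repository": "library/ubuntu",
                    "tag": "latest", "digest": "sha256:bbb"})
apply_event(index, {"action": "delete", "repository": "library/ubuntu",
                    "tag": "old", "digest": ""})
print(index)
```

After the one full resync, the client never needs to poll the listing endpoints again unless it misses events.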

CLI

There are a few registry CLIs already. I don't think we need to create an OCI-blessed CLI, but it should be easy to write a CLI from reading the spec.

Is it just me, or does this smell a little bit like GitHub or GitLab API endpoints?

I based it on the /tags/list endpoint already in the registry. I'm not tied down to any particular format, but I'd like to be at least self-consistent in the API.

this would mean that GitHub pages could serve a static registry and deliver the same interactions as with a container registry

I think this is going to be hard to achieve and maintain, since GitHub is free to change their API arbitrarily... so it might not be a great idea to tie the spec to whatever GitHub's API happens to be right now. (I love that you got the static registry stuff working, BTW.) What does the equivalent GitLab API look like? The same?

ad-m commented 5 years ago

GitHub pages could serve a static registry and deliver the same interactions as with a container registry

I think limiting yourself to GitHub Pages is not the right solution. First of all, it is a proprietary solution. Second, its usage limits (maximum size 1 GB, monthly transfer of 100 GB, 10 updates per hour) can limit practical potential.

We can keep statically generated registries in mind, and I like this idea. I notice that operating system repositories, for example APT, are commonly statically generated (see https://github.com/krobertson/deb-s3 for apt-repository on s3, https://tylerpower.io/post/hosting-yum-repo-on-s3/ for yum-repository on s3), and updates require refreshing the registry indexes. After all, a repository is read far more often than it is written to, so the read operation should be optimized.

Thanks to the appropriate architecture in this area, operating system repositories have many mirrors ( https://www.debian.org/mirror/list ), and now - in the case of Docker - an unofficial mirror of an unofficial repository is something limited (https://docs.docker.com/registry/recipes/mirror/). I would like to draw attention to the arguments that were given in the case of abandoning one of the Linux kernel distribution protocols.

vsoch commented 5 years ago

I don’t concretely mean that it would be limited to GitHub Pages, the idea that I’m trying to get across is that there are already APIs that exist to list repositories and projects. Instead of coming up with an entirely new one, we can use features from those APIs that have already been somewhat tested and known. This would mean that an already existing API (GitHub as the example with probably billions of repos) would then conform to our new specification. Sure, they could change in the future, but the incentive to do so might change if they know that their resource is friendly to OCI. If people start building things using them too? Then I suppose we’d start to see another company/ies representation at the meetings :)

SteveLasker commented 5 years ago

an astonishing amount of scope creep for a summary 😉

Yeah, umm, I tend to work from a master plan approach, knowing where all the pieces could go, then scope back in incremental pieces. Starting smaller is goodness. I mostly wanted to call out the auth issues and recognize that different registries implement namespaces differently; that is likely a vendor differentiator we won't get agreement on. If we can enhance the catalog-like API to support listing from a sub-namespace, we can likely find a good, consistent place.

newest tag, "horrifying"

I had thought the same at first. But we've had developers want to deploy the latest/newest build to a dev environment. While they could pull the tag, based on a webhook, we got feedback they want an ordered tag listing. They also wanted ordered tag listings in other tools, like DevOps and App Services, where the user can choose a tag from a combo box. They wanted to get the same experience across Docker Hub and ACR. It would be great if a customer could choose from other registries they might happen to host with Azure as well.

top

Could be later, as long as paging has a reasonable, small page size

CLI

Agreed on it being an interesting incubation, not part of the spec.

tag listing API to support artifact type

I forgot to include the tag listing should support listing the artifactType, enabling tools to understand what the tag represents.

untagged/tag listing

Jon brought up an interesting reference to be able to list untagged manifests. There's a good discussion here, as well as possibly understanding the history of manifests a tag represented. When a stable tag of a base image is updated to reflect OS & FX patching, it's also interesting to know the previous manifest, in case a user must roll back.

bsatlas commented 5 years ago

Would the new catalog/listing operation be a required or optional endpoint?

mikebrow commented 5 years ago

Collecting use cases, forming workgroup here: https://hackmd.io/s/BJPAUxDvV#OCI-Catalog-Listing-API---Workgroup

vbatts commented 4 years ago

I am now looking forward to @josephschorr proposal on a pubsub event model.

josephschorr commented 4 years ago

Hoping to publish it for community review within a few weeks, as holidays adds some delays :)

SteveLasker commented 4 years ago

Do we want to close this, and let Joey continue to make progress on the Pub/Sub model? I love what Joey is doing for the specific content update scenario. But, that's not the same for quick-hit scenarios where someone just needs to see a one-time listing of repos or tags. Just suggest we close this one, and have separate PRs for new proposals. They can reference this for the history of the conversation if valuable.

vbatts commented 4 years ago

@josephschorr are you waiting on something for the pub/sub events PR?

mikebrow commented 4 years ago

I would like to see pub/sub done in such a way so as to cover the one time listing of repos/tags with published updates to follow based on the subscription. Let's wait to close this till we have a resolution to the issue I think?

josephschorr commented 4 years ago

@vbatts I was hoping for some more feedback on my document before I opened it

vbatts commented 4 years ago

@josephschorr ok. Let me get #111 shaped up then, and you can ready your PR

vbatts commented 3 years ago

_catalog got removed from the final v1.0.0. Thanks for all the discussion. 🎉