opencontainers / distribution-spec

OCI Distribution Specification
https://opencontainers.org
Apache License 2.0

Requirements: Search #71

(Open) SteveLasker opened 5 years ago

SteveLasker commented 5 years ago

OCI Artifact Search Requirements

As registries support multiple artifact types, a search/catalog API that supports filtering on the artifact type will be needed.

The docker v1 registry spec supported Docker Search. While some vendors, such as Quay.io, implemented the v1 search API, most vendors support only the v2 registry API, which dropped search.

We believe revisiting the search API will support client CLIs that span registries, such as helm search, duffle search (CNAB), docker search, and the CLIs of other evolving artifact types.

By supporting a common search API across all registries, users could consistently use these new artifact CLIs across all registries.

This issue focuses on capturing the requirements for new Search and Eventing APIs. Once the requirements are agreed upon, we'll move them into a spec.

KubeCon 2019 EU Notes: OCI Catalog Listing APIs

Use Cases

Search is a generic capability used across several different use cases.

Tool Specific Searches

Helm, Singularity, Docker, OPA, CNAB and other tools will need to query their specific artifact types across various registries. Example: The helm cli would need to query a registry for charts that match a specific name. The results should contain only Helm artifacts.

helm search demo42.azurecr.io hello-world

Results
--------------------------------------
samples/hello-world
marketing/products/hello-world-sample
dev/prototypes/sample-hello-world
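The name match in the helm example can be sketched as a substring filter combined with an artifact-type filter. This is a minimal Python sketch, assuming a hypothetical in-memory `Artifact` record with a `repository` name and a short `artifact_type`; the `samples/hello-world-image` entry is invented to show what the type filter excludes.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Artifact:
    # Hypothetical record shape; a real registry would derive the type from the manifest.
    repository: str
    artifact_type: str


def search(artifacts: Iterable[Artifact], term: str,
           artifact_type: Optional[str] = None) -> List[str]:
    """Return sorted, de-duplicated repository names containing `term`,
    optionally limited to a single artifact type."""
    matches = {
        a.repository
        for a in artifacts
        if term in a.repository
        and (artifact_type is None or a.artifact_type == artifact_type)
    }
    return sorted(matches)


catalog = [
    Artifact("samples/hello-world", "helm-chart"),
    Artifact("marketing/products/hello-world-sample", "helm-chart"),
    Artifact("dev/prototypes/sample-hello-world", "helm-chart"),
    Artifact("samples/hello-world-image", "container-image"),  # hypothetical entry
    Artifact("samples/demo42/deploy/chart", "helm-chart"),
]

# Roughly what `helm search demo42.azurecr.io hello-world` asks the registry for.
for name in search(catalog, "hello-world", artifact_type="helm-chart"):
    print(name)
```

Without the `artifact_type` argument the container image would also match, which is the mixed-type result described under Registry Specific Search.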

Version-specific searches:

helm search --versions demo42.azurecr.io samples/hello-world

Results
--------------------------------------
samples/hello-world   1.0
samples/hello-world   1.1
samples/hello-world   1.2

Registry Specific Search

Users want to query registries for the artifacts that match a specific name or list artifacts within a given path. In this case, the results contain multiple artifact types.

Today, each registry has created its own client and server APIs. Until there is a generic registry client, registries are expected to keep vendor-specific APIs. However, common server-side registry APIs open up the possibility of common tooling across registries.

A registry search API would include

Existing examples

ACR list repo example:

Without a common search/catalog API, cloud vendors have had to implement vendor-specific experiences:

az acr repository list -n demo42

Name                         
-----------------------------
samples/demo42/queueworker   
samples/demo42/quotes-api    
samples/demo42/web           
samples/demo42/deploy/chart  
samples/demo42/deploy/cnab   
samples/demo42/deploy/arm    

ACR list tags example, with a future Type column added:

az acr repository show-tags -n demo42 --repository samples/demo42/deploy/chart

Result  Type
-------------
1.0     helm-chart
1.1     helm-chart
1.1.1   helm-chart
2.0     helm-chart
3.0     helm-chart

A repo could contain multiple artifact types:

az acr repository show-tags -n demo42 --repository samples/demo42/deploy

Result       Type
------------ ----------------
helm-1.0     helm-chart
helm-1.1     helm-chart
helm-1.1.1   helm-chart
cnab-1.0     cnab
arm-1.0      arm

Rather than each registry vendor having to offer unique APIs, the goal would be to offer a common API.

Registry Tool Search - Scanners

Vendors and the community have attempted to build tools atop registries.

Without a common search/catalog API, these tools must work with individual images.

Some of the most common registry tools are image scanners such as Aqua, Twistlock, NeuVector and Clair. While these scanners protect runtime nodes, they all pre-scan registries to identify image vulnerabilities before the images are run.

Scanners evaluate images in registries with a combination of a search/catalog API and events.

These vulnerability scanners need the following:

Today, scanners assume every artifact in a registry is a container image. As registries store new artifact types, scanners will need to either know how to scan these new artifacts or at least filter the results to the artifact types they support.

Artifact Types

A registry must know the artifact types it hosts in order to provide meaningful search results. Artifact types will be internally identified by an expanded set of OCI Media Types.

However, displaying application/vnd.cncf.helm.chart.v1+json does not make for a good user experience. To provide clean user experiences, a list of artifact types, each with a short description and information about its tooling, will be maintained.

Media Type Short Names

Media Type                                  Display Name  Info
------------------------------------------  ------------  --------------------------
application/vnd.oci.image.index.v1+json     OCI Image     Docker *
application/vnd.oci.image.manifest.v1+json  OCI Image     Docker *
application/vnd.cncf.helm.chart.v1+json     Helm          Helm
application/vnd.oci.cnab.index.v1+json      CNAB          Duffle, Docker-application

* Most registry providers automatically convert oci.image manifests to the format requested by the client.
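The short-name list amounts to a lookup from media type to display name. A minimal Python sketch, assuming the entries from the table above are held in a plain dictionary (the dictionary and the `display_name` helper are hypothetical); unknown media types fall back to the raw string so new artifact types still render.

```python
# Display names from the Media Type Short Names table; the dict itself is illustrative.
MEDIA_TYPE_NAMES = {
    "application/vnd.oci.image.index.v1+json": "OCI Image",
    "application/vnd.oci.image.manifest.v1+json": "OCI Image",
    "application/vnd.cncf.helm.chart.v1+json": "Helm",
    "application/vnd.oci.cnab.index.v1+json": "CNAB",
}


def display_name(media_type: str) -> str:
    """Return the short display name, or the raw media type if it is not registered."""
    return MEDIA_TYPE_NAMES.get(media_type, media_type)


print(display_name("application/vnd.cncf.helm.chart.v1+json"))  # Helm
```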

Registry Search Requirements

Listing repos

Listing artifacts

Listing versions

Filtering by artifacts

Filtering by date ranges

Search queries may specify date ranges, enabling the return of artifacts that have been created or changed since a given date:time.
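A date-range filter can be sketched as an inclusive check against optional `since` and `until` bounds. This Python sketch assumes each artifact carries a hypothetical timezone-aware `changed` timestamp; whether that value should be push time or the client's created time is an open question in the discussion below, and the dates are invented.

```python
from datetime import datetime, timezone
from typing import List, Optional


def in_date_range(artifacts: List[dict],
                  since: Optional[datetime] = None,
                  until: Optional[datetime] = None) -> List[dict]:
    """Return artifacts whose 'changed' timestamp falls within [since, until].

    Either bound may be omitted. Timestamps should be timezone-aware so that
    comparisons across clients in different zones are unambiguous.
    """
    selected = []
    for artifact in artifacts:
        changed = artifact["changed"]
        if since is not None and changed < since:
            continue
        if until is not None and changed > until:
            continue
        selected.append(artifact)
    return selected


artifacts = [
    {"name": "samples/hello-world:1.0", "changed": datetime(2019, 5, 1, tzinfo=timezone.utc)},
    {"name": "samples/hello-world:1.1", "changed": datetime(2019, 6, 15, tzinfo=timezone.utc)},
    {"name": "samples/hello-world:1.2", "changed": datetime(2019, 7, 30, tzinfo=timezone.utc)},
]

since = datetime(2019, 6, 1, tzinfo=timezone.utc)
print([a["name"] for a in in_date_range(artifacts, since=since)])
# ['samples/hello-world:1.1', 'samples/hello-world:1.2']
```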

Paging

Results may be paged so that clients can retrieve the full list of artifacts. The default page size would be 100, with the ability to change the page size.
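Paging can be sketched with a page size `n` and a `last` cursor, modeled on the `n` and `last` query parameters the distribution spec already uses for tag listing (`GET /v2/<name>/tags/list?n=<integer>&last=<tag>`). Reusing that pattern for search is an assumption; `get_page` and `list_all` are hypothetical helpers over an in-memory list, ordered lexically by name.

```python
from typing import List, Optional, Sequence, Tuple

DEFAULT_PAGE_SIZE = 100  # default page size from the requirement above


def get_page(names: Sequence[str], n: int = DEFAULT_PAGE_SIZE,
             last: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return up to `n` names that sort after `last`, plus the cursor for the
    next page (None when no further results remain)."""
    if n < 1:
        raise ValueError("page size must be at least 1")
    remaining = [name for name in sorted(names) if last is None or name > last]
    page = remaining[:n]
    next_last = page[-1] if len(remaining) > n else None
    return page, next_last


def list_all(names: Sequence[str], n: int = DEFAULT_PAGE_SIZE) -> List[str]:
    """Follow the cursor until every page has been fetched."""
    collected: List[str] = []
    last: Optional[str] = None
    while True:
        page, last = get_page(names, n=n, last=last)
        collected.extend(page)
        if last is None:
            return collected


repos = [f"repo{i:03d}" for i in range(250)]
first, cursor = get_page(repos)
print(len(first), cursor)          # 100 repo099
print(len(list_all(repos, n=40)))  # 250
```

A name-based cursor keeps pages stable even when earlier entries are added or removed between requests, which an offset-based scheme would not.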

Sorting

Because results may be paged, sorting lets a client get the top n results for a given sort order. Sorting supports both ascending and descending order.
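Top-n retrieval is a sort in the requested direction followed by a slice. A minimal Python sketch; the `top_n` and `version_key` helpers are hypothetical, and `version_key` assumes dotted numeric tags like those in the ACR example, compared numerically so that plain string ordering does not misplace "1.10".

```python
from typing import Callable, List, Tuple


def top_n(results: List[dict], key: Callable[[dict], object], n: int,
          descending: bool = False) -> List[dict]:
    """Sort results by `key` in the requested direction and return the first n."""
    return sorted(results, key=key, reverse=descending)[:n]


def version_key(result: dict) -> Tuple[int, ...]:
    """Compare dotted numeric versions numerically, so "1.10" sorts after "1.9".
    Assumes every version looks like "1.1.1"; other tag formats need another key."""
    return tuple(int(part) for part in result["version"].split("."))


tags = [{"version": v} for v in ["1.0", "1.1", "1.1.1", "2.0", "3.0"]]
print([t["version"] for t in top_n(tags, version_key, 2, descending=True)])
# ['3.0', '2.0']
```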

Role Based Access Control

Search results shall be limited to the artifacts the user has read access to. The user may be a person or a service account. The spec shall not define specific rights or roles, or how authorization should be implemented or managed; rather, it will simply state that the registry must be cognizant of security and support the security of its product and/or platform. If a user has read access to repo1 and repo3, but not repo2, the repository listing should return only repo1 and repo3. The spec will not define a role in which read access differentiates between management and data operations.
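Because the authorization model is left to each registry, the filter can be sketched as taking the read decision as a callback. A minimal Python sketch; `visible_repositories` and the `can_read` predicate are hypothetical, with `can_read` standing in for whatever the registry's own authorization layer provides.

```python
from typing import Callable, Iterable, List


def visible_repositories(repositories: Iterable[str],
                         can_read: Callable[[str], bool]) -> List[str]:
    """Keep only repositories the caller may read, preserving listing order.

    `can_read` comes from the registry's own authorization layer; how that
    decision is made is deliberately not specified here.
    """
    return [repo for repo in repositories if can_read(repo)]


# The example from the requirement: read access to repo1 and repo3, not repo2.
granted = {"repo1", "repo3"}
print(visible_repositories(["repo1", "repo2", "repo3"], granted.__contains__))
# ['repo1', 'repo3']
```

Applying this filter before results are paged is one way to keep page boundaries from hinting at repositories the caller cannot see.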

jonjohnsonjr commented 5 years ago

A lot of this seems more about listing than searching. We really need to figure out what that will look like. I had a strawman here that suffered a painful death by bikeshedding.

as results may be paged, sorting the results by name and/or version with ascending and defending options

descending

What's a "version" here, from a registry perspective? A tag or digest? Just a tag? A new thing?

Example: The helm cli would need to query a registry for charts that match a specific name. The result should return helm only artifacts.

This might be a pain. The artifacts stuff works because registries can be (mostly) agnostic to the contents of what's being distributed. The more stuff we require to be indexed, the less flexible we make the distribution API.

Search queries may specify date ranges, enabling the return of artifacts that have been created or changed since a given date:time

Artifact creation from the client's perspective (i.e. created time) or when it was pushed?

Search results shall be limited to the artifacts the user has read access control. The user may be a person or service account.

The spec doesn't currently speak to access models, so I'm not sure how appropriate this is.

SteveLasker commented 5 years ago

Thanks @jonjohnsonjr. While reviewing the hackmd doc to post here, I realized the conversation was much better structured in the KubeCon 2019 EU Notes: OCI Catalog Listing APIs.

This topic should, IMO, cover listing, search and eventing as these are all required to meet the scenarios.

What's a "version" here, from a registry perspective? A tag or digest? Just a tag? A new thing?

Good point. I suppose we would need both.

Example: The helm cli would need to query a registry for charts that match a specific name. The result should return helm only artifacts.

This might be a pain. The artifacts stuff works because registries can be (mostly) agnostic to the contents of what's being distributed. The more stuff we require to be indexed, the less flexible we make the distribution API.

The premise of artifacts is that every object in a registry has a manifest.config.mediaType that is unique to its artifact type. To your point, search would likely incorporate top-level data, such as :tags, digests and annotations. I think you're referring to information stored in the manifest.config. I'd suggest it would be up to the registry operator to decide what "value add" they wish to provide. If gcr wanted to surface additional search metadata parsed from the configs of specific artifacts, that would be cool, but I would imagine the search spec would make this optional. Annotations, :tags and manifests would be required.
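Filtering by artifact type using only top-level manifest data can be sketched as a match on `manifest.config.mediaType`, without fetching or parsing the config blob. This Python sketch uses two hypothetical manifests; digests and sizes are placeholders, and the config media types are examples chosen for illustration.

```python
from typing import Dict, List

# Hypothetical manifests keyed by reference; digests and sizes are placeholders.
MANIFESTS: Dict[str, dict] = {
    "samples/demo42/web:1.0": {
        "schemaVersion": 2,
        "config": {"mediaType": "application/vnd.oci.image.config.v1+json",
                   "digest": "sha256:<placeholder>", "size": 0},
        "layers": [],
    },
    "samples/demo42/deploy/chart:1.0": {
        "schemaVersion": 2,
        "config": {"mediaType": "application/vnd.cncf.helm.config.v1+json",
                   "digest": "sha256:<placeholder>", "size": 0},
        "layers": [],
    },
}


def references_of_type(manifests: Dict[str, dict], config_media_type: str) -> List[str]:
    """Return references whose manifest.config.mediaType matches.

    Only the manifest is inspected; the config blob itself is never read.
    """
    return sorted(
        ref for ref, manifest in manifests.items()
        if manifest.get("config", {}).get("mediaType") == config_media_type
    )


print(references_of_type(MANIFESTS, "application/vnd.cncf.helm.config.v1+json"))
# ['samples/demo42/deploy/chart:1.0']
```

Anything that requires opening the config, such as a client-side created time, would sit on top of this as the optional value add described above.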

Search queries may specify date ranges, enabling the return of artifacts that have been created or changed since a given date:time

Artifact creation from the client's perspective (i.e. created time) or when it was pushed?

This is an interesting one, as the value is stored in a config, which is unique to the OCI Image artifact type. It goes back to the conversation above about parsing config objects. I'd probably still say the spec would MUST on the manifest and tag dates, while making config values an optional value add.

Search results shall be limited to the artifacts the user has read access control. The user may be a person or service account.

The spec doesn't currently speak to access models, so I'm not sure how appropriate this is.

One of the complaints I've heard about implementing _catalog consistently was that it didn't address returning information the user doesn't have access to. I'm not suggesting we spec a specific auth model; rather, the spec would state that results must match the rights of the user. I'd go further and say that how rights/roles are defined should not be specified, leaving it fairly high level so registry operators have the freedom to implement models that align with their products or platforms.

defending

descending

I'll fix, thanks

rchincha commented 5 years ago

Has a GraphQL [1] endpoint been considered for queries?

[1] https://en.wikipedia.org/wiki/GraphQL

SteveLasker commented 4 years ago

Suggest adding a label for vNext to avoid this being incorporated into the v1 scope.

vbatts commented 4 years ago

After thinking about the extensions proposal #111, I'm wondering whether this ought to be an extension that registries could choose to implement?

mikebrow commented 4 years ago

+1 on extension

SteveLasker commented 3 years ago

Just bumping a few links as searching/discovering and indexing continues to come up:

jonjohnsonjr commented 3 years ago

continues to come up

Where?

SteveLasker commented 3 years ago

Helm, Bicep, Wasm and others that are trying to use registries for their package management, so they don't have to build one.

rchincha commented 3 years ago

As a data point, the zot project has used GraphQL [1] to help with this [2], [3]. The caveat is that the client needs to be aware of the GraphQL schema.

[1] https://en.wikipedia.org/wiki/GraphQL
[2] https://github.com/anuvu/zot#listing-images
[3] https://github.com/anuvu/zot/blob/main/pkg/extensions/search/schema.graphql#L57
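The schema-awareness caveat shows up in what the client sends: under the common GraphQL-over-HTTP convention, the query text and its variables are posted as a JSON body. This Python sketch only builds that body and omits the endpoint; the `ImageList` field, its `repo` argument and the `Tag`/`Digest` selections are hypothetical placeholders, not zot's actual schema.

```python
import json

# Hypothetical query: field and argument names are illustrative and not copied
# from zot's schema. A real client must match the server's schema exactly.
IMAGE_LIST_QUERY = """
query ImagesInRepo($repo: String!) {
  ImageList(repo: $repo) {
    Tag
    Digest
  }
}
"""


def graphql_request_body(query: str, variables: dict) -> bytes:
    """Encode a GraphQL operation as the JSON body of an HTTP POST."""
    return json.dumps({"query": query, "variables": variables}).encode("utf-8")


body = graphql_request_body(IMAGE_LIST_QUERY, {"repo": "samples/hello-world"})
print(body.decode("utf-8"))
```

The transport is generic, but every field name in the query is schema-specific, which is why a client has to know the registry's schema before it can search.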

rchincha commented 1 year ago

An update on this, now more formalized as an OCI dist-spec extension: https://github.com/project-zot/zot/blob/main/pkg/extensions/search/search.md

discoverable via: https://github.com/opencontainers/distribution-spec/tree/main/extensions https://github.com/opencontainers/distribution-spec/blob/main/extensions/_oci.md

rchincha commented 10 months ago

Any interest in reviving this conversation, now that we are converging toward adding the ability to store image and non-image artifacts in OCI-conformant registries?

I can try to send out a draft proposal (roughly along the lines of what zot currently has).

rchincha commented 10 months ago

https://docs.docker.com/engine/reference/commandline/search/ https://learn.microsoft.com/en-us/cli/azure/acr/repository?view=azure-cli-latest https://docs.aws.amazon.com/cli/latest/reference/ecr/ https://console.cloud.google.com/gcr/images/google-containers/GLOBAL (???)