spec: support for passing client image name for mirroring use case

dmcgowan commented 6 years ago

There is an assumption today that a server implementation of the distribution specification will either not care about the name used by the client or that all requests will have a known common namespace. An example of this is the Docker Hub assuming that all requests are prefixed with docker.io even though the registry hostname is registry-1.docker.io. However, this has always caused difficulty when a client then wants to mirror content: localhost:5000/library/ubuntu could proxy to registry-1.docker.io/library/ubuntu, but localhost:5000 could never proxy to anything else. Complicated registry configurations have been proposed to remedy this as well as a backwards incompatible approach of requesting as localhost:5000/docker.io/library/ubuntu. However, a goal of this specification should be simplicity and backwards compatibility. I believe that a solution does belong in the specification to unlock the mirroring use cases without complicated configuration or DNS setup.

My proposal is to add a way to pass the name resolved by the client (e.g. docker.io/library/ubuntu) up to the registry. So if a request is going to localhost:5000/library/ubuntu, it could mirror both docker.io/library/ubuntu and quay.io/library/ubuntu and switch based on request parameters. There are two possible ways to achieve this: adding an HTTP request header (e.g. OCI-REF-NAME: docker.io/library/ubuntu) or adding a query parameter (e.g. ?oci-ref-name=docker.io/library/ubuntu). The first is clean, but the second may be more useful for static mirroring. I am not suggesting one over the other yet, just stating the problem and solutions to discuss.
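To make the two options concrete, here is a minimal Python sketch of how a client might build the same manifest request either way. The function name, the `style` argument, and the plain-HTTP scheme for the local mirror are assumptions for illustration only; the header and parameter names are the examples given above, not settled spec names.

```python
from urllib.parse import urlencode


def build_manifest_request(mirror, repo, tag, ref_name, style="header"):
    """Return (url, headers) for a manifest request sent to a mirror.

    ref_name is the name the client resolved, e.g. docker.io/library/ubuntu.
    The helper and its arguments are hypothetical, not part of any spec.
    """
    # Assumption: the local mirror is reached over plain HTTP.
    url = f"http://{mirror}/v2/{repo}/manifests/{tag}"
    headers = {}
    if style == "header":
        # Option 1: carry the resolved name in a request header.
        headers["OCI-REF-NAME"] = ref_name
    elif style == "query":
        # Option 2: carry it in the query string; keep "/" readable.
        url += "?" + urlencode({"oci-ref-name": ref_name}, safe="/")
    else:
        raise ValueError(f"unknown style: {style!r}")
    return url, headers
```

In both cases the URL path stays `/v2/library/ubuntu/...`, so a registry that knows nothing about the extra header or parameter sees the same path it sees today.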

wking commented 6 years ago

On Mon, Apr 16, 2018 at 10:35:39PM +0000, Derek McGowan wrote:

So if a request is going to localhost:5000/library/ubuntu, it could mirror both docker.io/library/ubuntu and quay.io/library/ubuntu and switch based on request parameters.

This could also be a “caching proxy” use case, depending on when the upstream requests happen. Just dropping in some additional keywords in case that helps folks discover this issue again later on.

Complicated registry configurations have been proposed to remedy this as well as a backwards incompatible approach of requesting as localhost:5000/docker.io/library/ubuntu.

Can you shed more light on why this is backwards incompatible? I don't see wording in the current spec that would care about what goes into the ‘<name>’ portion of the URLs besides [1]:

Classically, repository names have always been two path components where each path component is less than 30 characters. The V2 registry API does not enforce this…

If a registry was trying to mirror/proxy multiple upstream registries, I don't see why the registry couldn't define a default (for any of these approaches). For example, “when I get a two-component name, the implicit first component is ‘docker.io’” (or whatever) as a local policy. And without such a default, I don't see how it would support clients who are only capable of creating two-component names.

Longer term, the default-component approach may run into issues (e.g. if you wanted to mirror/proxy a namespace that didn't expect two child components, e.g. example.com/ubuntu or example.com/some/group/app). The default-name-component approach is not forward-compatible with those cases, but that's a distinct issue from backwards compatibility. And you could kludge around the limitation with blacklists for defaults (e.g. “don't inject default components if the given name's first component is example.com”). If we go with the default component approach, folks maintaining default components would ideally get their user-base upgraded to clients which used fully-qualified names before the forward-compat issues became too troublesome. If that timescale is expected to be very long (because some clients will never upgrade?), then one of your “this channel always contains the fully-qualified image name” approaches would be a better choice.
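As a rough illustration of the default-component policy and the blacklist workaround described above, a registry-side helper might look like the following Python sketch. The names `DEFAULT_REGISTRY`, `NO_DEFAULT`, and `qualify` are hypothetical, and the rule "exactly two components gets the default" is just the local policy from the example.

```python
# Hypothetical local policy, not defined by the spec.
DEFAULT_REGISTRY = "docker.io"
# First components that must never have a default injected in front of them.
NO_DEFAULT = {"example.com"}


def qualify(name):
    """Return the fully-qualified name a mirror would use for `name`."""
    first, _, _rest = name.partition("/")
    if first in NO_DEFAULT:
        # Blacklisted first component: assume the name is already qualified.
        return name
    if name.count("/") == 1:
        # Two-component name: inject the implicit default first component.
        return f"{DEFAULT_REGISTRY}/{name}"
    # Anything else is passed through unchanged.
    return name
```

The forward-compatibility problem shows up directly: any two-component name whose registry is not listed in `NO_DEFAULT` gets `docker.io/` prepended, whether or not that was intended.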

dmcgowan commented 6 years ago

@wking the number of components has no relevance here. The specification does not define anything about the path components. The backwards incompatibility comes from existing clients and servers. If a client is upgraded and now starts requesting localhost:5000/docker.io/library/ubuntu, the registry would have to be configured to treat docker.io as the same as previous requests it had seen. If it was an older registry, then it would just not understand the request, forcing the client to resend the request without docker.io. This sort of feature probing is a huge pain to implement for clients and this kind of configuration is really messy on the server. Using headers or query parameters can be safely ignored by older registries or omitted by older clients.

wking commented 6 years ago

On Mon, Apr 16, 2018 at 11:32:43PM +0000, Derek McGowan wrote:

If a client is upgraded and now starts requesting localhost:5000/docker.io/library/ubuntu, the registry would have to be configured to treat docker.io as the same as previous requests it had seen. If it was an older registry, then it would just not understand the request, forcing the client to resend the request without docker.io.

Ah, I'd only considered old-client/new-registry above. I agree that new-client/old-registry would need some sort of client fallback for registries that didn't recognize the fully qualified name in the URL path.

Using headers or query parameters can be safely ignored by older registries or omitted by older clients.

So what would the logic for new clients be? Always set the fully-qualified name in the query parameter (or wherever) and always drop the leading component when constructing the URL path? That would probably work, although it doesn't end up in a world where we could eventually drop the query parameter. The spec already supports version checks [1]; perhaps we can do whatever for the remainder of v2 and then require fully-qualified names in the path once we cut a v3 API? That would at least restrict “feature probing” to the initial version check that clients should be performing anyway (or should be performing when their non-version request 404s ;).

xiaods commented 6 years ago

This looks like mirror/proxy functionality; in my mind, it is not in scope for the spec.

bsatlas commented 5 years ago

There hasn't been much talk about this issue. Is this something we want to put on the agenda for Wednesday's call or can we push this to a later release?

dmcgowan commented 5 years ago

I am going to open up a PR for it this week. We can discuss the design further there. I think this is important to properly implement the mirroring use case in a less opinionated manner (currently a mirror can only mirror a single upstream registry).

bsatlas commented 5 years ago

How about implementing a /v2/mirror/<repo> or /v2/<repo>/mirror endpoint and have the client use the Host header to let the registry know where to pull from?

dmcgowan commented 5 years ago

Mirrors should be mostly transparent to the client, kind of like setting an HTTP proxy. Also the issue with the current situation is the repository name used by registries does not contain the host name which could lead to namespace collision in the registry implementation in the mirroring case. Using the HOST header in this case would not give enough indication of what the upstream HOST is, only the HOST for the mirror. HTTP proxying already covers the case where HOST does not need to point to the mirror, but this doesn't solve the case of having a single proxy/cache that can be used for multiple upstream registries.
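The namespace collision can be pictured with a small Python sketch of a mirror's local store. The `MirrorStore` class and its methods are hypothetical; the only point is that the upstream host has to be part of the key once a single mirror serves more than one upstream.

```python
class MirrorStore:
    """Toy in-memory store for a mirror serving several upstream registries."""

    def __init__(self):
        self._manifests = {}

    def put(self, authority, repo, tag, manifest):
        # Keying on (repo, tag) alone would let quay.io/library/ubuntu
        # overwrite docker.io/library/ubuntu; the authority disambiguates.
        self._manifests[(authority, repo, tag)] = manifest

    def get(self, authority, repo, tag):
        # Returns None when this (authority, repo, tag) has not been cached.
        return self._manifests.get((authority, repo, tag))
```

The Host header of the incoming request cannot supply `authority` here, since it only names the mirror itself.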

dmcgowan commented 5 years ago

I am working on a PR for this now. I will add a section under Use Cases for this which will describe how it is used, please comment on the design. The PR will update the individual requests.

Mirroring and Proxy Caching

Company X sets up an internal registry which is capable of storing local copies of images from any upstream registry. Registry clients are configured to send all requests to retrieve registry data to the internal registry. The client attaches the OCI-Repository-Authority HTTP header to every registry request indicating the original registry host name. The original registry host name is the authority for the given repository and is used by the internal registry to fetch content and authentication parameters.
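A minimal Python sketch of how the internal registry in this use case might turn the header into an upstream URL is shown below. The function name, the `HOST_OVERRIDES` table, the `default_authority` fallback, and the use of HTTPS for upstreams are all assumptions for illustration; only the header name and the docker.io / registry-1.docker.io relationship come from this thread.

```python
# Hypothetical mapping from an authority to the host that actually serves it.
# The docker.io entry reflects the Docker Hub example earlier in this issue.
HOST_OVERRIDES = {"docker.io": "registry-1.docker.io"}


def upstream_manifest_url(headers, repo, reference, default_authority=None):
    """Pick the upstream manifest URL for a request received by the mirror."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    authority = normalized.get("oci-repository-authority", default_authority)
    if authority is None:
        raise LookupError("no authority given and no default configured")
    host = HOST_OVERRIDES.get(authority, authority)
    # Assumption: upstream registries are reached over HTTPS.
    return f"https://{host}/v2/{repo}/manifests/{reference}"
```

Older clients that omit the header fall through to `default_authority`, which keeps the single-upstream behaviour mirrors have today.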

bsatlas commented 5 years ago

I think X-Proxy-Registry or OCI-Proxy-Registry is cleaner. When I think of authority, I think TLS certificates :P

bsatlas commented 5 years ago

Also, is it possible for clients to use separate creds for the local and authority?

dmcgowan commented 5 years ago

Using the term "authority" here because a proxy is really required to delegate authority over content and access to that content to somewhere else. Whether it does that delegation by proxying is an implementation detail by the registry, same as how it constructs any proxy requests.

One thing to consider though is the use of an HTTP header vs a query parameter. A query parameter gives better cacheability in cases where there could be an even less sophisticated HTTP cache in between. A query parameter would prevent identical requests returning different content based solely on a non-standard HTTP header. In that case we would have something like /v2/dmcgowan/myrepo/manifests/latest?authority=docker.io as the path. This is slightly more visible to registries which would not implement this; however, it may be a better solution. Note this query parameter would only show up when the client knows it is going through a mirror, because the HTTP Host header does not match the intended authority.
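The client-side rule in that last note can be sketched in a few lines of Python. The function name and arguments are hypothetical, and `authority` as the parameter name is just the example used above.

```python
from urllib.parse import urlencode


def manifest_path(request_host, authority, repo, reference):
    """Build the request path, adding ?authority= only when going via a mirror."""
    path = f"/v2/{repo}/manifests/{reference}"
    if request_host != authority:
        # The request goes to a host other than the intended authority,
        # so tell the mirror which upstream is meant.
        path += "?" + urlencode({"authority": authority})
    return path
```

Requests sent straight to the authority keep exactly the path they have today, so only mirrors ever see the extra parameter.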

justincormack commented 5 years ago

Yes I think a query parameter is better, otherwise for caching you need to set vary-by on a nonstandard header.
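A toy Python model of an intermediate cache shows the difference. The `cache_key` function is hypothetical and much simpler than a real HTTP cache; it only models "key on the URL, plus any request headers named by Vary".

```python
def cache_key(url, request_headers, vary=()):
    """Compute a simplified cache key: the URL plus any headers listed in Vary."""
    # Header names are case-insensitive, so compare them in lower case.
    normalized = {key.lower(): value for key, value in request_headers.items()}
    varied = tuple(
        (name, normalized.get(name)) for name in sorted(n.lower() for n in vary)
    )
    return (url, varied)
```

With the header approach, two requests for different authorities share a URL and therefore share a key unless the response sets Vary on the non-standard header. With the query parameter, the URLs already differ, so even a cache that ignores Vary keeps them apart.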

dmcgowan commented 5 years ago

@justincormack my plan here is to PoC it in containerd, then open a PR for the spec here. I am not sure we have used a consistent naming scheme for what we call this; in containerd we usually call this part the namespace. Sometimes it is referred to as domain, registry, or host.

stevvooe commented 5 years ago

I think this is a good approach and the first step in separating registry location from the "authority". The eventual goal should be to encode the authority in the image name, but this will allow for cases where it is not.

Do registries currently ignore this parameter?

opencontainers / distribution-spec

spec: support for passing client image name for mirroring use case #12
