Open dmcgowan opened 6 years ago
On Mon, Apr 16, 2018 at 10:35:39PM +0000, Derek McGowan wrote:
So if a request is going to localhost:5000/library/ubuntu, it could mirror both docker.io/library/ubuntu and quay.io/library/ubuntu and switch based on request parameters.
This could also be a “caching proxy” use case, depending on when the upstream requests happen. Just dropping in some additional keywords in case that helps folks discover this issue again later on.
Complicated registry configurations have been proposed to remedy this as well as a backwards incompatible approach of requesting as localhost:5000/docker.io/library/ubuntu.
Can you shed more light on why this is backwards incompatible? I
don't see wording in the current spec that would care about what goes
into the ‘
Classically, repository names have always been two path components where each path component is less than 30 characters. The V2 registry API does not enforce this…
If a registry was trying to mirror/proxy multiple upstream registries, I don't see why the registry couldn't define a default (for any of these approaches). For example, “when I get a two-component name, the implicit first component is ‘docker.io’” (or whatever) as a local policy. And without such a default, I don't see how it would support clients who are only capable of creating two-component names.
Longer term, the default-component approach may run into issues (e.g. if you wanted to mirror/proxy a namespace that didn't expect two child components, e.g. example.com/ubuntu or example.com/some/group/app). The default-name-component approach is not forward-compatible with those cases, but that's a distinct issue from backwards compatibility. And you could kludge around the limitation with blacklists for defaults (e.g. “don't inject default components if the given name's first component is example.com”). If we go with the default component approach, folks maintaining default components would ideally get their user-base upgraded to clients which used fully-qualified names before the forward-compat issues became too troublesome. If that timescale is expected to be very long (because some clients will never upgrade?), then one of your “this channel always contains the fully-qualified image name” approaches would be a better choice.
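To make the default-component idea concrete, here is a minimal sketch of such a local policy. The names DEFAULT_AUTHORITY, NO_DEFAULT_PREFIXES, and qualify are hypothetical and not part of any spec or registry implementation; the rule (inject a default only for two-component names, unless the first component is on an exclusion list) is just one reading of the approach described above.

```python
# Hypothetical local policy: which authority to assume for short names.
DEFAULT_AUTHORITY = "docker.io"

# Hypothetical exclusion list: first components that are already authorities,
# so no default should be injected in front of them.
NO_DEFAULT_PREFIXES = {"example.com"}


def qualify(name: str) -> str:
    """Return a fully-qualified name, injecting the default authority if needed."""
    components = name.split("/")
    if components[0] in NO_DEFAULT_PREFIXES:
        # e.g. example.com/ubuntu is already fully qualified.
        return name
    if len(components) == 2:
        # Classic two-component name from an older client.
        return f"{DEFAULT_AUTHORITY}/{name}"
    # Anything else is assumed to carry its own authority.
    return name


print(qualify("library/ubuntu"))
print(qualify("example.com/ubuntu"))
```

The exclusion list is what makes this a kludge: every two-component namespace that should not receive the default has to be listed by hand.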
@wking the number of components has no relevance here. The specification does not define anything about the path components. The backwards incompatibility comes from existing clients and servers. If a client is upgraded and now starts requesting localhost:5000/docker.io/library/ubuntu, the registry would have to be configured to treat docker.io as the same as previous requests it had seen. If it was an older registry, then it would just not understand the request, forcing the client to resend the request without docker.io. This sort of feature probing is a huge pain to implement for clients and this kind of configuration is really messy on the server. Using headers or query parameters can be safely ignored by older registries or omitted by older clients.
On Mon, Apr 16, 2018 at 11:32:43PM +0000, Derek McGowan wrote:
If a client is upgraded and now starts requesting localhost:5000/docker.io/library/ubuntu, the registry would have to be configured to treat docker.io as the same as previous requests it had seen. If it was an older registry, then it would just not understand the request, forcing the client to resend the request without docker.io.
Ah, I'd only considered old-client/new-registry above. I agree that new-client/old-registry would need some sort of client fallback for registries that didn't recognize the fully qualified name in the URL path.
Using headers or query parameters can be safely ignored by older registries or omitted by older clients.
So what would the logic for new clients be? Always set the fully-qualified name in the query parameter (or wherever) and always drop the leading component when constructing the URL path? That would probably work, although it doesn't end up in a world where we could eventually drop the query parameter. The spec already supports version checks [1]; perhaps we can do whatever for the remainder of v2 and then require fully-qualified names in the path once we cut a v3 API? That would at least restrict “feature probing” to the initial version check that clients should be performing anyway (or should be performing when their non-version request 404s ;).
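A rough sketch of that client logic, assuming the oci-ref-name query parameter floated in the original issue and a plain-HTTP mirror; manifest_url is a hypothetical helper, not an existing client API.

```python
from urllib.parse import urlencode


def manifest_url(mirror: str, ref_name: str, tag: str) -> str:
    """Build a manifest URL for a mirror from a fully-qualified name.

    The leading component (the upstream authority) is dropped from the URL
    path and the full name is carried in a query parameter instead.
    """
    _authority, _, repo = ref_name.partition("/")
    # safe="/" keeps the slashes in the name readable in the query string.
    query = urlencode({"oci-ref-name": ref_name}, safe="/")
    # The http:// scheme is an assumption for a local mirror.
    return f"http://{mirror}/v2/{repo}/manifests/{tag}?{query}"


print(manifest_url("localhost:5000", "docker.io/library/ubuntu", "latest"))
```

An older registry that ignores unknown query parameters would see the same path it has always seen, which is the backwards-compatibility property being discussed.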
Just like the mirror/proxy function itself, this is not in scope for the spec, in my mind.
There hasn't been much talk about this issue. Is this something we want to put on the agenda for Wednesday's call or can we push this to a later release?
I am going to open up a PR for it this week. We can discuss the design further there. I think this is important to properly implement the mirroring use case in a less opinionated manner (currently a mirror can only mirror a single upstream registry).
How about implementing a /v2/mirror/<repo> or /v2/<repo>/mirror endpoint and have the client use the Host header to let the registry know where to pull from?
Mirrors should be mostly transparent to the client, kind of like setting an HTTP proxy. Also the issue with the current situation is the repository name used by registries does not contain the host name, which could lead to namespace collisions in the registry implementation in the mirroring case. Using the HOST header in this case would not give enough indication of what the upstream HOST is, only the HOST for the mirror. HTTP proxying already covers the case where HOST does not need to point to the mirror, but this doesn't solve the case of having a single proxy/cache that can be used for multiple upstream registries.
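The namespace collision can be illustrated with a toy in-memory cache. This is only a sketch: cache_key and the two dictionaries are hypothetical and stand in for whatever storage a mirroring registry actually uses.

```python
def cache_key(upstream_host: str, repo: str) -> str:
    """Hypothetical cache key that includes the upstream host name."""
    return f"{upstream_host}/{repo}"


# Keyed by repository name only: the second upstream overwrites the first.
by_repo = {}
by_repo["library/ubuntu"] = "manifest from docker.io"
by_repo["library/ubuntu"] = "manifest from quay.io"

# Keyed by upstream host plus repository name: both entries survive.
by_key = {}
by_key[cache_key("docker.io", "library/ubuntu")] = "manifest from docker.io"
by_key[cache_key("quay.io", "library/ubuntu")] = "manifest from quay.io"

print(len(by_repo), len(by_key))
```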
I am working on a PR for this now. I will add a section under Use Cases for this which will describe how it is used; please comment on the design. The PR will update the individual requests.
Company X sets up an internal registry which is capable of storing local copies of images from any upstream registry.
Registry clients are configured to send all requests to retrieve registry data to the internal registry.
The clients attach the OCI-Repository-Authority HTTP header to every registry request, indicating the original registry host name.
The original registry host name is the authority for the given repository and is used by the internal registry to fetch content and authentication parameters.
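On the client side, the use case above could look roughly like the following. This is a sketch under assumptions: the header name OCI-Repository-Authority is the one proposed in this thread and was still under discussion, and mirrored_request and the example mirror URL are hypothetical.

```python
def mirrored_request(mirror_url: str, repo: str, tag: str, authority: str):
    """Return the (url, headers) pair a client would send to the internal registry.

    The URL targets the internal registry; the header names the original
    registry host so the mirror knows where the content really comes from.
    """
    url = f"{mirror_url}/v2/{repo}/manifests/{tag}"
    headers = {"OCI-Repository-Authority": authority}
    return url, headers


url, headers = mirrored_request(
    "https://registry.internal.example",  # hypothetical internal registry
    "library/ubuntu",
    "latest",
    "docker.io",
)
print(url, headers)
```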
I think X-Proxy-Registry or OCI-Proxy-Registry is cleaner. When I think of authority, I think TLS certificates :P
Also, is it possible for clients to use separate creds for the local and authority?
Using the term "authority" here because a proxy is really required to delegate authority over content and access to that content to somewhere else. Whether it does that delegation by proxying is an implementation detail by the registry, same as how it constructs any proxy requests.
One thing to consider though is the use of an HTTP header vs a query parameter. A query parameter gives better cacheability in cases where there could be an even less sophisticated HTTP cache in between. A query parameter would prevent identical requests returning different content based solely on a non-standard HTTP header. In that case we would have something like /v2/dmcgowan/myrepo/manifests/latest?authority=docker.io as the path. This is slightly more visible to registries which do not implement this; however, it may be a better solution. Note this query parameter would only show up when the client knows it is going through a mirror, because the HTTP Host header does not match the intended authority.
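The "only when going through a mirror" rule could be sketched like this. The authority parameter name comes from the example path above; manifest_path is a hypothetical helper, and comparing the two host strings directly is a simplification.

```python
def manifest_path(request_host: str, authority: str, repo: str, reference: str) -> str:
    """Build the request path, adding ?authority= only when talking to a mirror."""
    path = f"/v2/{repo}/manifests/{reference}"
    if request_host != authority:
        # The Host the client talks to is not the intended authority,
        # so the client knows it is going through a mirror.
        path += f"?authority={authority}"
    return path


# Through a mirror: the parameter is present.
print(manifest_path("localhost:5000", "docker.io", "dmcgowan/myrepo", "latest"))
# Directly to the authority: the path is unchanged.
print(manifest_path("docker.io", "docker.io", "dmcgowan/myrepo", "latest"))
```

Because the authority is part of the URL, a plain HTTP cache in between keys on it automatically, without needing a Vary rule on a non-standard header.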
Yes I think a query parameter is better, otherwise for caching you need to set vary-by on a nonstandard header.
@justincormack my plan here is to PoC it in containerd then open a PR for the spec here. I am not sure we have used a consistent naming scheme for what we call this; in containerd we usually call this part the namespace. Sometimes it is referred to as domain, registry, or host.
I think this is a good approach and the first step in separating registry location from the "authority". The eventual goal should be to encode the authority in the image name, but this will allow for cases where it is not.
Do registries currently ignore this parameter?
There is an assumption today that a server implementation of the distribution specification will either not care about the name used by the client or that all requests will have a known common namespace. An example of this is the Docker Hub assuming that all requests are prefixed with docker.io even though the registry hostname is registry-1.docker.io. However this has always caused difficulty when a client then wants to mirror content: localhost:5000/library/ubuntu could proxy to registry-1.docker.io/library/ubuntu, however localhost:5000 could never proxy to anything else. Complicated registry configurations have been proposed to remedy this as well as a backwards incompatible approach of requesting as localhost:5000/docker.io/library/ubuntu. However a goal of this specification should be simplicity and backwards compatibility. I believe that a solution does belong in the specification to unlock the mirroring use cases without complicated configuration or DNS setup.
My proposal is to add a way to pass up the name resolved by the client to the registry (e.g. docker.io/library/ubuntu). So if a request is going to localhost:5000/library/ubuntu, it could mirror both docker.io/library/ubuntu and quay.io/library/ubuntu and switch based on request parameters. There are 2 possible ways to achieve this: one is by adding an HTTP request header (e.g. OCI-REF-NAME: docker.io/library/ubuntu), the other is by adding a query parameter (e.g. ?oci-ref-name=docker.io/library/ubuntu). The first is clean but the second may be more useful for static mirroring. I am not suggesting one over the other yet, just stating the problem and solutions to discuss.
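On the registry side, switching upstreams based on either request parameter might look like the sketch below. The UPSTREAMS table and resolve_upstream are hypothetical; the header and parameter names are the two examples from the proposal, and header lookup is exact-match here even though real HTTP header names are case-insensitive.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping from the authority in the name to the host to fetch from.
UPSTREAMS = {
    "docker.io": "registry-1.docker.io",
    "quay.io": "quay.io",
}


def resolve_upstream(request_target: str, headers: dict):
    """Return the upstream URL for a mirrored request, or None if unknown."""
    parts = urlsplit(request_target)
    query = parse_qs(parts.query)
    # Prefer the header, then fall back to the query parameter.
    ref_name = headers.get("OCI-REF-NAME")
    if ref_name is None:
        values = query.get("oci-ref-name", [])
        ref_name = values[0] if values else None
    if ref_name is None:
        return None  # older client: no name was passed up
    authority = ref_name.split("/", 1)[0]
    host = UPSTREAMS.get(authority)
    if host is None:
        return None  # this mirror does not know the upstream
    return f"https://{host}{parts.path}"


print(resolve_upstream(
    "/v2/library/ubuntu/manifests/latest?oci-ref-name=quay.io/library/ubuntu", {}))
print(resolve_upstream(
    "/v2/library/ubuntu/manifests/latest", {"OCI-REF-NAME": "docker.io/library/ubuntu"}))
```

The same path, /v2/library/ubuntu/manifests/latest, resolves to two different upstreams depending only on the passed-up name, which is the behavior the proposal is after.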