Open tsibley opened 1 year ago
Ah, this issue extends to any image name with an explicit registry in it, including docker.io
when explicitly specified, because we treat it as part of the image name when constructing the API requests.
The best thing to do here might be to (finally) switch to using docker-py
instead of directly making the API requests (and potentially down the road, instead of invoking docker
directly). The docker-py
library will handle registries correctly by interfacing with the Docker daemon and existing config (I believe).
Oh, hmm. Not quite with docker-py, as it doesn't have much in the way of support for interacting with registries. It has great support for interacting with the Docker daemon, which in turn can interact with registries, but not do things like list the tags for an image repository. On the command line, I'd use something like Skopeo, e.g.:
docker run \
--rm \
--interactive \
--tty \
--network host \
-v "$HOME"/.docker/config.json:/config.json:ro \
quay.io/skopeo/stable:v1 list-tags \
--authfile /config.json \
docker://docker.io/nextstrain/base
Ugh. I guess we could shell out to Skopeo inside a Docker container to get the tags?
Preliminary patch for this which I'll turn into a proper PR eventually.
diff --git a/nextstrain/cli/runner/docker.py b/nextstrain/cli/runner/docker.py
index 0ffe971..b0f4733 100644
--- a/nextstrain/cli/runner/docker.py
+++ b/nextstrain/cli/runner/docker.py
@@ -409,6 +409,15 @@ def latest_build_image(image_name: str) -> str:
>>> latest_build_image("nextstrain/base")
'nextstrain/base:build-...'
+
+ When a registry other than docker.io (Docker Hub) is present, *image_name*
+ is passed through unchanged.
+
+ >>> latest_build_image("ghcr.io/nextstrain/base")
+ 'ghcr.io/nextstrain/base'
+
+ >>> latest_build_image("ghcr.io/nextstrain/base:build-20220215T000459Z")
+ 'ghcr.io/nextstrain/base:build-20220215T000459Z'
"""
def GET(url, **kwargs):
response = requests.get(url, **kwargs)
@@ -429,13 +438,17 @@ def latest_build_image(image_name: str) -> str:
}
return GET(url, headers = headers).json().get("tags", [])
- repository, tag = split_image_name(image_name, implicit_latest = False)
+ registry, repository, tag = split_image_name(image_name, implicit_latest = False, split_registry = True)
if not tag or is_build_tag(tag):
- build_tags = sorted(filter(is_build_tag, tags(repository)))
+ if registry == "docker.io":
+ build_tags = sorted(filter(is_build_tag, tags(repository)))
- if build_tags:
- return repository + ":" + build_tags[-1]
+ if build_tags:
+ # Intentionally omit the default registry
+ return repository + ":" + build_tags[-1]
+ else:
+ warn(f"Image registry is not docker.io (Docker Hub); checking for newer 'build-…' tags is unsupported.")
return image_name
diff --git a/nextstrain/cli/util.py b/nextstrain/cli/util.py
index 8e09c54..2b20c28 100644
--- a/nextstrain/cli/util.py
+++ b/nextstrain/cli/util.py
@@ -500,14 +500,14 @@ def byte_quantity(quantity: str) -> int:
@overload
-def split_image_name(name: str, implicit_latest: Literal[True] = True) -> Tuple[str, str]:
+def split_image_name(name: str, implicit_latest: Literal[True] = True, split_registry: Literal[False] = False) -> Tuple[str, str]:
...
@overload
-def split_image_name(name: str, implicit_latest: Literal[False]) -> Tuple[str, Optional[str]]:
+def split_image_name(name: str, implicit_latest: Literal[False], split_registry: Literal[True]) -> Tuple[str, str, Optional[str]]:
...
-def split_image_name(name: str, implicit_latest: bool = True) -> Tuple[str, Optional[str]]:
+def split_image_name(name: str, implicit_latest: bool = True, split_registry: bool = False) -> Union[Tuple[str, Optional[str]], Tuple[str, str, Optional[str]]]:
"""
Split the Docker image *name* into a (repository, tag) tuple.
@@ -522,13 +522,49 @@ def split_image_name(name: str, implicit_latest: bool = True) -> Tuple[str, Opti
>>> split_image_name("nextstrain/base:latest", implicit_latest = False)
('nextstrain/base', 'latest')
+
+ Or optionally into a (registry, repository, tag) tuple.
+
+ >>> split_image_name("nextstrain/base", split_registry = True)
+ ('docker.io', 'nextstrain/base', 'latest')
+
+ >>> split_image_name("ghcr.io/nextstrain/base", split_registry = True)
+ ('ghcr.io', 'nextstrain/base', 'latest')
+
+ >>> split_image_name("ghcr.io/nextstrain/base", split_registry = True, implicit_latest = False)
+ ('ghcr.io', 'nextstrain/base', None)
"""
if ":" in name:
repository, tag = name.split(":", maxsplit = 2)
else:
repository, tag = name, "latest" if implicit_latest else None
- return (repository, tag)
+ if not split_registry:
+ return (repository, tag)
+
+ # Default registry is Docker Hub, which has an older alias. While
+ # registry-1.docker.io and registry.hub.docker.com are also working
+ # domains, they aren't treated by Docker as equivalent to the default.
+ default_registry = "docker.io"
+ default_aliases = {"index.docker.io"}
+
+ registry = default_registry
+
+ # Rules for determining if the first part is a registry domain or not.
+ # <https://github.com/distribution/distribution/blob/f7717b78/reference/normalize.go#L123-L140>
+ if "/" in repository:
+ parts = repository.split("/", 1)
+
+ if ("." in parts[0]
+ or ":" in parts[0]
+ or parts[0] == "localhost"
+ or parts[0] != parts[0].lower()):
+ registry, repository = parts
+
+ if registry in default_aliases:
+ registry = default_registry
+
+ return (registry, repository, tag)
def glob_matcher(patterns: Sequence[str]) -> Callable[[Union[str, Path]], bool]:
Current Behavior
During
update
, the Docker runner assumes all images come from the Docker Hub (docker.io) registry.Despite that, Docker images from other registries still mostly work unless a tag-less image name is given or a
build-*
tag is given. Either case triggers a different code path:https://github.com/nextstrain/cli/blob/babc215092465758bf3df88ccf3eaae54add47c3/nextstrain/cli/runner/docker.py#L434-L440
and
nextstrain.cli.runner.docker._update
ends up trying to enumerate thebuild-*
tags of the non-docker.io
image by making requests to thedocker.io
registry. This understandably results in a 400 Bad Request from the latter. Oops.Originally discovered in https://github.com/nextstrain/docker-base/pull/148#issuecomment-1539203279.
Expected behavior
Ideally the update code would fetch the tags available from the correct registry, but this is often complicated by forbidding of anonymous access and other authz issues.
As a temporary workaround, perhaps we could simply not support updates from other registries for now. This would involve detecting the registry correctly and excluding those which aren't
docker.io
from thebuild-*
tag handling.How to reproduce
The
manylinux
image here is used only as an example. Any image from a registry other thandocker.io
will have the same issue.