I am not aware of any issues with rewrites. Can you provide a specific example of registries.yaml content that does not apply rewrites?
Note that rewrite rules ONLY apply when pulling images from a mirror endpoint; rewrites are NOT intended to apply when pulling an image directly from the registry itself (ie, when using the registry's default endpoint). If you add a wildcard entry with rewrites, but no endpoints, this is not expected to do anything.
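The rule above can be sketched in a few lines of Python. This is only a simplified model of the statement in this comment, not RKE2's or containerd's actual code; the function name `rewrites_apply` and the comparison by host name are assumptions made for illustration.

```python
from urllib.parse import urlparse

def rewrites_apply(registry, endpoint):
    """Return True when `endpoint` is a mirror for `registry`, not the registry itself.

    Assumption: an endpoint whose host equals the registry name is treated as
    the registry's own (default) endpoint, so rewrites are skipped for it.
    """
    return urlparse(endpoint).netloc != registry

# docker.io mirrored by a private registry: rewrites are used.
print(rewrites_apply("docker.io", "https://dockerhub.internal.com"))               # True
# A registry listed as its own endpoint: rewrites are not used.
print(rewrites_apply("dockerhub.internal.com", "https://dockerhub.internal.com"))  # False
```

Under this model, a mirrors entry whose only endpoint is the registry's own address gains nothing from a rewrite block.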
I have a similar issue. I am preparing to upgrade from Rancher 2.7.10 (RKE2 1.24.9) to Rancher 2.9.3 (RKE2 1.30.5).
In my test environment, I have bootstrapped RKE2 1.30.5:
k get nodes
NAME                STATUS   ROLES                              AGE   VERSION
node001-29ed1474    Ready    control-plane,etcd,master,worker   12d   v1.30.5+rke2r1
node002-29ed1474    Ready    control-plane,etcd,master,worker   7d    v1.30.5+rke2r1
node003-29ed1474    Ready    control-plane,etcd,master,worker   7d    v1.30.5+rke2r1
node-001-fe6af43f   Ready    worker                             12d   v1.30.5+rke2r1
node-002-fe6af43f   Ready    worker                             7d    v1.30.5+rke2r1
node-003-fe6af43f   Ready    worker                             7d    v1.30.5+rke2r1
with this /etc/rancher/rke2/registries.yaml:
mirrors:
  dockerhub.internal.com:
    endpoint:
      - "https://dockerhub.internal.com"
    rewrite:
      "^rancher/(.*)": "docker-internal/rancher/$1"
  dockerhub-master.internal.com:
    endpoint:
      - "https://dockerhub-master.internal.com"
    rewrite:
      "^rancher/(.*)": "docker-internal/rancher/$1"
  oci.internal.com:
    endpoint:
      - "https://oci.internal.com"
    rewrite:
      "^rancher/(.*)": "docker-internal/rancher/$1"
configs:
  dockerhub.internal.com:
    auth:
      password: redacted
      username: username
  dockerhub-master.internal.com:
    auth:
      password: redacted
      username: username
  qa-oci.internal.com:
    auth:
      password: redacted
      username: username
With this configuration, image pulls failed with "image not found".
RKE2 generated /var/lib/rancher/rke2/agent/etc/containerd/config.toml (contents in config.toml.txt) without the mirrors and rewrites.
Since config.toml is managed by RKE2 and manual edits to it are not preserved, I used config.toml.tmpl instead to add the rewrites manually on each node. After that, image pulls from the private registry worked as expected.
After fixing the rewrites I was able to deploy the Rancher Helm chart version 2.9.3. However, creating a downstream cluster (the downstream cluster, VMs, and other resources are provisioned with Terraform) hits the same problem:
Oct 28 22:02:31 node-kbc-001-360eb52d rancher-system-agent[18951]: time="2024-10-28T22:02:31Z" level=info msg="Rancher System Agent version v0.3.10 (7ad21ff) is starting"
Oct 28 22:02:31 node-kbc-001-360eb52d rancher-system-agent[18951]: time="2024-10-28T22:02:31Z" level=info msg="Using directory /var/lib/rancher/agent/work for work"
Oct 28 22:02:31 node-kbc-001-360eb52d rancher-system-agent[18951]: time="2024-10-28T22:02:31Z" level=info msg="Starting remote watch of plans"
Oct 28 22:02:31 node-kbc-001-360eb52d rancher-system-agent[18951]: time="2024-10-28T22:02:31Z" level=info msg="Starting /v1, Kind=Secret controller"
Oct 28 22:02:31 node-kbc-001-360eb52d rancher-system-agent[18951]: time="2024-10-28T22:02:31Z" level=info msg="Detected first start, force-applying one-time instruction set"
.....
Oct 28 22:02:51 node-kbc-001-360eb52d rancher-system-agent[18951]: time="2024-10-28T22:02:51Z" level=info msg="[Applyinator] Applying one-time instructions for plan with checksum fd5e40f76bdb17a2c54e01742cb28311567a5fe66cb9aea935108e0a5f25b95e"
Oct 28 22:02:51 node-kbc-001-360eb52d rancher-system-agent[18951]: time="2024-10-28T22:02:51Z" level=info msg="[Applyinator] Extracting image dockerhub-master.internal.com/rancher/system-agent-installer-rke2:v1.24.9-rke2r2 to directory /var/lib/rancher/agent/work/20241028-220251/fd5e40f76bdb17a2c54e01742cb28311567a5fe66cb9aea935108e0a5f25b95e_0"
Oct 28 22:02:51 node-kbc-001-360eb52d rancher-system-agent[18951]: time="2024-10-28T22:02:51Z" level=info msg="Using private registry config file at /etc/rancher/agent/registries.yaml"
Oct 28 22:02:51 node-kbc-001-360eb52d rancher-system-agent[18951]: time="2024-10-28T22:02:51Z" level=info msg="Pulling image dockerhub-master.internal.com/rancher/system-agent-installer-rke2:v1.24.9-rke2r2"
Oct 28 22:02:52 node-kbc-001-360eb52d rancher-system-agent[18951]: time="2024-10-28T22:02:52Z" level=warning msg="Failed to get image from endpoint: GET https://dockerhub-master.internal.com/v2/rancher/system-agent-installer-rke2/manifests/v1.24.9-rke2r2: : Repository 'rancher' not found"
Oct 28 22:02:52 node-kbc-001-360eb52d rancher-system-agent[18951]: time="2024-10-28T22:02:52Z" level=warning msg="Failed to get image from endpoint: GET https://dockerhub-master.internal.com/v2/rancher/system-agent-installer-rke2/manifests/v1.24.9-rke2r2: : Repository 'rancher' not found"
Oct 28 22:02:52 node-kbc-001-360eb52d rancher-system-agent[18951]: time="2024-10-28T22:02:52Z" level=error msg="error while staging: all endpoints failed: GET https://dockerhub-master.internal.com/v2/rancher/system-agent-installer-rke2/manifests/v1.24.9-rke2r2: : Repository 'rancher' not found; GET https://dockerhub-master.internal.com/v2/rancher/system-agent-installer-rke2/manifests/v1.24.9-rke2r2: : Repository 'rancher' not found: failed to get image dockerhub-master.internal.com/rancher/system-agent-installer-rke2:v1.24.9-rke2r2"
Oct 28 22:02:52 node-kbc-001-360eb52d rancher-system-agent[18951]: time="2024-10-28T22:02:52Z" level=error msg="error executing instruction 0: all endpoints failed: GET https://dockerhub-master.internal.com/v2/rancher/system-agent-installer-rke2/manifests/v1.24.9-rke2r2: : Repository 'rancher' not found; GET https://dockerhub-master.internal.com/v2/rancher/system-agent-installer-rke2/manifests/v1.24.9-rke2r2: : Repository 'rancher' not found: failed to get image dockerhub-master.internal.com/rancher/system-agent-installer-rke2:v1.24.9-rke2r2"
Oct 28 22:02:52 node-kbc-001-360eb52d rancher-system-agent[18951]: time="2024-10-28T22:02:52Z" level=info msg="[Applyinator] No image provided, creating empty working directory /var/lib/rancher/agent/work/20241028-220251/fd5e40f76bdb17a2c54e01742cb28311567a5fe66cb9aea935108e0a5f25b95e_0"
In the current live setup I have nearly 250 clusters of different sizes registered in Rancher 2.7.10. On all nodes the private registry credentials are rotated often. They are stored in HashiCorp Vault and flow through ExternalSecretsOperator > Rancher fleet-default/ExternalSecret > Rancher fleet-default/Secret; each cluster's registry config uses this secret to update the credentials on each node automatically.
Updating config.toml via config.toml.tmpl across all 250 multi-node clusters would be very complex.
Is there anything wrong with my registries.yaml? I am not sure why the rewrite is not added to config.toml from /etc/rancher/rke2/registries.yaml.
I'm really confused by what you're doing here. Why are you trying to apply rewrites when pulling images directly from these registries? Why are you trying to override the desired behavior by providing your own containerd config template?
As I said above:
Note that rewrite rules ONLY apply when pulling images from a mirror endpoint; rewrites are NOT intended to apply when pulling an image directly from the registry itself
It looks like you're trying to use these private registries as a mirror for docker.io, and apply rewrites when pulling the Rancher images from these registries. In that case, you should actually set these up as mirrors for docker.io, as shown in the RKE2 docs:
mirrors:
  docker.io:
    endpoint:
      - "https://dockerhub.internal.com"
      - "https://dockerhub-master.internal.com"
      - "https://qa-oci.internal.com"
    rewrite:
      "^rancher/(.*)": "docker-internal/rancher/$1"
configs:
  dockerhub.internal.com:
    auth:
      password: redacted
      username: username
  dockerhub-master.internal.com:
    auth:
      password: redacted
      username: username
  qa-oci.internal.com:
    auth:
      password: redacted
      username: username
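To see what the rewrite rule in this config does to a repository name, here is a minimal Python sketch. It is an illustration only: the helper name `rewrite_repository` is hypothetical, the "first matching pattern wins" behavior is an assumption, and the Go-style `$1` group reference from registries.yaml is translated to Python's `\g<1>` syntax.

```python
import re

def rewrite_repository(repo, rewrites):
    """Apply the first matching rewrite pattern to a repository path (assumed behavior)."""
    for pattern, replacement in rewrites.items():
        # Translate Go-style "$1" group references into Python's "\g<1>".
        py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
        rewritten, count = re.subn(pattern, py_replacement, repo)
        if count:
            return rewritten
    return repo  # no pattern matched: name is unchanged

rewrites = {"^rancher/(.*)": "docker-internal/rancher/$1"}
print(rewrite_repository("rancher/rancher-agent", rewrites))
# docker-internal/rancher/rancher-agent
print(rewrite_repository("library/busybox", rewrites))
# library/busybox
```

Only repositories under rancher/ are remapped; anything else requested from a mirror endpoint keeps its original path.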
RKE2 configured the /var/lib/rancher/rke2/agent/etc/containerd/config.toml with config.toml.txt file content without mirrors and rewrites.
Specifying mirrors and rewrites in containerd's config.toml has LONG been deprecated. Recent releases of RKE2 now put this configuration where it belongs, in files under /var/lib/rancher/rke2/agent/etc/containerd/certs.d. You will find a directory for each registry, containing a hosts.toml file with the mirrors and rewrites. Only credentials (auth) still go in config.toml.
Thanks @brandond for the reply.
I am using rewrites in registries.yaml to pull images from dockerhub-master.internal.com, dockerhub.internal.com, and qa-oci.internal.com because my images are saved in the private registry at <registry endpoint>/docker-internal/rancher/<all rancher images> (not at <registry endpoint>/rancher/<all images>).
Yes, I read in the containerd documentation that using mirrors and rewrites in containerd's config.toml is deprecated. I don't plan to keep modifying containerd's config.toml through config.toml.tmpl; I only tried it as a test to see if it works.
In my /etc/rancher/rke2/config.yaml I am using system-default-registry: dockerhub-master.internal.com, so when rancher-system-agent pulls images it tries dockerhub-master.internal.com/rancher/system-agent-installer-rke2:v1.24.9-rke2r2, which fails because the images are uploaded internally to dockerhub-master.internal.com/docker-internal/rancher/system-agent-installer-rke2:v1.24.9-rke2r2. I used the rewrite to reach the docker-internal/rancher path.
As you said:
Note that rewrite rules ONLY apply when pulling images from a mirror endpoint; rewrites are NOT intended to apply when pulling an image directly from the registry itself
So I don't need to use mirrors, and can pull images directly from my private registry, as long as I upload the images to my private registry at <private registry endpoint>/rancher/*? With the config below:
/etc/rancher/rke2/registries.yaml
configs:
  dockerhub.internal.com:
    auth:
      password: redacted
      username: username
  dockerhub-master.internal.com:
    auth:
      password: redacted
      username: username
  qa-oci.internal.com:
    auth:
      password: redacted
      username: username
and, in /etc/rancher/rke2/config.yaml:
system-default-registry: dockerhub-master.internal.com
Yep - if they're in your private registry under the same name, then you can just set system-default-registry in the config.yaml, and provide creds in registries.yaml.
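The reason the repository path has to match is that system-default-registry only changes the registry host in front of the image name. A minimal sketch of that idea follows; the function name `effective_image` is hypothetical and this is a simplified model rather than RKE2's code.

```python
def effective_image(image, system_default_registry=None):
    """Prefix a system image with system-default-registry when one is set."""
    if system_default_registry:
        # Only the host is prepended; the repository path is left untouched.
        return f"{system_default_registry}/{image}"
    return image

print(effective_image("rancher/system-agent-installer-rke2:v1.24.9-rke2r2",
                      "dockerhub-master.internal.com"))
# dockerhub-master.internal.com/rancher/system-agent-installer-rke2:v1.24.9-rke2r2
```

The printed reference matches the one in the rancher-system-agent log earlier in the thread, which is why the pull fails when the images actually live under docker-internal/rancher/.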
OK. I will need to check with the internal team that maintains the private Artifactory to see if I can get a project named rancher to use as the path.
In case I have to use a custom path, may I know the correct configuration for that? For example:
Leave system-default-registry unset, and do as I said above to use your Artifactory as a mirror for docker.io, with rewrites.
Ok. I will try and test it
Thanks @brandond
With the configuration below, the RKE2 cluster bootstrapped successfully.
system-default-registry is unset, and these files are in place:
/etc/rancher/agent/registries.yaml
/etc/rancher/rke2/registries.yaml
mirrors:
  docker.io:
    endpoint:
      - https://dockerhub-master.internal.com
      - https://dockerhub.internal.com
      - https://oci.internal.com
    rewrite:
      ^rancher/(.*): docker-internal/3rdparty/rancher/$1
configs:
  dockerhub-master.internal.com:
    auth:
      username: username
      password: password
  dockerhub.internal.com:
    auth:
      username: username
      password: password
  oci.internal.com:
    auth:
      username: username
      password: password
I checked the images used in the pods:
k get pods --all-namespaces -o jsonpath="{..image}" | tr -s '[[:space:]]' '\n' | sort | uniq
docker.io/rancher/fleet-agent:v0.8.1
docker.io/rancher/hardened-calico:v3.27.2-build20240308
docker.io/rancher/hardened-cluster-autoscaler:v1.8.10-build20240124
docker.io/rancher/hardened-coredns:v1.11.1-build20240305
docker.io/rancher/hardened-etcd:v3.5.9-k3s1-build20230802
docker.io/rancher/hardened-flannel:v0.24.3-build20240307
docker.io/rancher/hardened-k8s-metrics-server:v0.6.3-build20231009
docker.io/rancher/hardened-kubernetes:v1.26.15-rke2r1-build20240314
docker.io/rancher/klipper-helm:v0.8.3-build20240228
docker.io/rancher/kube-api-auth:v0.2.0
docker.io/rancher/mirrored-sig-storage-snapshot-controller:v6.2.1
docker.io/rancher/mirrored-sig-storage-snapshot-validation-webhook:v6.2.2
docker.io/rancher/nginx-ingress-controller:nginx-1.9.3-hardened1
docker.io/rancher/rancher-agent:v2.7.10
docker.io/rancher/rancher-webhook:v0.3.6
docker.io/rancher/rke2-cloud-provider:v1.26.3-build20230406
docker.io/rancher/shell:v0.1.21
docker.io/rancher/system-agent:v0.3.3-suc
docker.io/rancher/system-upgrade-controller:v0.11.0
index.docker.io/rancher/hardened-etcd:v3.5.9-k3s1-build20230802
index.docker.io/rancher/hardened-kubernetes:v1.26.15-rke2r1-build20240314
index.docker.io/rancher/rke2-cloud-provider:v1.26.3-build20230406
rancher/fleet-agent:v0.8.1
rancher/hardened-calico:v3.27.2-build20240308
rancher/hardened-cluster-autoscaler:v1.8.10-build20240124
rancher/hardened-coredns:v1.11.1-build20240305
rancher/hardened-flannel:v0.24.3-build20240307
rancher/hardened-k8s-metrics-server:v0.6.3-build20231009
rancher/klipper-helm:v0.8.3-build20240228
rancher/kube-api-auth:v0.2.0
rancher/mirrored-sig-storage-snapshot-controller:v6.2.1
rancher/mirrored-sig-storage-snapshot-validation-webhook:v6.2.2
rancher/nginx-ingress-controller:nginx-1.9.3-hardened1
rancher/rancher-agent:v2.7.10
rancher/rancher-webhook:v0.3.6
rancher/shell:v0.1.21
rancher/system-agent:v0.3.3-suc
rancher/system-upgrade-controller:v0.11.0
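The list above shows the same images under three spellings: docker.io/..., index.docker.io/..., and a bare rancher/... name. These all refer to the same Docker Hub repository. The sketch below applies the usual Docker reference normalization to show that; the function name `normalize` is hypothetical and the code is a simplified illustration, not the reference parser used by containerd.

```python
def normalize(image):
    """Expand a short image reference to its docker.io-qualified form."""
    first, sep, rest = image.partition("/")
    # A first component containing "." or ":" (or "localhost") is a registry host.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, path = first, rest
    else:
        registry, path = "docker.io", image
    if registry == "index.docker.io":
        registry = "docker.io"
    if registry == "docker.io" and "/" not in path:
        path = "library/" + path  # official images live under library/
    return f"{registry}/{path}"

names = [
    "docker.io/rancher/hardened-etcd:v3.5.9-k3s1-build20230802",
    "index.docker.io/rancher/hardened-etcd:v3.5.9-k3s1-build20230802",
    "rancher/hardened-etcd:v3.5.9-k3s1-build20230802",
]
print({normalize(n) for n in names})
# {'docker.io/rancher/hardened-etcd:v3.5.9-k3s1-build20230802'}
```

None of these names says anything about which endpoint served the pull; they only identify the image.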
I also verified the containerd certs.d directory:
ls -l /var/lib/rancher/rke2/agent/etc/containerd/certs.d
total 0
drwx------. 2 root root 24 Nov 8 19:05 docker.io
drwx------. 2 root root 24 Nov 8 19:05 dockerhub-master.internal.com
drwx------. 2 root root 24 Nov 8 19:05 dockerhub.internal.com
drwx------. 2 root root 24 Nov 8 19:05 oci.iotcc.internal.com
and docker.io/hosts.toml has the proper rewrite directives:
cat /var/lib/rancher/rke2/agent/etc/containerd/certs.d/docker.io/hosts.toml
# File generated by rke2. DO NOT EDIT.
server = "https://registry-1.docker.io/v2"
capabilities = ["pull", "resolve", "push"]
[host."https://dockerhub-master.internal.com/v2"]
capabilities = ["pull", "resolve"]
[host."https://dockerhub-master.internal.com/v2".rewrite]
"^rancher/(.*)" = "docker-internal/3rdparty/rancher/$1"
[host."https://dockerhub.internal.com/v2"]
capabilities = ["pull", "resolve"]
[host."https://dockerhub.internal.com/v2".rewrite]
"^rancher/(.*)" = "docker-internal/3rdparty/rancher/$1"
[host."https://qa-oci.iotcc.internal.com/v2"]
capabilities = ["pull", "resolve"]
[host."https://qa-oci.iotcc.internal.com/v2".rewrite]
"^rancher/(.*)" = "docker-internal/3rdparty/rancher/$1"
From the pod images, my understanding is that the images are being downloaded from the public internet rather than from the private registry (all the Rancher images are uploaded to the internal private registry at dockerhub-master.internal.com/docker-internal/3rdparty/rancher). I expected the image to show as dockerhub-master.internal.com/docker-internal/3rdparty/rancher/rancher-agent:v2.7.10, but it shows as docker.io. Is my understanding wrong? Is it pulling the image from the internal private registry?
For testing, I completely removed the rewrite directive to make it fail, and got these warning messages:
Nov 08 18:01:21 kbc-001-226c65db.novalocal rancher-system-agent[1079]: time="2024-11-08T18:01:21Z" level=warning msg="Failed to get image from endpoint: GET https://dockerhub-master.internal.com/v2/rancher/system-agent-installer-rke2/manifests/v1.26.15-rke2r1: : Repository 'rancher' not found"
Nov 08 18:01:21 kbc-001-226c65db.novalocal rancher-system-agent[1079]: time="2024-11-08T18:01:21Z" level=warning msg="Failed to get image from endpoint: GET https://dockerhub.internal.com/v2/rancher/system-agent-installer-rke2/manifests/v1.26.15-rke2r1: : Repository 'rancher' not found"
However, it continued to download from docker.io and the cluster bootstrapped.
My use case is that images must be downloaded only internally, from the custom location `dockerhub-master.internal.com/docker-internal/3rdparty/rancher/*`
I expected the image showing like this dockerhub-master.internal.com/docker-internal/3rdparty/rancher/rancher-agent:v2.7.10 but showing as docker.io. Is my understanding wrong? is it pulling image from internal private registry?
The K3s docs are a bit more comprehensive; not all of the content has been migrated over to RKE2 yet: https://docs.k3s.io/installation/private-registry#mirrors
Note that when using mirrors and rewrites, images will still be stored under the original name. For example, crictl image ls will show docker.io/rancher/mirrored-pause:3.6 as available on the node, even if the image was pulled from a mirror with a different name.
The image is still from docker.io. The fact that it was actually pulled from an internal mirror, instead of directly from upstream, does not change that.
Environmental Info:
RKE2 v1.28.8 Rancher v2.8.3
Describe the bug:
When creating a new cluster via Rancher, RKE2 / containerd isn't applying the rewrite rules from /etc/rancher/rke2/registries.yaml, specifically if the file has *mirrors.[].registry.....** entries. The docs and examples don't have that wildcard, so it's potentially a formatting change that got missed.
Steps To Reproduce:
For a customer this was reproducible for any cluster they were attempting to create via pipeline as they use the same registries.yaml. The fix was to edit the /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl to add in the mirror registry/rewrite rules and rerun the pipeline to create the cluster.
Expected behavior:
For RKE2 / containerd to apply the rewrite rules specified in the registries.yaml file even when they include a wildcard.
Additional context / logs: Potentially related to https://github.com/rancher/rke2/issues/3227