Closed: immanuelfodor closed this issue 3 years ago.
Even though I've implemented all of the above changes (turned off the admission controller, avoided `latest` tags, set `imagePullPolicy` to `IfNotPresent`), the cluster is still querying DockerHub periodically (the interval seems to be 5 minutes; only one node is shown on the screenshot, but all nodes do this, so multiply by 3):
What more could I do to limit the DockerHub requests? I have only one `latest` tag, on a busybox init container — maybe that causes it? But it's not in use, as the parent pod is running fine. How can I debug what triggers the DockerHub requests?
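One way to narrow this down is to count the Docker Hub lookups per client IP in the DNS server's query log, which shows which node (or pod network) generates them. A sketch, assuming a dnsmasq/Pi-hole style log; the log path and line format are assumptions, adjust to your setup:

```shell
# Count Docker Hub DNS lookups per client, busiest client first.
# Assumes dnsmasq/Pi-hole query-log lines like:
#   Oct 20 10:00:00 dnsmasq[123]: query[A] registry-1.docker.io from 192.168.1.6
count_dockerhub_queries() {
  grep -E 'query\[A+\] (registry-1|auth)\.docker\.io' "$1" \
    | awk '{print $NF}' | sort | uniq -c | sort -rn
}

# Log path is an assumption (default Pi-hole location):
[ -f /var/log/pihole.log ] && count_dockerhub_queries /var/log/pihole.log
```

The per-client counts make it easy to correlate spikes with a specific node and then inspect that node's kubelet/Docker activity.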
Same issue here!
Hmm, is nobody else worried that RKE clusters might get banned from DockerHub after Nov 1?
Harbor Docker registry updated its docs to address the rate limiting: https://goharbor.io/docs/2.1.0/administration/configure-proxy-cache/
As of Harbor v2.1.1, Harbor proxy cache fires a HEAD request to determine whether any layer of a cached image has been updated in the Docker Hub registry. Using this method to check the target registry will not trigger the Docker Hub rate limiter. If any image layer was updated, the proxy cache will pull the new image, which will count towards the Docker Hub rate limiter.
Maybe a HEAD request is what RKE does, and that's why we see the DNS queries for DockerHub? Or is it blindly trying to pull the image regardless of the current rate limit for that IP address? Does somebody know how checking the registry for new images works in RKE?
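For what it's worth, you can inspect your current Docker Hub allowance directly. A sketch of the method from Docker's rate-limit docs: fetch an anonymous token for the special `ratelimitpreview/test` repository, then issue a HEAD request for its manifest and read the `ratelimit-*` response headers (per the docs, a HEAD request does not count as a pull; requires only curl):

```shell
# Get an anonymous pull token for the rate-limit test repository.
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" \
  | grep -o '"token":"[^"]*"' | head -n1 | cut -d'"' -f4)

# HEAD the manifest; RateLimit-Limit / RateLimit-Remaining show the current
# allowance for this IP without consuming a pull.
curl -s --head -H "Authorization: Bearer $TOKEN" \
  "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest" \
  | grep -i 'ratelimit'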
Hi @superseb, you seem to be a core contributor — could you please help with this time-critical issue? If not, could you point me to whom to ask, or where I should ask around?
I've set up a single-node RKE cluster and monitored DNS queries, but I don't see the behavior you are describing. First of all, RKE is a binary that you manually run to create/provision a cluster; it runs ad hoc and does not do anything outside of running and provisioning. After RKE is done, the components are upstream Kubernetes components with our settings. If one of these is causing the behavior, we can certainly look into it, but we first need to isolate where it is coming from. I've run a default cluster.yml and ran a few pods, but don't see recurring requests towards Docker Hub. The only way I could reproduce your behavior was by running a pod with a nonexistent image, which went into ImagePullBackOff; the back-off eventually reaches the 5-minute mark and keeps retrying at that interval. So to analyze the issue we need:
- your cluster.yml
- kubelet logs with `-v=9` (very verbose logging, which will show the kubelet interactions regarding Docker images)

The way to enable verbose logging is by using the following in cluster.yml:
```
services:
  kubelet:
    extra_args:
      v: 9
```
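To then inspect the verbose output: in an RKE cluster the kubelet runs as a Docker container named `kubelet`, so its log can be filtered for image/registry activity. A small sketch — the grep patterns and the `--since` window are assumptions, adjust as needed:

```shell
# Filter log lines for image/registry activity.
image_pull_lines() {
  grep -iE 'pulling|pulled|back-off|manifest|docker\.io'
}

# Typical invocation on an RKE node (requires the "kubelet" container):
# docker logs --since 10m kubelet 2>&1 | image_pull_lines
```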
Let me know if I missed anything.
Thank you very much, this was exactly what I needed for debugging. It turns out that the cluster was also running a private Docker registry, and it had a long-forgotten replication job that regularly checked for image updates. When I removed the registry, the DNS queries stopped, just as you predicted: RKE doesn't check images with `kube-api.always_pull_images: false` in cluster.yml and `imagePullPolicy: IfNotPresent` on deployments. Since Harbor v2.1.1 only uses HEAD requests (https://github.com/goharbor/harbor/issues/13112), if I upgrade the registry, the DNS queries will come back but the rate limit won't be hit.
One more question: would `kube-api.always_pull_images: true` and/or `imagePullPolicy: Always` also result in HEAD requests against DockerHub in RKE? As I understand it, always pulling images is a security best practice, but I also don't want to be rate limited.
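Regardless of how the registry check is made, the per-workload side can be pinned down in the pod spec. A minimal sketch (the deployment name, image tag, and command are hypothetical) of a deployment that avoids registry checks on restarts:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example                      # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: app
        image: busybox:1.31.1          # exact tag instead of :latest
        imagePullPolicy: IfNotPresent  # only pull when missing on the node
        command: ["sleep", "3600"]
```

After changing the policy, pods have to be recreated for it to take effect, e.g. with `kubectl rollout restart deployment/example`.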
RKE doesn't make API calls to DockerHub (or any other registry). It asks the Docker daemons on nodes to pull images or run containers, then the docker daemon makes whatever API calls it wants to the registry to accomplish that.
You would think their client would use their preferred method to talk to their own registry, but to confirm that you would need an SSL man-in-the-middle proxy.
I see, thanks for the explanation. Then I need to check which method Docker uses in my version.
```
$ docker version
Client: Docker Engine - Community
 Version: 19.03.8
 API version: 1.40
 Go version: go1.12.17
 Git commit: afacb8b
 Built: Wed Mar 11 01:27:04 2020
 OS/Arch: linux/amd64
 Experimental: false

Server: Docker Engine - Community
 Engine:
  Version: 19.03.8
  API version: 1.40 (minimum version 1.12)
  Go version: go1.12.17
  Git commit: afacb8b
  Built: Wed Mar 11 01:25:42 2020
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.2.13
  GitCommit: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version: 1.0.0-rc10
  GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version: 0.18.0
  GitCommit: fec3683
```
DockerHub rate limiting (https://docs.docker.com/docker-hub/download-rate-limit/) takes effect on Nov 1, allowing anonymous users just 100 manifest pulls per 6 hours.
Please provide guidance / a best-practice description of how to reduce DockerHub registry checks to the bare minimum within an RKE cluster. Currently, a 3-node cluster with some deployed workloads makes about 960+ registry-1.docker.io and 580+ auth.docker.io DNS queries in 24 hours (according to the PiHole DNS log), which is well above the upcoming limit.
Some of the options I can think of:

- Turn off the AlwaysPullImages admission controller (`kube-api.always_pull_images: false` in cluster.yml), then recreate all pods to apply the change.
- Avoid `imagePullPolicy: Always` cases (e.g., specify `IfNotPresent` and use exact image tags instead of `latest`): https://kubernetes.io/docs/concepts/containers/images/#updating-images

Should these be enough, or are there any more steps one can take to ensure DockerHub registry checking is only done on intentional/explicit image pulls initiated by `kubectl` operations?

How can one debug when and why an RKE cluster is checking the registry for new images?
RKE version:

```
$ rke version
INFO[0000] Running RKE version: v1.1.6
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:51:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
```
Docker version: (`docker version`, `docker info` preferred)

```
$ docker version
Client: Docker Engine - Community
 Version: 19.03.8
 API version: 1.40
 Go version: go1.12.17
 Git commit: afacb8b
 Built: Wed Mar 11 01:27:04 2020
 OS/Arch: linux/amd64
 Experimental: false

Server: Docker Engine - Community
 Engine:
  Version: 19.03.8
  API version: 1.40 (minimum version 1.12)
  Go version: go1.12.17
  Git commit: afacb8b
  Built: Wed Mar 11 01:25:42 2020
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.2.13
  GitCommit: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version: 1.0.0-rc10
  GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version: 0.18.0
  GitCommit: fec3683
```

```
$ docker info
Client:
 Debug Mode: false

Server:
 Containers: 101
  Running: 85
  Paused: 0
  Stopped: 16
 Images: 46
 Server Version: 19.03.8
 Storage Driver: overlay2
  Backing Filesystem:
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 4.18.0-147.8.1.el8_1.x86_64
Operating System: CentOS Linux 8 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 5.661GiB
Name: node1
ID: ....
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
```
Operating system and kernel: (`cat /etc/os-release`, `uname -r` preferred)

```
$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="8 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="8"
```

```
$ uname -r
4.18.0-147.8.1.el8_1.x86_64
```
Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
Proxmox (KVM/QEMU)
cluster.yml file:

```
# If you intened to deploy Kubernetes in an air-gapped environment,
# please consult the documentation on how to configure custom RKE images.
nodes:
- address: 192.168.1.6
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - worker
  - etcd
  hostname_override: node1
  user: centos
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_ed25519
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: 192.168.1.7
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - worker
  - etcd
  hostname_override: node2
  user: centos
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_ed25519
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: 192.168.1.8
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - worker
  - etcd
  hostname_override: node3
  user: centos
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_ed25519
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
services:
  etcd:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    external_urls: []
    ca_cert: ""
    cert: ""
    key: ""
    path: ""
    uid: 1000
    gid: 1000
    snapshot: true
    retention: 48h
    creation: 6h
    backup_config:
      interval_hours: 12
      retention: 6
  kube-api:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    service_cluster_ip_range: 10.43.0.0/16
    service_node_port_range: ""
    pod_security_policy: false
    always_pull_images: false
    secrets_encryption_config:
      enabled: true
    audit_log:
      enabled: true
    admission_configuration: null
    event_rate_limit: null
  kube-controller:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    cluster_cidr: 10.42.0.0/16
    service_cluster_ip_range: 10.43.0.0/16
  scheduler:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
  kubelet:
    image: ""
    extra_args:
      max-pods: 150
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    cluster_domain: cluster.local
    infra_container_image: ""
    cluster_dns_server: 10.43.0.10
    fail_swap_on: false
    generate_serving_certificate: false
  kubeproxy:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
network:
  plugin: canal
  options: {}
  mtu: 0
  node_selector: {}
  update_strategy: null
authentication:
  strategy: x509
  sans: []
  webhook: null
addons: ""
addons_include:
- ./dashboard/k8s-dash-recommended.yml
- ./dashboard/dashboard-adminuser.yml
system_images:
  etcd: rancher/coreos-etcd:v3.4.3-rancher1
  alpine: rancher/rke-tools:v0.1.64
  nginx_proxy: rancher/rke-tools:v0.1.64
  cert_downloader: rancher/rke-tools:v0.1.64
  kubernetes_services_sidecar: rancher/rke-tools:v0.1.64
  kubedns: rancher/k8s-dns-kube-dns:1.15.2
  dnsmasq: rancher/k8s-dns-dnsmasq-nanny:1.15.2
  kubedns_sidecar: rancher/k8s-dns-sidecar:1.15.2
  kubedns_autoscaler: rancher/cluster-proportional-autoscaler:1.7.1
  coredns: rancher/coredns-coredns:1.6.9
  coredns_autoscaler: rancher/cluster-proportional-autoscaler:1.7.1
  nodelocal: rancher/k8s-dns-node-cache:1.15.7
  kubernetes: rancher/hyperkube:v1.18.6-rancher1
  flannel: rancher/coreos-flannel:v0.12.0
  flannel_cni: rancher/flannel-cni:v0.3.0-rancher6
  calico_node: rancher/calico-node:v3.13.4
  calico_cni: rancher/calico-cni:v3.13.4
  calico_controllers: rancher/calico-kube-controllers:v3.13.4
  calico_ctl: rancher/calico-ctl:v3.13.4
  calico_flexvol: rancher/calico-pod2daemon-flexvol:v3.13.4
  canal_node: rancher/calico-node:v3.13.4
  canal_cni: rancher/calico-cni:v3.13.4
  canal_flannel: rancher/coreos-flannel:v0.12.0
  canal_flexvol: rancher/calico-pod2daemon-flexvol:v3.13.4
  weave_node: weaveworks/weave-kube:2.6.4
  weave_cni: weaveworks/weave-npc:2.6.4
  pod_infra_container: rancher/pause:3.1
  ingress: rancher/nginx-ingress-controller:nginx-0.32.0-rancher1
  ingress_backend: rancher/nginx-ingress-controller-defaultbackend:1.5-rancher1
  metrics_server: rancher/metrics-server:v0.3.6
  windows_pod_infra_container: rancher/kubelet-pause:v0.1.4
ssh_key_path: ~/.ssh/id_ed25519
ssh_cert_path: ""
ssh_agent_auth: false
authorization:
  mode: rbac
  options: {}
ignore_docker_version: false
kubernetes_version: ""
private_registries: []
ingress:
  provider: nginx
  options:
    use-forwarded-headers: "true"
    proxy-body-size: "80M"
    use-http2: "true"
  node_selector: {}
  extra_args: {}
  dns_policy: ""
  extra_envs: []
  extra_volumes: []
  extra_volume_mounts: []
  update_strategy: null
cluster_name: "test"
cloud_provider:
  name: ""
prefix_path: ""
win_prefix_path: ""
addon_job_timeout: 0
bastion_host:
  address: ""
  port: ""
  user: ""
  ssh_key: ""
  ssh_key_path: ""
  ssh_cert: ""
  ssh_cert_path: ""
monitoring:
  provider: ""
  options: {}
  node_selector: {}
  update_strategy: null
  replicas: null
restore:
  restore: false
  snapshot_name: ""
dns:
  provider: coredns
  upstreamnameservers:
  - 192.168.1.2
  - 192.168.1.3
```
Steps to Reproduce:
Check PiHole DNS logs.
Results:
960+ registry-1.docker.io and 580+ auth.docker.io DNS queries within the last 24 hours.