nephio-project / nephio

Nephio is a Kubernetes-based automation platform for deploying and managing highly distributed, interconnected workloads such as 5G Network Functions, and the underlying infrastructure on which those workloads depend.
Apache License 2.0

Support for insecure container registries in ansible provisioning for demo environment #754

Open dgeorgievski opened 1 week ago

dgeorgievski commented 1 week ago

I am trying to install Nephio v2.0.0 on a pre-provisioned VM, following the steps in the project documentation: https://docs.nephio.org/docs/guides/install-guides/#installing-on-a-pre-provisioned-vm

My problem is that provisioning of the management Kind cluster fails quickly because of the Docker Hub pull rate limit unless I configure an insecure container registry for both the VM's Docker daemon and Kind's containerd.

Setting DOCKERHUB_USERNAME and DOCKERHUB_TOKEN gets me a little further down the provisioning path, but not far. The procedure usually breaks at the gitea StatefulSet:

 Warning  Failed     4m43s (x4 over 6m7s)   kubelet            Failed to pull image "gitea/gitea:1.19.3": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/gitea/gitea:1.19.3": failed to copy: httpReadSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/gitea/gitea/manifests/sha256:e5da757cc2bba24216c1874a26e83e3b15421f2526a41d6c4c24cd399cca647b: 429 Too Many Requests - Server message: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

I had to reconfigure both the VM's Docker daemon and the management cluster's (kind-kind) containerd to successfully provision the management cluster with all of the required services.

Docker daemon config

The following is the desired VM Docker daemon config, /etc/docker/daemon.json, which pulls all container images through a local, insecure registry:

{
  "registry-mirrors": [
    "https://docker.example.com"
  ],
  "insecure-registries": [
    "docker.example.com"
  ]
}

The current playbook role sets up registry-mirrors but not the insecure-registries section: https://github.com/nephio-project/test-infra/blob/v2.0.0/e2e/provision/playbooks/cluster.yml#L48-L52
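A minimal shell sketch of rendering the desired daemon.json from DOCKER_REGISTRY_MIRRORS and the proposed DOCKER_INSECURE_REGISTRIES variable. The function names are hypothetical (not part of test-infra), and both variables are assumed to hold plain comma-separated values rather than JSON.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of nephio-project/test-infra) that render
# /etc/docker/daemon.json from two comma-separated lists. Values are assumed
# to contain no spaces, quotes, or backslashes, so no JSON escaping is done.

# "a,b" -> ["a", "b"]
to_json_array() {
  local out="" item
  local -a items=()
  IFS=',' read -r -a items <<< "$1"
  for item in "${items[@]}"; do
    if [ -n "$out" ]; then
      out+=", "
    fi
    out+="\"${item}\""
  done
  printf '[%s]' "$out"
}

# $1: registry mirrors (URLs), $2: insecure registries (host[:port])
render_daemon_json() {
  printf '{\n  "registry-mirrors": %s,\n  "insecure-registries": %s\n}\n' \
    "$(to_json_array "$1")" "$(to_json_array "$2")"
}
```

Piping `render_daemon_json "$DOCKER_REGISTRY_MIRRORS" "$DOCKER_INSECURE_REGISTRIES"` into `sudo tee /etc/docker/daemon.json` and then running `sudo systemctl restart docker` would apply it; note that this overwrites any existing daemon.json rather than merging into it.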

Kind containerd config

The current CRI config uses the old (pre-v2.0) registry syntax, which does not work in my setup; see https://github.com/containerd/containerd/blob/main/docs/cri/registry.md

The relevant template is https://github.com/nephio-project/test-infra/blob/v2.0.0/e2e/provision/playbooks/roles/bootstrap/tasks/create-mgmt.yml#L36-L39, which ends up in /etc/containerd/config.toml in the kind-control-plane container:

 {% if lookup('ansible.builtin.env', 'DOCKER_REGISTRY_MIRRORS') %}
      containerdConfigPatches:
        - |-
          [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
            endpoint = {{ lookup('ansible.builtin.env', 'DOCKER_REGISTRY_MIRRORS') | from_json }}
 {% endif %}

With the new syntax, the desired CRI config should look like this:
```yaml
containerdConfigPatches:
  - |-
    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = "/etc/containerd/certs.d"
```
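A small sketch for confirming that the patch reached the node, assuming kind merges it into /etc/containerd/config.toml with the key written as shown. The function name is hypothetical and not part of test-infra.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of test-infra): succeeds if the containerd
# config read from stdin sets the CRI registry config_path to the
# /etc/containerd/certs.d directory.
has_config_path() {
  grep -Eq '^[[:space:]]*config_path[[:space:]]*=[[:space:]]*"/etc/containerd/certs\.d"'
}
```

For example: `docker exec kind-control-plane cat /etc/containerd/config.toml | has_config_path && echo "config_path is set"`.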

Then, an additional Ansible task could create a default registry host config that points containerd at the local, insecure container registry:

$ docker exec -i kind-control-plane tree /etc/containerd/certs.d/
/etc/containerd/certs.d/
`-- _default
    `-- hosts.toml

$ docker exec -i kind-control-plane cat /etc/containerd/certs.d/_default/hosts.toml
 [host."https://docker.example.com"]
      capabilities = ["pull", "resolve"]
      skip_verify = true
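The manual steps above could be scripted roughly as follows. This is a sketch, not the author's Ansible task: the function names are hypothetical, it assumes the config_path patch is already in place, and it writes the file through `docker exec` so nothing needs to be copied from the host.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of test-infra) mirroring the manual steps.

# Print a hosts.toml that sends pulls and resolves to one mirror and skips
# TLS verification, matching the _default/hosts.toml shown above.
render_hosts_toml() {
  printf '[host."%s"]\n  capabilities = ["pull", "resolve"]\n  skip_verify = true\n' "$1"
}

# Install that file as the _default host config on a kind node container,
# e.g. kind-control-plane.
install_default_mirror() {
  local node="$1" mirror="$2"
  local dir=/etc/containerd/certs.d/_default
  docker exec "$node" mkdir -p "$dir"
  render_hosts_toml "$mirror" |
    docker exec -i "$node" tee "$dir/hosts.toml" > /dev/null
}
```

Usage: `install_default_mirror kind-control-plane https://docker.example.com`. Using the `_default` directory applies the mirror to every registry rather than only docker.io.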

Besides these changes, I'd recommend updating the following Ansible Galaxy roles:

  1. andrewrothstein.docker_engine from 0.2.2 to 0.3.0 https://github.com/nephio-project/test-infra/blob/v2.0.0/e2e/provision/galaxy-requirements.yml#L12-L13

  2. andrewrothstein.kind from 1.2.6 to 1.2.11 https://github.com/nephio-project/test-infra/blob/v2.0.0/e2e/provision/galaxy-requirements.yml#L16-L17

I understand this request has significant design implications and must be consistent with the project goals. I have a working version that you could review as one option for addressing this issue. The fact is, I cannot deploy the Nephio Kind demo environment without container registry mirrors, and I prefer insecure mirrors to avoid the overhead of setting up custom SSL certificates.

Docker daemon config https://github.com/dgeorgievski/test-infra/blob/e2e-provision-bootstrap/e2e/provision/playbooks/cluster.yml#L49-L59

Containerd registry config

  1. Always set the registry config_path: https://github.com/dgeorgievski/test-infra/blob/e2e-provision-bootstrap/e2e/provision/playbooks/roles/bootstrap/tasks/create-mgmt.yml#L36-L39

  2. An additional Ansible task to set up the default containerd registry configuration: https://github.com/dgeorgievski/test-infra/blob/e2e-provision-bootstrap/e2e/provision/playbooks/roles/bootstrap/tasks/create-mgmt.yml#L73-L95

Please note that I am using an additional environment variable, DOCKER_INSECURE_REGISTRIES, to explicitly define the insecure registry. In my case, the call to init.sh looks like this:

sudo NEPHIO_DEBUG=true   \
    NEPHIO_BRANCH=v2.0.0 \
    NEPHIO_USER=ubuntu   \
    DOCKER_REGISTRY_MIRRORS="https://docker.example.com" \
    DOCKER_INSECURE_REGISTRIES="docker.example.com" \
    DOCKERHUB_USERNAME=dgeorgievski \
    DOCKERHUB_TOKEN=dckr_pat_token \
     bash init.sh
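Because the two variables must agree, a pre-flight check along these lines could catch a mismatch before `init.sh` runs. This is a hypothetical sketch, not part of `init.sh`; it assumes comma-separated values without spaces and compares the host (including any port) of each mirror URL against the insecure list.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of init.sh): every mirror host in
# DOCKER_REGISTRY_MIRRORS should also be listed in DOCKER_INSECURE_REGISTRIES.

# Strip the scheme and any path: https://docker.example.com/v2 -> docker.example.com
url_host() {
  local u="${1#*://}"
  printf '%s' "${u%%/*}"
}

# $1: comma-separated mirror URLs, $2: comma-separated insecure registries
mirrors_are_insecure() {
  local insecure=",$2," m host
  local -a mirrors=()
  IFS=',' read -r -a mirrors <<< "$1"
  for m in "${mirrors[@]}"; do
    host="$(url_host "$m")"
    case "$insecure" in
      *",${host},"*) ;;
      *)
        printf 'mirror host %s is not in DOCKER_INSECURE_REGISTRIES\n' "$host" >&2
        return 1
        ;;
    esac
  done
  return 0
}
```

Running `mirrors_are_insecure "$DOCKER_REGISTRY_MIRRORS" "$DOCKER_INSECURE_REGISTRIES" || exit 1` before `bash init.sh` would stop early with a readable message instead of failing later during image pulls.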