kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

Request for guidance - pre-caching large container images on AWS AMIs for worker nodes #15696

Closed · plaformsre · closed 3 months ago

plaformsre commented 1 year ago

/kind feature

1. Describe IN DETAIL the feature/behavior/change you would like to see. I would need your guidance on the best approach to pre-installing containerd on the kOps K8s worker node's AWS AMI (e.g. Ubuntu) so we can pre-cache large container images that would otherwise take a significant time to pull for users. We would pull the images during the AMI creation/baking process using e.g. ctr.

2. Feel free to provide a design supporting your feature request. We would need to know whether we could pre-install containerd without making the AMI fail when the worker node joins the Kubernetes cluster and bootstraps. We would like to have the cached layers in containerd on the AMI before the new worker node joins the cluster. This would significantly improve the user experience with large session images.

I worry that pre-installing containerd, and possibly not matching the correct containerd version, might cause issues later when the new worker node joins the K8s cluster and bootstraps.

I would appreciate your guidance on how best to approach pre-caching containerd images on worker nodes.

Thank you.

hakman commented 1 year ago

I would suggest bundling the container images as .tar files and using additionalUserData to load the images on boot:

ctr -n k8s.io image import <bundled_image.tar>
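
For illustration, a minimal boot-time sketch along these lines (the /opt/prebaked-images path is an assumption, not something kOps provides), importing every tarball baked into the AMI:

#!/bin/bash
# Import all pre-baked image bundles into containerd's k8s.io namespace.
# Assumes the tarballs were copied to /opt/prebaked-images during the AMI build.
for f in /opt/prebaked-images/*.tar; do
  ctr -n k8s.io image import "$f"
done
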
johngmyers commented 1 year ago

You can get a list of the container images to bake into your AMIs with kops get assets
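
For reference, a bake-time invocation might look roughly like this (cluster name and state store are placeholders; flags and output columns can vary between kOps versions):

# Run during the AMI build to see which images the cluster will need.
kops get assets --name mycluster.example.com --state s3://my-kops-state-store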

plaformsre commented 1 year ago

Hi @hakman ,

With that approach we would still be adding extra time to transfer the compressed files onto every single worker node, and in our case that would again take more than 5 minutes (for a new node).

Hi @johngmyers, we are trying to avoid any additional time when a new worker node gets provisioned. We are also trying to get away from the complexities (and additional storage costs) of warm pools, for example.

Could we pre-install containerd for kOps on the AMIs, so that all images are already pulled on the worker node?

Any other approaches would be appreciated, but we cannot afford to transfer any data over the network (we are talking about several tens of GBs of compressed container image layers).

hakman commented 1 year ago

Hi @hakman, with that approach we would still be adding extra time to transfer the compressed files onto every single worker node, and in our case that would again take more than 5 minutes (for a new node).

How did you reach this conclusion? I suggested putting the container images on your custom AMI and just importing them at boot time.

plaformsre commented 1 year ago

Hi @hakman,

I misunderstood you regarding the duration of pulling the container images.

It seems that during the execution of ctr images export (utilising the ctr command) containerd has to be running / already set up [1], so I am expected to run containerd ('catch 22'). These would be 'plain' AMIs that are bootstrapped by kOps, so we do not pre-install / pre-configure containerd.

If we pre-bake containerd, that would be fine, provided kOps supports this and it does not cause any issues. If we need to install containerd, cache the images on the AMI (bake), and then uninstall containerd, that could be an option as well, as long as the cached images are acknowledged by the kOps-provisioned containerd.

It would be great if kOps supported a 'bring your own' containerd installation. Would the latter be possible?

__

[1] error message: ctr: failed to dial "/run/containerd/containerd.sock" /.../
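
Purely as an illustration of the install-pull-uninstall idea discussed above (not a verified recipe; whether the kOps-provisioned containerd later picks up the content left in /var/lib/containerd is exactly the open question), a bake step on an Ubuntu AMI might look like:

# AMI bake step, e.g. in a Packer shell provisioner. Image reference is a placeholder.
apt-get update && apt-get install -y containerd
systemctl start containerd
ctr -n k8s.io image pull quay.io/cilium/cilium:v1.12.7
systemctl stop containerd
apt-get remove -y containerd
systemctl unmask containerd   # removing the package masks the unit; nodeup must be able to start it later
# /var/lib/containerd (the image content store) is intentionally left on the AMI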

hakman commented 1 year ago

It seems that during the execution of ctr images IMPORT (utilising the ctr command) containerd has to be running / already set up [1], so I am expected to run containerd ('catch 22').

Scripts will be run in alphabetical order as documented here. Name your additionalUserData script as zzzzzzz and it will run after kOps configures containerd.
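
In InstanceGroup terms that might look roughly like the following (script name and content are illustrative, not a kOps-provided example):

# InstanceGroup spec excerpt; the zzzz- prefix only serves to sort the script after kOps's own ones
spec:
  additionalUserData:
    - name: zzzz-import-images.sh
      type: text/x-shellscript
      content: |
        #!/bin/bash
        ctr -n k8s.io image import /opt/prebaked-images/bundled_image.tar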

It would be great if kOps supported a 'bring your own' containerd installation. Would the latter be possible?

It is possible; I am sure skipInstall could help. It generally helps to look into what config options are available.
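
For anyone searching later, the option referred to sits under the cluster spec, roughly as follows (check the docs for the kOps version in use):

spec:
  containerd:
    skipInstall: true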

plaformsre commented 1 year ago

Hi @hakman,

It seems that there was an issue/feature request, opened and now closed, about containerd still being downloaded and extracted even when 'skipInstall' is set to true in the containerd configuration. I will test and see if cached images are left intact.

https://github.com/kubernetes/kops/issues/15558

wlawton commented 8 months ago

I have a similar use case to @plaformsre in that I need my nodes to launch and start up as quickly as possible, due to sudden and rapid scaling of pods which requires a corresponding rapid scale-out of host nodes. To facilitate this we are using the warm pool feature of AWS autoscaling groups and have configured additionalUserData in the kOps configuration to install nerdctl and pull images from AWS ECR. Our use case is not suitable for pre-loading a 'golden' AMI with the images, since the image tags are highly dynamic. The image tags are sourced from AWS Systems Manager parameters, which are accessed via aws cli commands in the additionalUserData script.
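
A rough sketch of that flow (the SSM parameter name, region and ECR repository below are invented for illustration; it also assumes ECR credentials are already configured, e.g. via the ecr-login credential helper mentioned further down):

# Resolve the dynamic tag from AWS Systems Manager, then pre-pull into containerd's k8s.io namespace.
TAG=$(aws ssm get-parameter --name /myapp/session-image-tag --query 'Parameter.Value' --output text)
nerdctl -n k8s.io image pull "123456789012.dkr.ecr.eu-west-1.amazonaws.com/session-image:${TAG}"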

If our additionalUserData script executes before nodeup.sh, then we run into the containerd-not-installed-yet problem highlighted by @plaformsre. If the additionalUserData script executes after nodeup.sh, the first nerdctl pull request starts and appears to be going well according to cloud-init-output.log, but is then interrupted with a timeout error on /run/containerd/containerd.sock. My speculation is that this occurs because the containerd process has been stopped as part of the node shut-down process, which happens because the nodeup.sh script completed the warm-pool lifecycle event indicating that the AWS ASG can now stop the warmed node.

Here is the relevant snippet from the log:

...
layer-sha256:5c12815fee558b157dee7f7509dedbaba0a8379098858a65ec869e1f1526ea0c:    done
elapsed: 1.0 s    total: 11.0 M (11.0 MiB/s)
time="2023-12-21T13:36:31Z" level=fatal msg="connection error: desc = \"transport: error while dialing: dial unix:///run/containerd/containerd.sock: timeout\": unavailable"
time="2023-12-21T13:36:31Z" level=fatal msg="cannot access containerd socket \"/run/containerd/containerd.sock\": no such file or directory"
...

It's not the first time I've had a problem with this aspect of the system. I previously logged this issue https://github.com/kubernetes/kops/issues/14391, for which the solution was to name my additional userData script appropriately so that it executed before the nodeup.sh script. However, this was when Kubernetes supported the Docker container runtime, so containerd not being installed yet was not an issue. Now I'm stuck between a rock and a hard place in terms of when my additional user data script is executed.

wlawton commented 8 months ago

I've just experimented by having my additionalUserData script install containerd, run the nerdctl pull commands, and then uninstall containerd again; this caused the nodeup process to get stuck at the point where it needs to restart containerd.service, because the unit was left in a masked state. So I went back to my additionalUserData script and added systemctl unmask containerd after apt remove -y containerd, and subsequently the nodeup process ran through to completion. I'm hoping it's a viable workaround with no side effects.

apt-get update
apt install -y containerd   # Required for nerdctl to work, containerd not installed until nodeup runs
..... 
apt remove -y containerd
systemctl unmask containerd
diversario commented 8 months ago

We got it to work this way (this is Ansible, but easy to translate into something else):


- name: Install nerdctl
  unarchive:
    src: "https://github.com/containerd/nerdctl/releases/download/v{{ version }}/nerdctl-{{ version }}-linux-{{ cpu_arch }}.tar.gz"
    dest: /usr/local/bin
    remote_src: true
  vars:
    version: 1.7.0

- name: Configure docker ECR auth
  shell: |
    mkdir -p /root/.docker
    echo '{"credsStore": "ecr-login"}' > /root/.docker/config.json

- name: Pre-pull container images
  loop: "{{ images }}"
  shell: |
    nerdctl -n k8s.io image pull {{ item }}
  vars:
    images:
    # if container spec uses sha you must specify it here as well
    # tag without the sha is optional in that case
    # check the actual pods to see what's being used
    - quay.io/cilium/cilium:v1.12.7
    - quay.io/cilium/cilium:v1.12.7@sha256:8cb6b4742cc27b39e4f789d282a1fc2041decb6f5698bfe09112085a07b1fd61

    # etc

# kops will install containerd and stuff
- name: Remove Docker
  apt:
    pkg:
    - docker.io
    - amazon-ecr-credential-helper
    state: absent
    autoremove: yes

- name: Unconfigure docker ECR auth
  file:
    path: /root/.docker
    state: absent

# uninstalling containerd makes the service "masked", meaning it won't start
- name: Unmask containerd service
  shell: |
    systemctl unmask containerd

This is with kops 1.26.3.

k8s-triage-robot commented 5 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 3 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes/kops/issues/15696#issuecomment-2170408519):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.