k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Validate k3s kubelet image-credential-provider support #3463

Closed brandond closed 1 year ago

brandond commented 3 years ago

Originally posted by @rancher-max in https://github.com/k3s-io/k3s/issues/3280#issuecomment-843644882

I've validated standard airgap testing in v1.21.1-rc1+k3s1. This continues to work with the tarball method and with a private registry in registries.yaml, and now also works with the --system-default-registry flag.

The kubelet image-credential-provider support is not working, even with the feature gate turned on. This appears to be an upstream issue, as using the same configuration with wharfie directly works. The error I'm seeing is a 401 Unauthorized when trying to pull the images. I'm using this config file:

kind: CredentialProviderConfig
apiVersion: kubelet.config.k8s.io/v1alpha1
providers:
  - name: ecr-credential-provider-amd64
    matchImages:
    - "*.dkr.ecr.*.amazonaws.com"
    - "*.dkr.ecr.*.amazonaws.cn"
    - "*.dkr.ecr-fips.*.amazonaws.com"
    - "*.dkr.ecr.us-iso-east-1.c2s.ic.gov"
    - "*.dkr.ecr.us-isob-east-1.sc2s.sgov.gov"
    defaultCacheDuration: "12h"
    apiVersion: credentialprovider.kubelet.k8s.io/v1alpha1
    args:
    - get-credentials

The ecr-credential-provider-amd64 binary was pulled from: https://github.com/rancher/wharfie/releases/tag/v0.3.5.

I'm bringing up k3s with the flag --system-default-registry=<account>.dkr.ecr.<region>.amazonaws.com, where all the necessary k3s images are present in that registry.
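
For readers trying to reproduce this setup, the invocation might look roughly like the sketch below. Only --system-default-registry comes from the report above. The kubelet arguments are the upstream flag names for this feature, and the two file paths and the variable names are assumptions that would need adjusting to the actual install.

```shell
#!/bin/bash
# Sketch only: CRED_CONFIG, CRED_BIN_DIR and REGISTRY are hypothetical names,
# and the default paths below are assumptions, not taken from the report.
CRED_CONFIG="${CRED_CONFIG:-/var/lib/rancher/credentialprovider/config.yaml}"
CRED_BIN_DIR="${CRED_BIN_DIR:-/var/lib/rancher/credentialprovider/bin}"
REGISTRY="${REGISTRY:-<account>.dkr.ecr.<region>.amazonaws.com}"

k3s_args=(
  server
  "--system-default-registry=${REGISTRY}"
  # Alpha feature gate for kubelet credential provider plugins.
  "--kubelet-arg=feature-gates=KubeletCredentialProviders=true"
  # CredentialProviderConfig file and the directory holding plugin binaries.
  "--kubelet-arg=image-credential-provider-config=${CRED_CONFIG}"
  "--kubelet-arg=image-credential-provider-bin-dir=${CRED_BIN_DIR}"
)

# Print the command for review instead of running it.
echo "k3s ${k3s_args[*]}"
```

The plugin binary name in the bin directory has to match the provider name in the config file (here, ecr-credential-provider-amd64).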

brandond commented 3 years ago

This may be an issue with the example configuration from the upstream docs, or perhaps with the plugins themselves. There was someone on Users Slack who wrote a shell script wrapper around amazon-ecr-credential-helper and got it working, after reporting that the upstream ECR and GCR plugins are essentially broken at the moment.
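
For illustration only (this is not the script that user wrote), a wrapper of that kind could look roughly like the sketch below. It assumes the docker-credential-ecr-login binary from amazon-ecr-credential-helper is on the PATH and follows the usual Docker credential helper protocol (registry host on stdin, JSON with Username and Secret on stdout). The HELPER variable, the function names, and the 6h cache duration are hypothetical.

```shell
#!/bin/bash
# Sketch: adapt a Docker credential helper to the kubelet credential
# provider protocol. All names and defaults here are assumptions.
set -euo pipefail

# Helper binary to call; overridable so the wrapper can be tested offline.
HELPER="${HELPER:-docker-credential-ecr-login}"

# Extract a string field from flat JSON on stdin.
# Simplification: jq would be more robust if it is available on the node.
json_field() {
  tr -d '\n' | sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

provide_credentials() {
  local request image registry creds username secret
  request="$(cat)"                                   # CredentialProviderRequest
  image="$(printf '%s' "$request" | json_field image)"
  registry="${image%%/*}"                            # host part of the image ref
  creds="$(printf '%s' "$registry" | "$HELPER" get)" # ask the helper for a token
  username="$(printf '%s' "$creds" | json_field Username)"
  secret="$(printf '%s' "$creds" | json_field Secret)"
  printf '{"kind":"CredentialProviderResponse","apiVersion":"credentialprovider.kubelet.k8s.io/v1alpha1","cacheKeyType":"Registry","cacheDuration":"6h","auth":{"%s":{"username":"%s","password":"%s"}}}\n' \
    "$registry" "$username" "$secret"
}

# The kubelet passes the args from the provider config (get-credentials).
if [[ "${1:-}" == "get-credentials" ]]; then
  provide_credentials
fi
```

The response is built with printf rather than a JSON encoder, which is only safe because ECR tokens are base64 text; a credential containing quotes or backslashes would need proper escaping.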

brandond commented 3 years ago

I see the correct behavior from the kubelet (plugin is used and auth provided) when using the following config and dummy plugin; I will need to work with @rancher-max to set up another test on ECR to figure out what's going on over there. I suspect the ECR plugin may not be functional yet.

kind: CredentialProviderConfig
apiVersion: kubelet.config.k8s.io/v1alpha1
providers:
  - name: test.sh
    matchImages:
    - "docker.io"
    defaultCacheDuration: "12h"
    apiVersion: credentialprovider.kubelet.k8s.io/v1alpha1
    args:
    - get-credentials
    env:
    - name: TEST
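      value: "TEST"

#!/bin/bash
# Dummy plugin (test.sh): log each invocation, then return static credentials.

date &>> /tmp/credential.log   # when the kubelet called the plugin
env &>> /tmp/credential.log    # environment, including TEST from the config
jq . &>> /tmp/credential.log   # the CredentialProviderRequest read from stdin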

echo '{
  "kind": "CredentialProviderResponse",
  "apiVersion": "credentialprovider.kubelet.k8s.io/v1alpha1",
  "cacheKeyType": "Image",
  "cacheDuration": "5s",
  "auth": {
    "docker.io": {
      "username": "myuser",
      "password": "mypass"
    }
  }
}'
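
A plugin like this can also be exercised by hand, without the kubelet, by piping a request to it on stdin. The sketch below is an assumption about how one might do that: probe_plugin is a hypothetical helper name, and the request only carries the fields the v1alpha1 API defines (kind, apiVersion, image).

```shell
#!/bin/bash
# Send a hand-built CredentialProviderRequest to a plugin over stdin, the way
# the kubelet would, and print whatever the plugin writes to stdout.
# Arguments: <plugin command or path> <image reference>
probe_plugin() {
  local plugin="$1" image="$2"
  printf '{"kind":"CredentialProviderRequest","apiVersion":"credentialprovider.kubelet.k8s.io/v1alpha1","image":"%s"}\n' "$image" \
    | "$plugin" get-credentials
}

# Example (the path is an assumption):
# probe_plugin /var/lib/rancher/credentialprovider/bin/test.sh docker.io/library/busybox:latest
```

If the plugin returns a CredentialProviderResponse here but image pulls still fail with 401, the problem is more likely on the kubelet side than in the plugin.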

brandond commented 3 years ago

I believe the failure @rancher-max and I saw was due to https://github.com/kubernetes/kubernetes/issues/102750

n4j commented 3 years ago

@brandond Yes, your RCA is correct; it's due to https://github.com/kubernetes/kubernetes/issues/102750

brandond commented 3 years ago

Moving this back into backlog pending an upstream fix to the kubelet

brandond commented 3 years ago

The fix for the upstream issue didn't make it into 1.22.0; it looks like it'll probably be in 1.22.1 and backported to the next 1.21 patch release.

snasovich commented 3 years ago

@brandond, is there an update on this?

brandond commented 3 years ago

The issue I linked up above was only fixed on master for 1.23. I've pinged the PR author a couple times both on GH and on Kubernetes Slack but have not made any progress towards getting the fix backported to 1.22 or 1.21: https://github.com/kubernetes/kubernetes/pull/103231

Also, the GCR and ECR plugins are in middling states of usability, and the ACR one isn't due until December...

brandond commented 3 years ago

Upstream has declined to backport the fixes for this to 1.22, as it's an alpha feature. Kubelet credential provider plugins won't be functional until 1.23.0.

katran001 commented 2 years ago

@rancher-max Can we retest this on the next milestone?

brandond commented 2 years ago

@rancher-max and I identified another issue with upstream credential provider support. The credential provider plugin is only called to provide credentials for images specifically referenced by pod specs. It is never called to pull the pause image. This means that the pause image must be available anonymously, or credentials for pulling the pause image must be provided via the containerd registries.yaml.

We could theoretically use the --pause-image flag to point k3s at a pause image that can be pulled anonymously, while pulling the rest of the images off the --system-default-registry, but there is a bug in the way that these two flags interact that prevents that from working properly.
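
As a sketch of the registries.yaml route, the snippet below renders a static-credential entry in the k3s registries.yaml layout (configs, then the registry host, then auth). render_registries_yaml is a hypothetical helper name, and the ECR example in the comments assumes the aws CLI is installed and configured.

```shell
#!/bin/bash
# Render a k3s registries.yaml entry with static credentials.
# Hypothetical helper; arguments: <registry-host> <username> <password>
render_registries_yaml() {
  local host="$1" user="$2" pass="$3"
  cat <<EOF
configs:
  "${host}":
    auth:
      username: "${user}"
      password: "${pass}"
EOF
}

# Example for ECR (assumption: aws CLI available; the username for ECR
# token auth is AWS):
# render_registries_yaml "<account>.dkr.ecr.<region>.amazonaws.com" AWS \
#   "$(aws ecr get-login-password --region <region>)" \
#   > /etc/rancher/k3s/registries.yaml
```

ECR tokens are short-lived, so a file generated this way would have to be refreshed periodically, which is part of why static credentials are a poor substitute for a working credential provider.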

rancher-max commented 2 years ago

Moved this to Stalled for now as there isn't much more for me to test, but feel free to change the status when it gets picked up (including if we end up making changes to upstream). Thank you again Brad for your help in figuring out what the issue was here!

katran009 commented 2 years ago

@galal-hussein what is the status of this? Can we bump it to the next stage?

dereknola commented 2 years ago

Upstream is still stalled on general support for this issue.

mkmik commented 1 year ago

FWIW, it got GA'd in 1.26 https://kubernetes.io/blog/2022/12/22/kubelet-credential-providers/

brandond commented 1 year ago

@mkmik yes, but as far as I can tell they still haven't fixed the issue of the credential provider not being consulted for the pause image, so it's not really usable in environments where all images require credentials.

mkmik commented 1 year ago

Would there be value in providing partial support for the feature?

brandond commented 1 year ago

Define "partial support". K3s has supported this (as in, it is in the code base and works) for a while. The functionality will not go away, I just don't find it very useful due to this limitation in how upstream has integrated it.

brandond commented 1 year ago

The feature did graduate (it went GA in 1.26, as noted above), and the feature gate has been removed starting with 1.28.