upmc-enterprises / registry-creds

Allow for AWS ECR, Google Registry, & Azure Container Registry credentials to be refreshed inside your Kubernetes cluster via ImagePullSecrets

Cannot pull images from AWS ECR: no basic auth credentials (v0.27.0 minikube) #65

Closed ptaillard closed 6 years ago

ptaillard commented 6 years ago

Environment:

What happened: I cannot pull images from the ECR registry: "no basic auth credentials" error

What you expected to happen: I expected to pull the image from the ECR registry after having configured registry-creds with my ID, KEY, TOKEN and AWS Region, and activating the registry-creds addon and using PullSecrets

How to reproduce it (as minimally and precisely as possible):

minikube start
minikube addons configure registry-creds   => configure only AWS ECR
minikube addons enable registry-creds
kubectl create -f deployment.yaml          => the error occurred: the container cannot start because of the "no basic auth credentials" error.

kubectl get secrets --all-namespaces => we can see that the secret created is in kube-system and called registry-creds-ecr. I never found the awsecr-cred name for the secret as mentioned in the documentation https://github.com/upmc-enterprises/registry-creds

deployment.yaml content:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app:
    spec:
      containers:
        - name: adserver-test
          image: .dkr.ecr.us-east-1.amazonaws.com/:latest
          command: ["/bin/bash"]
          env:
            - name: TMN_ENVIRONMENT
              value: "qa"
      imagePullSecrets:
        - name: registry-creds-ecr

Output of minikube logs (if applicable):

May 23 09:53:31 minikube kubelet[3443]: W0523 09:53:31.388519 3443 kubelet_pods.go:878] Unable to retrieve pull secret default/registry-creds-ecr for default/adserver-deployment-654f4668bf-l97n8 due to secrets "registry-creds-ecr" not found. The image pull may not succeed.

May 23 09:53:31 minikube kubelet[3443]: I0523 09:53:31.388628 3443 kuberuntime_manager.go:513] Container {Name:adserver-test Image:.dkr.ecr.us-east-1.amazonaws.com/adserver:latest Command:[/bin/bash] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:TMN_ENVIRONMENT Value:qa ValueFrom:nil}] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:default-token-27gpt ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:}] VolumeDevices:[] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:Always SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.

May 23 09:53:32 minikube kubelet[3443]: E0523 09:53:32.229556 3443 remote_image.go:108] PullImage ".dkr.ecr.us-east-1.amazonaws.com/adserver:latest" from image service failed: rpc error: code = Unknown desc = Error response from daemon: Get https://.dkr.ecr.us-east-1.amazonaws.com/v2/adserver/manifests/latest: no basic auth credentials

May 23 09:53:32 minikube kubelet[3443]: E0523 09:53:32.229585 3443 kuberuntime_image.go:51] Pull image ".dkr.ecr.us-east-1.amazonaws.com/adserver:latest" failed: rpc error: code = Unknown desc = Error response from daemon: Get https://.dkr.ecr.us-east-1.amazonaws.com/v2/adserver/manifests/latest: no basic auth credentials

May 23 09:53:32 minikube kubelet[3443]: E0523 09:53:32.229627 3443 kuberuntime_manager.go:733] container start failed: ErrImagePull: rpc error: code = Unknown desc = Error response from daemon: Get https://.dkr.ecr.us-east-1.amazonaws.com/v2/adserver/manifests/latest: no basic auth credentials

May 23 09:53:32 minikube kubelet[3443]: E0523 09:53:32.229648 3443 pod_workers.go:186] Error syncing pod 1d7cad94-5e6f-11e8-962c-0800278cf469 ("adserver-deployment-654f4668bf-l97n8_default(1d7cad94-5e6f-11e8-962c-0800278cf469)"), skipping: failed to "StartContainer" for "adserver-test" with ErrImagePull: "rpc error: code = Unknown desc = Error response from daemon: Get https://.dkr.ecr.us-east-1.amazonaws.com/v2/adserver/manifests/latest: no basic auth credentials"
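The first kubelet warning in this log names the place where the lookup happened: default/registry-creds-ecr, that is, the pod's own namespace. Kubernetes resolves imagePullSecrets only in the namespace of the pod, so a secret that exists only in kube-system is not found. A minimal shell sketch (the helper name pull_secret_lookup is made up for illustration) that extracts the namespace and secret name from such warnings:

```shell
# pull_secret_lookup is a hypothetical helper, not part of kubectl or minikube.
# It reads kubelet log lines on stdin and prints "namespace secret-name" for
# every "Unable to retrieve pull secret <ns>/<name> for ..." warning.
pull_secret_lookup() {
  sed -n 's|.*Unable to retrieve pull secret \([^/ ]*\)/\([^ ]*\) for .*|\1 \2|p'
}
```

It could be fed from the log, for example: minikube logs | pull_secret_lookup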

erstaples commented 6 years ago

I'm having a similar issue with ECR creds on minikube v0.24.1 (registry-creds image upmcenterprises/registry-creds:1.8).

The first time it happened, after trying to disable/re-enable registry-creds, I decided to minikube delete, then nuke the ~/.minikube directory and restart minikube with a clean slate.

I then ran minikube addons configure registry-creds, filled in the prompts, and ran minikube addons enable registry-creds.

The initial logs I saw when the registry-creds pod came up:

2018/05/30 18:58:54 Starting up...
2018/05/30 18:58:54 Using AWS Account: <accountid>
2018/05/30 18:58:54 Using AWS Region: us-east-1
2018/05/30 18:58:54 Using AWS Assume Role:
2018/05/30 18:58:54 Refresh Interval (minutes): 60
time="2018-05-30T18:58:54Z" level=info msg="Using InCluster k8s config"
2018/05/30 18:58:54 Refreshing credentials...
time="2018-05-30T18:58:54Z" level=info msg="------------------ [gcr-secret] ----------------------
"
time="2018-05-30T18:58:54Z" level=info msg="Error getting secret for provider gcr-secret. Skipping secret provider! [Err: google: error getting credentials using well-known file (/root/.config/gcloud/application_default_credentials.json): invalid character 'c' looking for beginning of value]"
time="2018-05-30T18:58:54Z" level=info msg="------------------ [awsecr-cred] ----------------------
"
time="2018-05-30T18:58:54Z" level=info msg="------------------ [dpr-secret] ----------------------
"
time="2018-05-30T18:58:54Z" level=error msg="Error getting secret: secrets "awsecr-cred" not found"
2018/05/30 18:58:54 Finished processing secret for namespace default, secret awsecr-cred
time="2018-05-30T18:58:54Z" level=error msg="Error getting secret: secrets "dpr-secret" not found"
2018/05/30 18:58:54 Finished processing secret for namespace default, secret dpr-secret
2018/05/30 18:58:54 Refreshing credentials...
time="2018-05-30T18:58:54Z" level=info msg="------------------ [gcr-secret] ----------------------
"
time="2018-05-30T18:58:54Z" level=info msg="Error getting secret for provider gcr-secret. Skipping secret provider! [Err: google: error getting credentials using well-known file (/root/.config/gcloud/application_default_credentials.json): invalid character 'c' looking for beginning of value]"
time="2018-05-30T18:58:54Z" level=info msg="------------------ [awsecr-cred] ----------------------
"
time="2018-05-30T18:58:54Z" level=info msg="------------------ [dpr-secret] ----------------------
"
2018/05/30 18:58:54 Refreshing credentials...
time="2018-05-30T18:58:54Z" level=info msg="------------------ [gcr-secret] ----------------------
"
time="2018-05-30T18:58:54Z" level=info msg="Error getting secret for provider gcr-secret. Skipping secret provider! [Err: google: error getting credentials using well-known file (/root/.config/gcloud/application_default_credentials.json): invalid character 'c' looking for beginning of value]"
time="2018-05-30T18:58:54Z" level=info msg="------------------ [awsecr-cred] ----------------------
"
time="2018-05-30T18:58:54Z" level=info msg="------------------ [dpr-secret] ----------------------
"
time="2018-05-30T18:58:54Z" level=error msg="Error getting secret: secrets "awsecr-cred" not found"
2018/05/30 18:58:55 Finished processing secret for namespace kube-public, secret awsecr-cred
time="2018-05-30T18:58:55Z" level=error msg="Error getting secret: secrets "dpr-secret" not found"
2018/05/30 18:58:55 Finished processing secret for namespace kube-public, secret dpr-secret

I deployed an app that uses our private ECR registry, and voila, it worked. I then rebuilt the image and pushed it to my ECR repo with a new tag, and re-deployed my app to the minikube cluster.

After that I got the dreaded ImagePullBackOff error, and started seeing these errors in kubectl describe po <podname>:

  Warning  Failed                 6s    kubelet, minikube  Failed to pull image "<aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/<repo>/reporting-nginx:7f3par7": rpc error: code = Unknown desc = Error response from daemon: Get https://<aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/v2/<repo>/reporting-nginx/manifests/7f3par7: no basic auth credentials
  Warning  Failed                 6s    kubelet, minikube  Error: ErrImagePull
  Normal   Pulling                6s    kubelet, minikube  pulling image "<aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/<repo>/reporting-phpfpm:7f3par7"
  Warning  Failed                 6s    kubelet, minikube  Failed to pull image "<aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/<repo>/reporting-phpfpm:7f3par7": rpc error: code = Unknown desc = Error response from daemon: Get https://<aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/v2/<repo>/reporting-phpfpm/manifests/7f3par7: no basic auth credentials
  Warning  Failed                 6s    kubelet, minikube  Error: ErrImagePull
  Normal   BackOff                6s    kubelet, minikube  Back-off pulling image "<aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/<repo>/reporting-nginx:7f3par7"
  Warning  Failed                 6s    kubelet, minikube  Error: ImagePullBackOff
  Normal   BackOff                6s    kubelet, minikube  Back-off pulling image "<aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/<repo>/reporting-phpfpm:7f3par7"
  Warning  Failed                 6s    kubelet, minikube  Error: ImagePullBackOff

I also deployed the same image and tag to a KOPS cluster and it pulled the image just fine, so I know the image tag exists.

At this point, there are no new logs in registry creds to help diagnose the issue, and there appears to be no verbosity option to pass to the image to help debug. It works the first time, fails the second time. I'm wondering if it has something to do with this log line:

time="2018-05-30T18:58:54Z" level=error msg="Error getting secret: secrets "awsecr-cred" not found"

Strange to see this considering the name of the secret that minikube addons configure registry-creds creates is actually called registry-creds-ecr.
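The two names appear to belong to different secrets. Judging by the logs in this comment, registry-creds-ecr holds the addon's own configuration, while awsecr-cred is the pull secret the controller tries to manage per namespace ("namespace default, secret awsecr-cred"). A small sketch, assuming the usual NAMESPACE/NAME column order of kubectl get secrets --all-namespaces, to see in which namespaces a given name exists (find_secret is a hypothetical helper name):

```shell
# find_secret is a hypothetical helper. It filters the output of
# `kubectl get secrets --all-namespaces` (first column: namespace, second: name)
# and prints "namespace/name" for each secret whose name matches exactly.
find_secret() {
  kubectl get secrets --all-namespaces --no-headers |
    awk -v name="$1" '$2 == name { print $1 "/" $2 }'
}
```

For example, find_secret awsecr-cred prints nothing if the controller has not created the pull secret yet.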

nicroto commented 6 years ago

I am facing the same issue as @erstaples.

@stevesloka do you have any ideas what may've gone wrong? I see a lot of Pull Requests with reasonable changes (the docs changes with info for minikube setup seems quite useful, for example) - is this repo still being supported/developed?

stevesloka commented 6 years ago

Hey, @nicroto yes this repo is still maintained, just hasn't needed many updates recently.

Let me give this a shot, something might have changed upstream with the aws sdk, but I doubt that's really the issue. I had someone else recently use this on docker-for-mac's k8s integration and it worked.

To confirm: you're doing the same steps that @erstaples did, first configure, then enable?

nicroto commented 6 years ago

Hey @stevesloka, thanks for the quick reply.

Yes, I am doing the same thing.

minikube delete
minikube start
minikube addons configure registry-creds
# then I would enter my creds from AWS
minikube addons enable registry-creds

Then I would install a helm chart which has a deployment.yaml looking roughly like this:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: {{ template "app.fullname" . }}
  labels:
    app: {{ template "app.name" . }}
    ...
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    metadata:
      labels:
        app: {{ template "app.name" . }}
        release: {{ .Release.Name }}
    spec:
      imagePullSecrets:
        - name: {{ .Values.image.pullSecrets }}
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - containerPort: {{ .Values.service.internalPort }}
          livenessProbe:
            httpGet:
              path: /
              port: {{ .Values.service.internalPort }}
            initialDelaySeconds: 240
            periodSeconds: 5
          readinessProbe:
            httpGet:
              path: /
              port: {{ .Values.service.internalPort }}
          resources:
            ...
      nodeSelector:
        ...

and in my values.yaml file I would have:

replicaCount: 1
image:
  repository: <omitted my account id>.dkr.ecr.us-east-1.amazonaws.com/<omitted repo name>
  tag: latest
  pullPolicy: IfNotPresent
  pullSecrets: awsecr-cred
...

If it does work on your end - maybe we are making some kind of mistake when entering the creds? Here is what I do once the configure command is called on minikube, for each and every entry:

  1. -- Enter AWS Access Key ID:
    • For that I would go to my AWS Developer Console;
    • IAM;
    • Users;
    • click on specific user (with all required permissions enabled/attached to it);
    • Security Credentials Tab;
    • Create Access Key;
    • copy the id.
  2. -- Enter AWS Secret Access Key:
    • From the same generated key, I would click "Show" on the Secret value and copy it.
  3. -- (Optional) Enter AWS Session Token:
    • I don't enter/paste anything, just press Return.
  4. -- Enter AWS Region:
    • I enter us-east-1a (both my ECR repo and my cluster are located in this region).
  5. -- Enter 12 digit AWS Account ID (Comma seperated list):
    • From the top menu I would click on my username and in the dropdown I choose "My Security Credentials";
    • Then I click Continue To Security Credentials (in the dialog box that shows up);
    • Then I expand the "Account Identifiers" pane in the accordion/panelbar widget;
    • Then I copy the "AWS Account ID" and replace the dashes with commas (I've tested with both dashes and commas - no change):
      • xxxx-xxxx-xxxx -> xxxx,xxxx,xxxx
    • and I paste that in the CLI.
  6. -- (Optional) Enter ARN of AWS role to assume:
    • I go to IAM service;
    • Users;
    • Click on the same user I generated the Access Key on;
    • And I just copy the value from the "User ARN" field.
  7. I decline to set up GCE and private docker registry.

nicroto commented 6 years ago

@stevesloka Did you manage to check this out?

stevesloka commented 6 years ago

Sorry I upgraded minikube and now latest doesn't work. I'll keep troubleshooting, if not I have an older build which should work.

By the way, what version of minikube are you using? 0.27?

nicroto commented 6 years ago

Thanks. I am currently using 0.26.1. What is the latest version that it works on? I think I am using a feature that isn't available on an earlier version... but I am not sure what that was.

ahanoff commented 6 years ago

minikube v0.28.0 is working fine. In your deployment, just use awsecr-cred instead of registry-creds-ecr:

imagePullSecrets:
  - name: awsecr-cred

ahanoff commented 6 years ago

@nicroto I didn't get your step 5 in the addon configuration. What dashes in your account ID? The account ID is just 12 digits, so type xxxxxxxxxxxx; if you have several accounts, separate them with commas.
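As a quick pre-check before typing the value at the account ID prompt, here is a sketch of that rule in shell: one or more 12-digit IDs, separated by commas, with no dashes. The helper name valid_account_ids is hypothetical and not part of minikube.

```shell
# valid_account_ids is a hypothetical helper: it succeeds only if its argument
# is one or more 12-digit AWS account IDs separated by commas
# (no dashes, no spaces).
valid_account_ids() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{12}(,[0-9]{12})*$'
}
```

With this check, an ID written as xxxx,xxxx,xxxx or xxxx-xxxx-xxxx is rejected, while twelve consecutive digits are accepted.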

nicroto commented 6 years ago

@ahanoff Maybe this is it. That is why I posted every detail of my setup, so a mistake can be ruled out. Will check it out and come back with more info. Thanks.

sylvain-rouquette commented 6 years ago

@ahanoff doesn't work for me, v0.28.2 with awsecr-cred. I have this log:

Unable to retrieve pull secret default/awsecr-cred for default/data-service-7ccb57c46d-662h7 due to secrets "awsecr-cred" not found

ahanoff commented 6 years ago

@sylvain-rouquette can you check if this secret exists using kubectl? It should be in the kube-system namespace.

sylvain-rouquette commented 6 years ago

@ahanoff I have registry-creds-ecr running in kube-system, but I get the same error if I set this for imagePullSecrets.

isn't the problem the "default/" at the beginning, shouldn't it be "kube-system/" instead?

edit: I checked the content of registry-creds-ecr and it seems correctly configured.

kubectl get secret registry-creds-ecr --output=yaml --namespace=kube-system

edit2: it seems the problem could be in the addon:

kubectl logs registry-creds-x4sfq --namespace=kube-system

"caused by: Post https://ecr.eu-west-1.amazonaws.com/: dial tcp: lookup ecr.eu-west-1.amazonaws.com on 10.96.0.10:53: read udp 172.17.0.8:33304->10.96.0.10:53: i/o timeout"

edit3: enabling the ingress addon fixed that. But now it says my credentials are invalid. I specified my AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY I use somewhere else.

ahanoff commented 6 years ago

@sylvain-rouquette can you pull the image to your local environment using those credentials? Just docker pull. Thanks. So either the credentials really are invalid, which is easy to check, or something is wrong with the registry-creds setup.

Edit1: the name of the secret is awsecr-cred; you can search for it in the readme.

sylvain-rouquette commented 6 years ago

yes it works locally. now awsecr-cred doesn't show an error anymore. You were right, I had to use awsecr-cred in imagePullSecrets.

But now I have this error: no basic auth credentials.

ahanoff commented 6 years ago

You can try killing the registry-creds pod 😄 or reconfiguring registry-creds. Edit1: based on the docs, updated secrets should be applied, but I'm not sure when (I need to read it again). That's why I suggested killing the pod.

sylvain-rouquette commented 6 years ago

Yeah, I restarted minikube multiple times, disabling the addon each time ("disable" is broken right now, so I do it by editing the config file); the pod is re-created after configuring and enabling the addon. But I'll try again to recreate everything from scratch and see.

Thanks for your help :)

edit: it works :)

I had to follow very specific steps in order:

In your Deployment:

      imagePullSecrets:
      - name: awsecr-cred

in your console:

minikube start
minikube addons enable ingress
minikube addons configure registry-creds
minikube addons enable registry-creds
kubectl apply -f deployment.yaml

If you deployed before configuring registry-creds, it won't work; I guess secrets won't be refreshed in the existing pods.

If registry-creds is already enabled and you can't disable it, check $HOME/.minikube/config, disable it there, and restart minikube.
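Since the order matters, one way to avoid deploying too early is to wait until the pull secret actually exists before applying the Deployment. The following is only a sketch under that assumption; wait_for_secret is a hypothetical helper, and the default of 30 attempts with a 2 second delay is arbitrary.

```shell
# wait_for_secret is a hypothetical helper: it polls until the named secret
# exists in the given namespace, or gives up after a number of attempts.
# Usage: wait_for_secret NAME [NAMESPACE] [ATTEMPTS] [DELAY_SECONDS]
wait_for_secret() {
  name=$1; namespace=${2:-default}; attempts=${3:-30}; delay=${4:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if kubectl get secret "$name" --namespace "$namespace" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "secret $namespace/$name did not appear" >&2
  return 1
}

# Possible use after `minikube addons enable registry-creds`:
#   wait_for_secret awsecr-cred default && kubectl apply -f deployment.yaml
```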

guemues commented 6 years ago

If I run minikube without any driver, it keeps giving this error even with the ingress addon enabled: "caused by: Post https://ecr.eu-west-1.amazonaws.com/: dial tcp: lookup ecr.eu-west-1.amazonaws.com on 10.96.0.10:53: read udp 172.17.0.8:33304->10.96.0.10:53: i/o timeout"

If I run minikube with VirtualBox, it doesn't give any error.

edit: I understand that it is about minikube's DNS resolver: https://github.com/kubernetes/minikube/issues/2302

stevesloka commented 6 years ago

Good to hear you got it working @guemues!

To everyone on this thread: I'm going to close this, as all issues seem to be resolved. If not, feel free to open a new one or reopen this one. Thanks!

nicroto commented 6 years ago

I am still getting the "no basic auth credentials", even after following @sylvain-rouquette's procedure and having all tools upgraded to latest AND using my Account ID in "xxxxxxxxxxxx" form.

How can I further debug this to give you more info on what's going wrong, here?

Here is a simplification of my deployment that fails to pull an image from ECR:

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: {{ template "chart-name.fullname" . }}
  labels:
    app: {{ template "chart-name.name" . }}
    chart: {{ template "chart-name.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ template "chart-name.name" . }}
      release: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: {{ template "chart-name.name" . }}
        release: {{ .Release.Name }}
    spec:
      imagePullSecrets:
        - name: awsecr-cred
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}

nicroto commented 6 years ago

OK, finally got it working. There probably was more than one issue in my case, but after upgrading everything to latest and getting the error I last posted, I checked the logs for the addon pod and I found that it couldn't resolve the aws dns. My account should be assigned to the "us-east-1a", but constructing the dns with the "a" at the end didn't properly resolve.

Changing the region from "us-east-1a" to "us-east-1" resolved the issue with pulling images on my end.
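For context: us-east-1a is an availability zone, while the ECR endpoint is named after the region (us-east-1), which matches the ecr.<region>.amazonaws.com host seen in the addon logs earlier in this thread. A small sketch of that normalization; both helper names are hypothetical.

```shell
# region_from_zone is a hypothetical helper: it strips a trailing
# availability-zone letter, so "us-east-1a" becomes "us-east-1"; a plain
# region name is returned unchanged.
region_from_zone() {
  printf '%s\n' "$1" | sed 's/^\(.*-[0-9][0-9]*\)[a-z]$/\1/'
}

# ecr_api_host builds the endpoint name in the form seen in the addon logs
# (ecr.<region>.amazonaws.com) from either a region or a zone name.
ecr_api_host() {
  printf 'ecr.%s.amazonaws.com\n' "$(region_from_zone "$1")"
}
```

The value to enter at the "Enter AWS Region" prompt is the output of region_from_zone, not the zone name.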

geerlingguy commented 6 years ago

Strange, for me I'm seeing the registry-creds pod failing to start with:

Events:
  Type     Reason                 Age                 From               Message
  ----     ------                 ----                ----               -------
  Normal   Scheduled              50m                 default-scheduler  Successfully assigned registry-creds-blkd6 to minikube
  Normal   SuccessfulMountVolume  50m                 kubelet, minikube  MountVolume.SetUp succeeded for volume "default-token-x55v6"
  Warning  FailedMount            19m (x23 over 50m)  kubelet, minikube  MountVolume.SetUp failed for volume "gcr-creds" : secrets "registry-creds-gcr" not found
  Warning  FailedMount            5m (x20 over 48m)   kubelet, minikube  Unable to mount volumes for pod "registry-creds-blkd6_kube-system(6d02fff5-c1b1-11e8-89b5-080027ab9b79)": timeout expired waiting for volumes to attach or mount for pod "kube-system"/"registry-creds-blkd6". list of unmounted volumes=[gcr-creds]. list of unattached volumes=[gcr-creds default-token-x55v6]

I'm not trying to use gcr-creds though, so :/

geerlingguy commented 6 years ago

Ah, I found that when I ran minikube addons configure registry-creds, it asked about gcr registry credentials and docker registry credentials as well. When I initially set things up, I created a secrets.yml file with only the cloud: ecr secret, but not the gcr or docker ones, so this container must expect all three to be present.

Once I disabled the addon, then ran:

minikube addons configure registry-creds
minikube addons enable registry-creds

I was able to pull images using a format like:

spec:
  template:
    spec:
      containers:
      - name: my-container
        image: ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/ECR_REPO:latest
      imagePullSecrets:
      - name: awsecr-cred
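The FailedMount event in the previous comment suggests a quick check: list which of the secrets the addon pod mounts are missing from kube-system. A sketch under that assumption; missing_secrets is a hypothetical helper, and only the two secret names that appear in this thread are used in the example.

```shell
# missing_secrets is a hypothetical helper: given a namespace and a list of
# secret names, it prints each name that kubectl cannot find there.
missing_secrets() {
  namespace=$1; shift
  for name in "$@"; do
    kubectl get secret "$name" --namespace "$namespace" >/dev/null 2>&1 ||
      printf '%s\n' "$name"
  done
}

# Example with the names that appear in this thread:
#   missing_secrets kube-system registry-creds-ecr registry-creds-gcr
```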

jonasfor commented 1 year ago

minikube v0.28.0 is working fine. Just put to your deployment awsecr-cred instead of registry-creds-ecr

imagePullSecrets:
  - name: awsecr-cred

This works for me, thanks!