google / go-containerregistry

Go library and CLIs for working with container registries
Apache License 2.0
3.13k stars 542 forks source link

ggcr: Multiple k8s pull secrets for same registry doesn't work #1431

Open dprotaso opened 2 years ago

dprotaso commented 2 years ago

Describe the bug

See: https://github.com/knative/serving/issues/13126

To Reproduce

See linked issue:

dprotaso commented 2 years ago

cc @jwcesign

imjasonh commented 2 years ago

https://github.com/google/go-containerregistry/pull/1280 was supposed to handle the case where auth is correctly configured for registry.com/foo and registry.com/bar and possibly also just registry.com -- if your request is for /foo, you should get creds for foo, i.e., as specific as possible. If your request is for /baz you get creds for registry.com. This assumes that auth is configured correctly in all cases, and we should just be as specific as possible in selecting among those valid creds.

If there's a bug in that logic though, we should fix it -- the kaniko issue that inspired that change (https://github.com/GoogleContainerTools/kaniko/issues/687) was reopened because the behavior persisted.

This doesn't account for the case where there are two creds configured both for the same registry(+repo), but one is invalid and the other works. It sounds like kubelet will collect all matching auths and try them in a loop, and ggcr (currently) doesn't -- it will choose one at ~random and use it, even if it fails and the other would have worked.

It will be hard, but not impossible, for pkg/v1/remote to handle this fallback scenario, and try creds for /foo then fallback to creds for / if it gets a 403 response. That should match kubelet's behavior linked from the Knative issue above.

jwcesign commented 2 years ago

I did test as follows: first create secret:

root@cesign [12:38:20 AM] [+47.0°C] [~/git/group-service-acceleration] [main *]
-> # kubectl create secret docker-registry jwcesign \
--docker-server=https://index.docker.io/v1/ \
--docker-username=jwcesign \
--docker-password=xxx \
--docker-email=jwcesign@163.com

root@cesign [12:40:14 AM] [+45.0°C] [~/git/group-service-acceleration] [main *]
-> # kubectl create secret docker-registry cesign \
--docker-server=https://index.docker.io/v1/ \
--docker-username=cesign \
--docker-password=xxx \
--docker-email=jwcesign@gmail.com

then create service:

root@cesign [12:38:59 AM] [+47.0°C] [~/git/group-service-acceleration/config] [main *]
-> # cat group-service-c.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: group-service-c
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "1"
    spec:
      containerConcurrency: 1
      timeoutSeconds: 30
      imagePullSecrets:
      - name: jwcesign
      - name: cesign
      containers:
        - image: cesign/secret-test:latest
          env:
            - name: NOW_SERVICE
              value: "group-service-c"
            - name: NEXT_SERVICE
              value: ""

It will failed with 401 when resolving, but if with registries-skipping-tag-resolving, it works; So from this testing, I think just fix resolving with loop check with creds.

It will be hard, but not impossible, for pkg/v1/remote to handle this fallback scenario, and try creds for /foo then fallback to creds for / if it gets a 403 response. That should match kubelet's behavior linked from the Knative issue above.

I think even with this implementation, the problem can't be fixed

cc @imjasonh @dprotaso

Wouter0100 commented 2 years ago

It sounds like kubelet will collect all matching auths and try them in a loop, and ggcr (currently) doesn't -- it will choose one at ~random and use it, even if it fails and the other would have worked.

Correct, see here.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Keep fresh with the 'lifecycle/frozen' label.

jwcesign commented 1 year ago

lifecycle/frozen