rh-mobb / ecr-secret-operator


Operator can't find auth key #17

Open Crylion opened 3 weeks ago

Crylion commented 3 weeks ago

Hey! I have been trying to get this operator to work in an OpenShift 4.14 cluster for a while now, with no luck. For some reason I don't yet understand, the AWS SDK inside the operator keeps failing to authenticate:

ERROR Reconciler error {"controller": "secret", "controllerGroup": "ecr.mobb.redhat.com", "controllerKind": "Secret", "Secret": {"name":"ecr-secret","namespace":"dsw21-plattform-test-api-layer"}, "namespace": "dsw21-plattform-test-api-layer", "name": "ecr-secret", "reconcileID": "d2ed87d8-39b3-4a16-be15-ad0c74df64bd", "error": "NoCredentialProviders: no valid providers in chain. Deprecated.\n\tFor verbose messaging see aws.Config.CredentialsChainVerboseErrors"}

I have spent a lot of time checking and re-checking everything against the documentation, reading your source code to see where the error comes from, and testing from other pods whether the auth secret in my namespace is available. I still can't find any reason why the credentials would not be visible to the operator.
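For anyone who wants to repeat that availability check from inside a pod, here is a minimal Python sketch. It assumes the standard AWS SDK conventions (the `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` variables and a shared credentials file located via `AWS_SHARED_CREDENTIALS_FILE` or `~/.aws/credentials`). The function name is made up for illustration, and it only looks at these two static sources, not at role-based ones.

```python
import configparser
import os


def visible_credential_sources(env, home):
    """Return the static credential sources visible for the given environment.

    env  -- mapping of environment variables (e.g. os.environ)
    home -- home directory used to locate the default credentials file
    """
    found = []

    # Source 1: both environment variables are set and non-empty.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        found.append("env")

    # Source 2: an INI-style credentials file with the selected profile.
    path = env.get("AWS_SHARED_CREDENTIALS_FILE") or os.path.join(home, ".aws", "credentials")
    profile = env.get("AWS_PROFILE") or "default"
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read(path)  # a missing or unreadable file is silently skipped
    except configparser.Error:
        return found  # the file exists but is not valid INI
    if (parser.has_section(profile)
            and parser.get(profile, "aws_access_key_id", fallback="")
            and parser.get(profile, "aws_secret_access_key", fallback="")):
        found.append("shared-file")

    return found


if __name__ == "__main__":
    print(visible_credential_sources(os.environ, os.path.expanduser("~")))
```

An empty list would mean that neither static source is visible to that process, which is at least consistent with the `NoCredentialProviders` error above.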

In my search for an answer, I followed the breadcrumbs through your source code and noticed a discrepancy in how the auth secret is pulled into your pod:

You are trying to use the same secret both to set the ENV variables for the AWS SDK AND as a config file, which simply does not work. Notice this section:

- name: AWS_ACCESS_KEY_ID
  valueFrom:
    secretKeyRef:
      key: aws_access_key_id
      name: aws-ecr-cloud-credentials
      optional: true
- name: AWS_SECRET_ACCESS_KEY
  valueFrom:
    secretKeyRef:
      key: aws_secret_access_key
      name: aws-ecr-cloud-credentials
      optional: true

Here the aws_access_key_id and aws_secret_access_key should theoretically be pulled from the secret. But the same secret also gets mounted as a volume to use as a config file here:


volumes:
  - name: aws-credentials
    secret:
      optional: true
      secretName: aws-ecr-cloud-credentials

And your documentation suggests using this command to create the secret:

cat <<EOF > /tmp/credentials
[default]
aws_access_key_id=""
aws_secret_access_key=""
EOF

oc create secret generic aws-ecr-cloud-credentials --from-file=credentials=/tmp/credentials

But that would create a secret with only one top-level key, credentials, so the two ENV variables that try to pull the keys aws_access_key_id and aws_secret_access_key from the same secret will be empty.

The only way I found to ensure that BOTH work is by creating the secret like in your documentation and then editing it so that it contains the auth information both as the file under credentials and then also as the two aforementioned keys for the ENV. But even then, when the auth information should be accessible as both the aws config file AND the two ENV variables, I still only get the error I posted at the top :/ At this point I have no idea what else I could try, do you have any suggestions?
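To make that workaround reproducible, here is a minimal Python sketch that builds a Secret manifest carrying the credentials in both layouts: the whole file under `credentials`, plus the two separate keys that the `secretKeyRef` entries look up. The function name and the script name in the usage note are made up for illustration, and the values are copied verbatim from the file, so any quotes in the file end up in the keys as well.

```python
import configparser
import json
import sys


def build_secret_manifest(credentials_text, name="aws-ecr-cloud-credentials", profile="default"):
    """Return a Secret manifest (as a dict) with the file and the individual keys."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    section = parser[profile]  # raises KeyError if the profile is missing
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        # stringData takes plain text; the API server does the base64 encoding.
        "stringData": {
            # Read through the mounted volume as a file named "credentials".
            "credentials": credentials_text,
            # Read by the two secretKeyRef entries for the ENV variables.
            "aws_access_key_id": section["aws_access_key_id"],
            "aws_secret_access_key": section["aws_secret_access_key"],
        },
    }


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            text = handle.read()
    else:
        # Placeholder values, only used when no file path is given.
        text = "[default]\naws_access_key_id=EXAMPLEKEYID\naws_secret_access_key=EXAMPLESECRET\n"
    print(json.dumps(build_secret_manifest(text), indent=2))
```

Saved as, say, `build_secret.py`, the output could be piped into `oc apply -f -` in the target namespace, for example `python build_secret.py /tmp/credentials | oc apply -f -`.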

michaelryanmcneill commented 3 weeks ago

Hi there! Are you deploying this on an OpenShift cluster or a ROSA cluster? If it is an OpenShift cluster, what mode is the Cloud Credential Operator running in?

Crylion commented 1 week ago

@michaelryanmcneill Hey, sorry for the late reply. It's an OpenShift 4.14 cluster. Regarding the mode you are asking about, I'm unsure. We are not the admins of this cluster and only asked the admins to install this cluster for us and help us by providing some logs etc. But I'll do my best to answer your questions if you can explain what information you need.

michaelryanmcneill commented 6 days ago

@Crylion you can determine the mode by following this documentation. I'm expecting that you're running in Mint or Passthrough mode, which will not work with AWS STS and IRSA. Because of that, you'd need to provide static credentials (an IAM access key ID and secret access key) in the secret instead of a role to assume. We do not test this operator against clusters that are using Mint or Passthrough mode, so there may be unexpected behaviors that we haven't identified yet.
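As a rough sketch of that check: the OpenShift documentation reads the mode from the `credentialsMode` field of the cluster-wide `CloudCredential` resource. The Python wrapper below is only an illustration (both function names are made up), and it assumes `oc` is on the PATH and that the logged-in user may read that resource, which a non-admin account may not be allowed to do.

```python
import subprocess


def read_credentials_mode():
    """Query the cluster for the Cloud Credential Operator mode via `oc`."""
    result = subprocess.run(
        ["oc", "get", "cloudcredentials", "cluster",
         "-o", "jsonpath={.spec.credentialsMode}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def describe_mode(mode):
    """Translate the raw field value into a hint for this operator."""
    if mode in ("Mint", "Passthrough"):
        return "static credentials needed (no STS/IRSA)"
    if mode == "Manual":
        return "manual mode; check separately whether STS is configured"
    if mode == "":
        # An empty value means default mode, where the CCO itself chooses
        # between Mint and Passthrough based on the installation credentials.
        return "default mode; effectively Mint or Passthrough"
    return "unrecognized value: " + mode
```

Calling `describe_mode(read_credentials_mode())` on a workstation that is logged in to the cluster would print the hint for the current mode.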