keel-hq / keel

Kubernetes Operator to automate Helm, DaemonSet, StatefulSet & Deployment updates
https://keel.sh
Mozilla Public License 2.0
2.46k stars 282 forks source link

ECR IAM Role usage #425

Closed spaghettifunk closed 5 years ago

spaghettifunk commented 5 years ago

I tried using the iam role in the pod annotation to avoid putting credentials in the values.yaml file and push it to GitHub but it seems that keel is not able to pick up the role correctly. I was wondering if somebody else managed to make it work with podsAnnotations rather than plain credentials. If not, I'll be happy to work on a PR 😃

severity1 commented 5 years ago

@spaghettifunk I also want this, based on my tests it seems that even if you set ecr.enabled: false this just doesn't set the aws access, secret and region variables and values as seen below.

env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: POLL
          value: "1"
        - name: HELM_PROVIDER
          value: "1"
        - name: TILLER_NAMESPACE
          value: kube-system
        - name: TILLER_ADDRESS
        - name: BASIC_AUTH_USER
          value: potatoman
        - name: SLACK_CHANNELS
          value: deployment
        - name: SLACK_APPROVALS_CHANNEL
          value: deployment
        - name: SLACK_BOT_NAME
          value: keelbot
        - name: NOTIFICATION_LEVEL
          value: info

tried this and keel can still detect that my deployment is using an ECR repo although its giving the usual 401 Not Authorized as seen below.

time="2019-08-13T04:48:52Z" level=error msg="trigger.poll.manager: got error(-s) while watching images" error="encountered errors while adding images: Get https://blahblah.dkr.ecr.ap-southeast-2.amazonaws.com/v2/banana/bloop/manifests/0.0.6: http: non-successful response (status=401 body=\"Not Authorized\\n\")"

this suggests to me that ecr.enabled just sets variables and values, to me this is a bit misleading. just make aws.enabled.

although this also suggests that even if it can still detect that you are using an ecr repo and goes through the ecr logic, something in there explicitly references to the aws key/val to be present.

my expectation is that, if AWS key/val is not present anywhere always fallback to IAM roles/profiles as this is best practice. see https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html for reference.

When you initialize a new service client without supplying any arguments, the AWS SDK for Java attempts to find AWS credentials by using the default credential provider chain implemented by the DefaultAWSCredentialsProviderChain class. The default credential provider chain looks for credentials in this order:

My go to workaround for now is to store IAM creds into secrets manager and then use my kubernetes-external-secrets to let the pods fetch the secrets during pod creation.

My setup is;

rusenask commented 5 years ago

@spaghettifunk I would really appreciate a PR for this! Could you extend https://github.com/keel-hq/keel/tree/master/extension/credentialshelper/aws package to include this additional logic?

spaghettifunk commented 5 years ago

@rusenask I'd be happy to! I will start working on it on Friday because we need to have the role implemented for our CI/CD system 😄

RiceBowlJr commented 5 years ago

Hi there! I just configured Keel on AWS to poll on ECR. I do not like to put credentials in any app conf, neighter generate a whole lot of user for apps.

I use Kiam to solve this issue and it works well with Keel :)

If this can help, here is what to do:

And you are good to go :)

spaghettifunk commented 5 years ago

Ha! I must confess that I did not expect this. We are currently using kube2iam and keel was not able to pull from ECR. We tried even with Admin rights and still we were having 401 Permission Denied. I'll try again maybe I overlooked something. Thanks!

severity1 commented 5 years ago

i've been using kube2iam as well, i wonder why it doesn't work.

On Sat, 24 Aug 2019, 4:58 AM Davide Berdin, notifications@github.com wrote:

Ha! I must confess that I did not expect this. We are currently using kube2iam and keel was not able to pull from ECR. We tried even with Admin rights and still we were having 401 Permission Denied. I'll try again maybe I overlooked something. Thanks!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/keel-hq/keel/issues/425?email_source=notifications&email_token=AAPB4VSU7ICNNOKWILFAYNLQGAJKNA5CNFSM4IKSL4X2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5AYQTY#issuecomment-524388431, or mute the thread https://github.com/notifications/unsubscribe-auth/AAPB4VRXM6FA6NZV6BRYNL3QGAJKNANCNFSM4IKSL4XQ .

severity1 commented 5 years ago

@spaghettifunk it is quite possible that KIAM works because it populates the ENV VARS while kube2iam populates the credentials file or just plain IAM Instance Profile, if this is the case the underlying issue still stands that the ecr logic doesnt failover to the credentialhelper lookup and is being blocked by ENV VARS existence.

spaghettifunk commented 5 years ago

Yeah could be. I've never used kiam so I can't really know per-sei. On the other hand, I'm working actively onto a PR for this issue because we don't want to have AWS credentials exposed in our repository. Hopefully by the end of the week I'm able to open the PR with the changes.

flythebluesky commented 5 years ago

Also very interested in using Kube2IAM w/AWS ECR. If Keel uses the AWS Default Credentials Provider chain instead of only environment variables, it should work. We have the same concerns regarding exposing repository credentials. Hat tip to all the devs, this is a really welcome tool!!

spaghettifunk commented 5 years ago

Documentation here: https://keel.sh/docs/#iam-role

jcardoso-bv commented 5 years ago

Is PR #438 part of 0.15.0-rc2? I only ask as we're still unable to get Keel to pull images cross-account from our ECR repos despite all tests within the pod confirming our Kiam roles are working correctly.

rusenask commented 5 years ago

Yes, 0.15.0-rc2 has the latest code from master, @spaghettifunk any ideas?

jcardoso-bv commented 5 years ago

Very odd. Keel works fine when in the same account as our ECR but in a different account that has been given cross-account access via ECR policies and a Kiam role it always spits out:

time="2019-09-10T11:24:08Z" level=error msg="trigger.poll.RepositoryWatcher.addJob: failed to get image digest" error="Get https://xxx.dkr.ecr.eu-west-1.amazonaws.com/v2/xxx/xxx-xxx-service/manifests/1.21.2: http: non-successful response (status=401 body=\"Not Authorized\\n\")" image="xxx/xxx-xxx-service:1.21.2" password= username=
time="2019-09-10T11:24:08Z" level=error msg="trigger.poll.RepositoryWatcher.Watch: failed to add image watch job" error="Get https://xxx.dkr.ecr.eu-west-1.amazonaws.com/v2/xxx/xxx-xxx-service/manifests/1.21.2: http: non-successful response (status=401 body=\"Not Authorized\\n\")" image="namespace:xxx,image:xxx.dkr.ecr.eu-west-1.amazonaws.com/xxx/xxx-xxx-service:1.21.2,provider:kubernetes,trigger:poll,sched:@every 5m,secrets:[]"

An example of tests within the pod confirming the Kiam role and ECR policies are working as intended:

/ # echo "http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> "/etc/apk/repositories"
/ # apk update
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/community/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/edge/testing/x86_64/APKINDEX.tar.gz
v3.10.2-41-g6252c54e4f [http://dl-cdn.alpinelinux.org/alpine/v3.10/main]
v3.10.2-38-g39a872f50f [http://dl-cdn.alpinelinux.org/alpine/v3.10/community]
v20190809-2308-g0685aaea7c [http://dl-cdn.alpinelinux.org/alpine/edge/testing]
OK: 14785 distinct packages available
/ # apk add docker groff python3
OK: 353 MiB in 40 packages
/ # rm -fr /var/cache/apk/*
/ # pip3 install awscli
Requirement already satisfied: awscli in /usr/lib/python3.7/site-packages (1.16.232)
Requirement already satisfied: s3transfer<0.3.0,>=0.2.0 in /usr/lib/python3.7/site-packages (from awscli) (0.2.1)
Requirement already satisfied: botocore==1.12.222 in /usr/lib/python3.7/site-packages (from awscli) (1.12.222)
Requirement already satisfied: rsa<=3.5.0,>=3.1.2 in /usr/lib/python3.7/site-packages (from awscli) (3.4.2)
Requirement already satisfied: docutils<0.16,>=0.10 in /usr/lib/python3.7/site-packages (from awscli) (0.15.2)
Requirement already satisfied: PyYAML<=5.2,>=3.10; python_version != "2.6" in /usr/lib/python3.7/site-packages (from awscli) (5.1.2)
Requirement already satisfied: colorama<=0.3.9,>=0.2.5 in /usr/lib/python3.7/site-packages (from awscli) (0.3.9)
Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/lib/python3.7/site-packages (from botocore==1.12.222->awscli) (0.9.4)
Requirement already satisfied: urllib3<1.26,>=1.20; python_version >= "3.4" in /usr/lib/python3.7/site-packages (from botocore==1.12.222->awscli) (1.25.3)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1; python_version >= "2.7" in /usr/lib/python3.7/site-packages (from botocore==1.12.222->awscli) (2.8.0)
Requirement already satisfied: pyasn1>=0.1.3 in /usr/lib/python3.7/site-packages (from rsa<=3.5.0,>=3.1.2->awscli) (0.4.7)
Requirement already satisfied: six>=1.5 in /usr/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1; python_version >= "2.7"->botocore==1.12.222->awscli) (1.12.0)
You are using pip version 19.0.3, however version 19.2.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
/ # aws --region "eu-west-1" ecr get-authorization-token --output text --query 'authorizationData[].authorizationToken' | base64 -d | cut -d ':' -f 2 | docker login --password-stdin --username "AWS" "https://xxx.dkr.ecr.eu-west-1.amazonaws.com"
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
/ # aws --region "eu-west-1" ecr list-images --registry-id "xxx" --repository-name "xxx/xxx-xxx-xxx"
{
    "imageIds": [
        {
            "imageDigest": "sha256:xxx",
            "imageTag": "1.37.0"
        },
        {
            "imageDigest": "sha256:xxx",
            "imageTag": "1.38.2"
        },
        {
            "imageDigest": "sha256:xxx",
            "imageTag": "1.36.1"
        },
        {
            "imageDigest": "sha256:xxx",
            "imageTag": "1.39.1"
        },
        {
            "imageDigest": "sha256:xxx",
            "imageTag": "1.40.1"
        },
        {
            "imageDigest": "sha256:xxx",
            "imageTag": "latest"
        },
        {
            "imageDigest": "sha256:xxx",
            "imageTag": "1.40.0"
        }
    ]
}
/ # aws --region "eu-west-1" ecr batch-get-image --registry-id "xxx" --repository-name "xxx/xxx-xxx-xxx" --image-ids "imageTag=latest"
{
    "images": [
        {
            "registryId": "xxx",
            "repositoryName": "xxx/xxx-xxx-xxx",
            "imageId": {
                "imageDigest": "sha256:xxx",
                "imageTag": "latest"
            },
            "imageManifest": "{\n   \"schemaVersion\": 2,\n   \"mediaType\": \"application/vnd.docker.distribution.manifest.v2+json\",\n   \"config\": {\n      \"mediaType\": \"application/vnd.docker.container.image.v1+json\",\n      \"size\": 7228,\n      \"digest\": \"sha256:xxx\"\n   },\n   \"layers\": [\n      {\n         \"mediaType\": \"application/vnd.docker.image.rootfs.diff.tar.gzip\",\n         \"size\": 2757034,\n         \"digest\": \"sha256:xxx\"\n      },\n      {\n         \"mediaType\": \"application/vnd.docker.image.rootfs.diff.tar.gzip\",\n         \"size\": 21852924,\n         \"digest\": \"sha256:xxx\"\n      },\n      {\n         \"mediaType\": \"application/vnd.docker.image.rootfs.diff.tar.gzip\",\n         \"size\": 1406608,\n         \"digest\": \"sha256:xxx\"\n      },\n      {\n         \"mediaType\": \"application/vnd.docker.image.rootfs.diff.tar.gzip\",\n         \"size\": 283,\n         \"digest\": \"sha256:xxx\"\n      },\n      {\n         \"mediaType\": \"application/vnd.docker.image.rootfs.diff.tar.gzip\",\n         \"size\": 9840114,\n         \"digest\": \"sha256:xxx\"\n      },\n      {\n         \"mediaType\": \"application/vnd.docker.image.rootfs.diff.tar.gzip\",\n         \"size\": 26849,\n         \"digest\": \"sha256:xxx\"\n      }\n   ]\n}"
        }
    ],
    "failures": []
}
spaghettifunk commented 5 years ago

that's interesting error but I have slight feeling that it's a separate issue. I think that for your case the way the AWS object is instantiate it, needs to change. Right now what the component does is to check if there are some credentials or IAM Roles. For cross-account, I think it is necessary to keep the session and config objects of multiple accounts. Honestly I've never had such a situation but after some googling, I came across this blogpost that says the above.

I would suggest to open a new issue about this particular problem and maybe I can spend some time to fix the issue on keel side. Although, I think that there will be some changes in the configuration file (and also in the helm chart) to understand when there are cross account situations. This is just a brain-fart and probably I'm wrong, but at the moment I'm not sure how to solve the problem quickly.

What do you think?

jcardoso-bv commented 5 years ago

Sadly my understanding of the Go AWS SDK is limited at best but I think I get the gist of the issue. :) Happy enough to open up a separate issue as requested.