serverless / examples

Serverless Examples – A collection of boilerplates and examples of serverless architectures built with the Serverless Framework on AWS Lambda, Microsoft Azure, Google Cloud Functions, and more.
https://www.serverless.com/examples/

Lambda was unable to decrypt the environment variables because KMS access was denied #279

Open mohitkale opened 6 years ago

mohitkale commented 6 years ago

Dear Author,

For some strange reason, only the GET single todo item request is not working, while all the other APIs work fine (i.e., LIST, CREATE, UPDATE, and DELETE).

I am getting this error in the API Gateway console.

Reference Example: https://github.com/serverless/examples/tree/master/aws-node-rest-api-with-dynamodb

Endpoint response body before transformations: {"Message":"Lambda was unable to decrypt the environment variables because KMS access was denied. Please check the function's KMS key settings. KMS Exception: AccessDeniedExceptionKMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.","Type":null}

I am using the same item ID in both the GET and DELETE methods. The DELETE method works, but the GET method throws an Internal Server Error (with the error shown above).

Please suggest.

tremby commented 6 years ago

I came across the same error today in my own project. Like you, it seems only one of my functions is affected, and I'm not sure why.

andarilhoz commented 6 years ago

I'm having the same issue. Did anyone figure out a workaround?

liampauling commented 6 years ago

I had this issue and, after some head banging, found out it was due to deleting an IAM policy and creating a new one with the same name. Simply changing the IAM role of the Lambda to something else, saving, and then changing it back fixed it.

tremby commented 6 years ago

I ran the command to remove everything serverless had deployed, then deployed again, and for some reason it was then OK. 😕

Lasim commented 6 years ago

I had the same issue. It was necessary to remove all Lambda functions and then deploy them again.

jaybarts commented 5 years ago

The same issue happened to me today in my own project, using sls version 1.32.0. Removing and redeploying is an unfortunate workaround, since it results in brand-new endpoints, which would be a problem for me in production.

dschep commented 5 years ago

I've never seen this. Can this be reproduced reliably? If so, could you provide me with your serverless.yml so I can debug this?

jaybarts commented 5 years ago

"I've never seen this. Can this be reproduced reliably? If so, could you provide me with your serverless.yml so I can debug this?"

@dschep I was able to reproduce it quite a few times today, but it seemed to take a few tries (of deploys & removes) before I got the same exact error. I created a repo with the serverless.yml as well as instructions on how to reproduce. I think it's related to a serverless deployment failing midway, which in my case was due to a duplicate name for a CloudWatch Event Rule. I'm sure any name conflict error would also cause the issue, but I included this particular case since it did the trick for reproducing the issue.

Link to the Repo: https://github.com/jaybarts/sls-kms-issue

Thank you for offering to take a look at this issue. Please let me know if you need anything else.

dschep commented 5 years ago

Thanks for the details @jaybarts!! I'll take a look at this tomorrow or early next week.

PvanHengel commented 5 years ago

I had the same issue today; it is related to deleting and redeploying. I've had some cases where I wanted to do a clean test of the entire stack.

huangenyang commented 5 years ago

I was developing and deploying fine on one computer. I was about to travel, so I set things up on a new laptop. It was the same code, just a new Serverless setup on a different computer, but I got this error and couldn't get past it. The Lambda it complained about was configured with the default encryption. I went back to the other PC and deployed the same code with no problem. So I have two computers: one that can deploy and another (possibly running a newer version of Serverless and other tools) that cannot.

sverraest commented 5 years ago

I ran into this issue just now.

hard-coders commented 5 years ago

I had the same issue and figured out the problem. The AWS docs say:

"AWS Lambda authorizes your function to use the default KMS key through a user grant, which it adds when you assign the role to the function. If you delete the role and create a new role with the same name, you need to refresh the role's grant. Refresh the grant by re-assigning the role to the function."

So I just redeployed the function and it worked well.
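
The re-assignment described in the quoted AWS documentation can also be scripted. Below is a minimal Python sketch, assuming a boto3 Lambda client (`boto3.client("lambda")`) is passed in. The helper names and the temporary role ARN are hypothetical, and the swap-and-restore sequence mirrors what commenters here did by hand in the console rather than any documented procedure:

```python
import time


def wait_until_updated(lambda_client, function_name, poll_seconds=2.0, max_polls=30):
    """Poll until the function's last configuration update has left 'InProgress'."""
    for _ in range(max_polls):
        config = lambda_client.get_function_configuration(FunctionName=function_name)
        # LastUpdateStatus may be absent in some responses; treat that as settled.
        status = config.get("LastUpdateStatus", "Successful")
        if status != "InProgress":
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"{function_name} is still updating after {max_polls} polls")


def reassign_execution_role(lambda_client, function_name, temporary_role_arn, poll_seconds=2.0):
    """Point the function at a temporary role, then restore the original role.

    temporary_role_arn is a hypothetical placeholder: any existing role that
    Lambda is allowed to assume in your account.
    """
    original_role_arn = lambda_client.get_function_configuration(
        FunctionName=function_name
    )["Role"]
    lambda_client.update_function_configuration(
        FunctionName=function_name, Role=temporary_role_arn
    )
    wait_until_updated(lambda_client, function_name, poll_seconds)
    lambda_client.update_function_configuration(
        FunctionName=function_name, Role=original_role_arn
    )
    wait_until_updated(lambda_client, function_name, poll_seconds)
    return original_role_arn
```

A call might look like `reassign_execution_role(boto3.client("lambda"), "my-function", "arn:aws:iam::123456789012:role/some-other-role")`, where both names are made up. If the second update fails, the function is left on the temporary role, so the returned ARN is worth logging. Whether an API-driven swap refreshes the grant in every case is not confirmed anywhere in this thread.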

ctippur commented 5 years ago

Experienced the same issue as well. Had to delete the lambda function manually and recreate using terraform to resolve it.

GCCreemars commented 5 years ago

This happens to me quite frequently, more so as the number of functions in my serverless service grows. Removing and subsequently re-deploying has an almost 50% chance of having this error pop up when I try to test my deployment now.

Hyperadministrator commented 5 years ago

My problem had two causes:

  1. I changed the user's key that is used when building new instances (the first key placed into the instance to enable the SSH connection) without changing the corresponding KMS key policy in AWS.
  2. I also had a few orphaned account IDs in the key policy. I read somewhere that these might also cause failures.

When I added my AWS user account ARN to the list of allowed users under the policy's decrypt action and removed the orphaned user account IDs (orphaned because we had deleted an AWS user, but that user's account ID persisted in the policy), the problems disappeared.

Al-Jp commented 5 years ago

Go to the Lambda console > Encryption Configuration and reset the configuration. For example, change it to a customer master key and save, then change it back to the default and save again. This solved my problem.

adimoraret commented 4 years ago

I've deployed my Lambdas with the Serverless Framework and I got this for only one function, not for the others. All functions use the same role. Manually changing the role in AWS for the affected function to some other random role, and then back to the original role, fixed the problem. If it helps, the one that was not working was triggered by HTTP GET; the one that worked was triggered by HTTP POST.

cmardonespino commented 4 years ago

I got the same problem, which started when I switched from one custom KMS key to another. After changing the custom KMS key on the Lambda, I tried to update the Lambda configuration with this AWS CLI command:

aws lambda update-function-configuration --function-name notifications-status-update-emitter --runtime nodejs10.x --handler handler.handler --timeout 60 --memory-size 256 --environment Variables={ENVIRONMENT=staging}

I got the following output

An error occurred (AccessDeniedException) when calling the UpdateFunctionConfiguration operation: Lambda was unable to configure access to your environment variables because KMS returned Access Denied. Please check your KMS permissions. KMS Exception: AccessDeniedException KMS Message: User: arn:aws:iam::xxxxxx:user/deploy is not authorized to perform: kms:CreateGrant on resource: arn:aws:kms:us-west-2:xxxxxx:key/xxxxxxxxxxxxxxxxxx

That was pretty weird, because I had already granted the deploy user permission to update the Lambda configuration. After a couple of attempts at finding a solution, I fixed it as follows:

1) Change the encryption configuration to the default encryption: (default) aws/lambda
2) Run the Lambda configuration update again
3) Re-enable the encryption configuration with my custom KMS key
4) Run the Lambda configuration update once more; it should work again

Maybe this is an AWS bug?

wafaaSultan commented 4 years ago

I have tested from the AWS side, and I am able to create the Lambda function without any issue:

f018982d4db3:testlambda wafaas$ sls deploy -s dev
Serverless: Packaging service...
Serverless: Excluding development dependencies...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Uploading service service-name.zip file to S3 (1.84 KB)...
Serverless: Validating template...
Serverless: Updating Stack...
Serverless: Checking Stack update progress...
.................................
Serverless: Stack update finished...
Service Information
service: service-name
stage: dev
region: eu-west-1
stack: service-name-dev
resources: 9
api keys:
  None
endpoints:
  None
functions:
  lambda1: service-name-dev-lambda1
  lambda2: service-name-dev-lambda2
  lambda3: service-name-dev-lambda3
layers:
  None
Serverless: Run the "serverless" command to setup monitoring, troubleshooting and testing.
f018982d4db3:testlambda wafaas$

It looks like the issue is on the Serverless side.

Here is a sample of my template:

functions:
  lambda1: # Do Not Change This Lambda Name Without Update The manna-serverless-plugin !!!
    handler: handler.hello
    description: testing function
    integration: lambda
    resultTtlInSeconds: 0
    type: request
    tags:
      LambdaName: lambda1
    environment:
      test: testdata
      ENVIRONMENT: lambda

tjcobb commented 4 years ago

@dschep Is there any update on this? This happens to us fairly consistently when doing a remove followed shortly by a re-deploy. The issue usually resolves itself within 5-10 minutes. Is there anything we can add to our deployment to speed that up?

wafaaSultan commented 4 years ago

I found a workaround for this issue: add a role with Lambda full access directly to your template "serverless.yml", as follows:

functions:
  lambda1: # Do Not Change This Lambda Name Without Update The manna-serverless-plugin !!!
    handler: handler.hello
    description: testing function
    integration: lambda
    role: arn:aws:iam::xxxxxxxxxxxx:role/Lambda
    resultTtlInSeconds: 0
    type: request
    tags:
      LambdaName: lambda1
    environment:
      test: testdata
      ENVIRONMENT: lambda

I have tested this on my side and it works.

ajoga commented 4 years ago

"I had this issue and, after some head banging, found out it was due to deleting an IAM policy and creating a new one with the same name. Simply changing the IAM role of the Lambda to something else, saving, and then changing it back fixed it."

I believe this is because the Lambda references the unique identifier of the IAM role to use, not the ARN of the IAM role. Read more about identifiers here: https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_identifiers.html#identifiers-unique-ids

sh4des commented 4 years ago

AWS, multi-billion dollar company, compute in the cloud, remove the need for servers. -- have you tried turning it off and on again?

campellcl commented 4 years ago

Still seeing this issue intermittently. Redeployment does appear to resolve it, but is far from an ideal solution. What is worse is that Lambda, when invoked, still appears to return an HTTP status code of 200 or 202 (depending on whether the invocation is sync or async), which makes it rather hard to detect this error programmatically.
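
One way to make the failure detectable is to look for the error text itself rather than the status code. Below is a minimal Python sketch, assuming the response body contains the message quoted at the top of this issue. The function name `is_kms_decrypt_failure` is made up for illustration, and the exact shape of the body will depend on how the function is invoked:

```python
# Marker taken from the error message quoted at the top of this issue.
KMS_DECRYPT_MARKER = "Lambda was unable to decrypt the environment variables"


def is_kms_decrypt_failure(body):
    """Return True if a response body (str or bytes) contains the KMS decrypt error."""
    if body is None:
        return False
    if isinstance(body, (bytes, bytearray)):
        text = bytes(body).decode("utf-8", errors="replace")
    else:
        text = str(body)
    return KMS_DECRYPT_MARKER in text
```

In a smoke test, this check would run against the body of a test request right after deployment, so that a 200-level status alone is not treated as success.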

c4lm commented 4 years ago

Immediate redeployment (sls deploy right after sls remove) does not help; I usually wait at least a minute or two. And then it still might not help! At this point I have to scrap and restart every second dev deployment. Even though it irritates me quite a bit during development, to my surprise, our prod deployments have not been affected by it yet, probably because we don't deploy to prod as often as I deploy on dev to test things out.

mogaal commented 4 years ago

I just had the same problem and, as people mention here, it is related to redeployment using the same role name.

I solved it via: IAM -> Roles -> $YourRoleNameHere -> Revoke Sessions -> Revoke active sessions

I hope it helps.

joyofdata commented 4 years ago

Seems to be the same issue:

https://github.com/terraform-providers/terraform-provider-aws/issues/6352

I also just ran into it. Revoking active sessions didn't solve it.

ramgrandhi commented 4 years ago

I just went to Lambda Console -> my Lambda -> Environment Variables section -> Edit -> DON'T CHANGE ANYTHING -> click 'Save'. And it started to work!

Yongshuai-Liu commented 4 years ago

"I've deployed my Lambdas with the Serverless Framework and I got this for only one function, not for the others. All functions use the same role. Manually changing the role in AWS for the affected function to some other random role, and then back to the original role, fixed the problem. If it helps, the one that was not working was triggered by HTTP GET; the one that worked was triggered by HTTP POST."

This solved my problem! Thanks

prashanthtiramareddi commented 4 years ago

We are facing the same issue with one of our applications, and I wonder why it is happening to only one Lambda. Has anyone fixed the issue recently?

GCCreemars commented 4 years ago

I think this happens to me when I have the AWS Lambda GUI open in a browser tab on one of the Lambdas in the service when I redeploy. The error seems to occur less frequently when closing all open Lambda tabs before redeploying.

marcelomanchester commented 3 years ago

@liampauling your suggestion is still working!!! thanks

gurunathchoukekar9 commented 3 years ago

Deleting an IAM role and recreating it causes this KMS issue when running the Lambda.

Resolution: Do not delete the IAM role when redeploying. Instead, you can delete all policies under the role and recreate them.

I did the same in my Azure DevOps AWS CLI script to resolve this issue.
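
The resolution above (keep the role, replace only its policies) might be scripted along these lines. This is a Python sketch, assuming a boto3 IAM client (`boto3.client("iam")`) is passed in and that the role uses inline policies only. The helper name and its arguments are hypothetical, and managed policies attached to the role would need the separate attach and detach calls instead:

```python
import json


def replace_inline_policies(iam_client, role_name, new_policies):
    """Delete every inline policy on an existing role and recreate them.

    new_policies maps policy name -> policy document (a dict). The role itself
    is never deleted, so its unique ID stays the same.
    """
    # Collect all inline policy names, following pagination markers.
    existing = []
    kwargs = {"RoleName": role_name}
    while True:
        page = iam_client.list_role_policies(**kwargs)
        existing.extend(page["PolicyNames"])
        if not page.get("IsTruncated"):
            break
        kwargs["Marker"] = page["Marker"]

    for policy_name in existing:
        iam_client.delete_role_policy(RoleName=role_name, PolicyName=policy_name)

    for policy_name, document in new_policies.items():
        iam_client.put_role_policy(
            RoleName=role_name,
            PolicyName=policy_name,
            PolicyDocument=json.dumps(document),
        )
    return existing
```

Note that the role briefly has no inline policies between the delete and put steps, so functions invoked in that window may see permission errors.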

tomaszdudek7 commented 3 years ago

Quick fix provided by @ramgrandhi above (go to Lambda UI -> edit Lambda config (with no tweaks whatsoever) -> save) solves the issue for me.

Any idea why and when it occurs? I am not able to reproduce it. Duh.

david-mcqueen commented 3 years ago

We had this issue when our role was unchanged between deployments and we ran serverless remove && serverless deploy. We solved it by removing the name from the role within serverless.yml. With the name omitted, Serverless generates a unique name for each deployment.

ExecutionRole:
    Type: AWS::IAM::Role
    Properties:
        RoleName: my-execution-role-name        # Remove this line
        AssumeRolePolicyDocument:
        ...

parencik commented 3 years ago

"I had the same issue. It was necessary to remove all Lambda functions and then deploy them again."

Well... after redeployment my function worked well, but another one failed with this error...

mikaelvesavuori commented 3 years ago

Seeing the same every now and then. A real heart-breaker, and creates a huge mess when you're working on something that is not really ideal to "rip and replace".

dspenard commented 3 years ago

This is frustrating. I've been working with a CloudFormation deployment all day with one Lambda function, building and tearing down repeatedly to do some testing, and now all of a sudden I get this error message. Redeploying the Lambda solved the issue, which is troubling, but at least I'm back in business. This is far from ideal for a robust CI/CD process, but I'm not dealing with a production-ready system at the moment, so for my situation this solution is fine for now.

createdbykartik commented 3 years ago

Yup, re-deploying fixes the problem. It's that simple.

tomaszdudek7 commented 3 years ago

It may sound simple, but having your CI/CD randomly fail now and then (well, even worse than fail: deploy something that does not work) is awful. And so is telling your teammates "Well, this rock-solid framework can sometimes render your deployment unusable. Just try deploying again when it does!".

I'd love for the sls team to track down and fix this bug.

sambonator1 commented 3 years ago

Having to custom-code post-IaC-deployment tests that automatically redeploy portions of it to get around this bug really sucks.
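
A post-deployment check of the kind described here could be structured as a retry loop that reports which functions still fail, leaving the remediation (redeploy, role swap) to the caller. This is a Python sketch: `probe` is a hypothetical callable that sends a test request to one function and returns the response body as text, and the attempt count and delay are arbitrary defaults, not values recommended anywhere in this thread:

```python
import time

# Marker taken from the error message quoted at the top of this issue.
KMS_DECRYPT_MARKER = "Lambda was unable to decrypt the environment variables"


def find_affected_functions(function_names, probe, attempts=3, delay_seconds=60.0, sleep=time.sleep):
    """Return the functions whose probe still shows the KMS error after all attempts."""
    remaining = list(function_names)
    for attempt in range(attempts):
        # Keep only the functions that still return the KMS decrypt error.
        remaining = [name for name in remaining if KMS_DECRYPT_MARKER in probe(name)]
        if not remaining:
            break
        if attempt < attempts - 1:
            sleep(delay_seconds)  # give the error a chance to clear on its own
    return remaining
```

An empty result means every function eventually responded without the error; anything left in the list is a candidate for one of the workarounds discussed in this thread.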

sverraest commented 3 years ago

Almost 3 years later...

drexler commented 3 years ago

Took me 3 days to track this issue down!! Perhaps, as a mitigation step, the CF template could be analyzed for the renaming changes that cause this issue and, if any are present, a redeployment of the APIs could be performed. This could be exposed via serverless.yml to control when it is triggered. I'll hash up a draft PR for this when I have a few cycles.

fkunecke commented 3 years ago

"I had the same issue and figured out the problem. The AWS docs say:"

"AWS Lambda authorizes your function to use the default KMS key through a user grant, which it adds when you assign the role to the function. If you delete the role and create a new role with the same name, you need to refresh the role's grant. Refresh the grant by re-assigning the role to the function."

"So I just redeployed the function and it worked well."

This worked for me, except I am using AWS Amplify. Thanks!

jweilhammer commented 3 years ago

Was also able to fix this by just changing the execution role of the lambda function in the Configure tab to anything else, and then back to the role it needs. Seems to re-apply the role to the lambda and it runs as expected.

Re-deploying the entire lambda itself also works, but I found this to be an easier and quicker solution :-)

nathant727 commented 3 years ago

We saw these errors recently too: "Lambda was unable to decrypt the environment variables because KMS access was denied. Please check the function's KMS key settings. KMS Exception: AccessDeniedExceptionKMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access." I resolved this error ^ by assigning our Lambda function to a different Execution role and then reassigning it to the correct Execution role.

jweilhammer commented 3 years ago

After hitting this again, I believe this error is caused by the IAM role session time. I think that if the role is changed and the Lambda tries to execute again within the window of its maximum session time, this error will occur.

Waiting for the old role's session to expire would potentially fix it as well, and this would explain why switching the role fixes it (the Lambda retrieves a new session with the updated role).

dithos211 commented 3 years ago

We ran into this issue a couple of days ago. Our Lambdas are deployed using Terraform and are meant to be triggered by EventBridge events. But the Lambdas were not recognizing the events, since EventBridge had not been added as a trigger to them. I suspect this was because the Terraform scripts for the events were executed before the Lambdas were deployed. Once the triggers were set (we had to edit the rules and save them manually), we got the error below when we tried to test the Lambdas.

"Lambda was unable to decrypt the environment variables because KMS access was denied. Please check the function's KMS key settings. KMS Exception: AccessDeniedExceptionKMS Message: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access."

Setting the IAM role to a different one, saving, setting it back to the original and saving it again got the lambdas to work.

steven-hunt-devopsgroup commented 2 years ago

Thanks @dithos211, those steps worked for me perfectly. Ta very