Plugin Can Cause Error: "A version for this Lambda function exists ( $x ). Modify the function to create a new version"

jthomerson commented 6 years ago

Sometimes when deploying a stack that uses this plugin, I will get the error:

A version for this Lambda function exists ( $x ). Modify the function to create a new version

That error is caused by the AWS::Lambda::Version resource created by Serverless having a new logical resource ID, but the same value for the code SHA checksum. Serverless creates the logical resource ID by hashing the zip file (containing the code), and the Properties object from the Lambda function, with everything except for the Code object.

Background:

In our stack, we always include the hash of the latest commit as a stack tag and an environment variable. We can then use that in our logging so that our log messages indicate which version of the code logged the message. This error happens when:

I deploy the service using one commit
I don't do anything to change the function's code or configuration (maybe I change something else in the service - some other resource, or I chance something in an entirely different service, or I just amend / reword the commit)
I commit something new, rebase, or amend the commit that I previously deployed - anything to cause the latest hash to change
Then I deploy the service again

Cause:

The cause is an order-of-operations problem that can be resolved by this plugin. Here's what happens now:

Serverless creates the hash of the code zip and the properties of the Lambda function
- At this point, the properties contain the latest-commit-hash in an environment variable
Later, the plugin removes all environment variables
Then the stack template gets deployed to CloudFormation, and CloudFormation (or Lambda) eventually throws the error stated at the outset of this issue.

At step one, the hash used in the logical resource ID changes because the latest-commit-hash changed in the environment variables - the function's properties changed. But then the plugin strips the environment variables, which means that really the properties of the function are exactly the same as they were in the previous deploy. That's why Lambda complains - the code zip and the function properties are the same, but there's a new resource being made for a new function version (it's a new resource because the logical ID changed in the stack - the old function version will eventually get deleted from the stack in the cleanup phase, although because Serverless marks the function versions as "Retain", the version itself will never get deleted from Lambda).

Fix:

The fix is to make this plugin strip the environment variables from the function properties before the hash is done. I think using the before:package:compileFunctions hook would be the appropriate place to do this.

jthomerson commented 6 years ago

Unfortunately, fixing this won't be as easy as I thought ... the package:compileFunctions hook is where the function AND the function version resources both get made, and the hash (that becomes the logical resource ID) is computed.

Ideally, there would've been a hook that we could listen to for after the function is part of the CloudFormation template, but before the function version is part of the template so that we could've moved our env-var-stripping code to run between those two steps.

Some options:

we could get Serverless framework to add a hook or some extension point between those two steps
we could remove all environment variables (from the function definition in the parsed serverless.yml, and in the provider part of the parsed object). a. But that would mean that all functions in the stack lose their environment variables, which is not desirable if you have some functions using Lambda@Edge and others not
we could replicate the logic that's in Serverless to re-make the function version (re-hashing the file and properties) after we've stripped the env vars. a. Yuck, that's a lot of code to replicate, and very likely to cause other subtle bugs
we could force a change on every deploy - adding some comment to the code before it's zipped or something a. Yuck, just yuck. That would mean a lot of unnecessary deploys, which is especially awful since these are Lambda@Edge functions and the CloudFront distribution would then get re-deployed, and that takes longer than (just about) any other AWS resource.
❓ ❔ ❓ ❔ ❓ ❔ ❓ ❔

DASPRiD commented 5 years ago

Is there a workaround for this at the moment? This is currently blocking me from deploying.

jthomerson commented 5 years ago

@DASPRiD we worked around it by not including any environment variables in our template if they change between deploys. In our case, we were including the commit hash, which changed with every deploy, as described above. I'd recommend you look at what environment variables you're setting in your template and make sure that none of them change between deploys. That's the only scenario that we are currently aware of that causes this issue.

Environment variables can't be used by Lambda@Edge anyway, so it's not worth setting them in a template that has only @Edge functions.

If you don't think that's your scenario, please provide an MVCE for us. In this case, that'd be a serverless.yml file and a simple Lambda function and the steps that you use to reliably reproduce the issue.

DASPRiD commented 5 years ago

My serverless template contains both edge and normal functions. Since we are in development stage right now, we keep adding or changing environment variables for new functions. If I understand it correctly, I could force a new function version by adding a comment to the edge lambda functions?

jthomerson commented 5 years ago

Yes, you're correct. The problem is caused by the code not changing, but the env vars changing. The "whys" of that are explained above.

My (tangential) suggestion would be not to create one stack that has both types of functions. Personally, I treat the "edge" configuration (CloudFront distro, DNS, Lambda@Edge) as a separate service from the "backend" ... your "normal" functions, databases, API gateways, etc.

DASPRiD commented 5 years ago

I guess that makes sense, I'll look into refactoring it into two different stacks then :)

jthomerson commented 5 years ago

FWIW, more tangentially: I always recommend the finest-grain stacks you can make, for a number of reasons. But, especially when there are persistent data stores (DynamoDB tables, S3 buckets, etc) or CloudFront distros in the stack, I recommend making an entirely separate stack just for those. That just makes them a lot more stable because they change less frequently. For persistent data stores, less change is good - less likely to hit a weird error (we used to have weird CloudFormation-caused errors with DynamoDB tables, so this is likely a little bit because of battle scars). For CloudFront, it's just because it's such an onerous pain in the neck if a CloudFront distro deploy is needed - it takes 30-45 minutes. So, by keeping those stacks as stable as possible, you avoid those. Since typically most of your iteration will come in the Lambda functions that use or back those resources, I keep those Lambda functions - the APIs / backend step function workflows / whatever - in a stack of their own that's safer/faster to deploy twenty times a day.

DASPRiD commented 5 years ago

Well, for now I guess I'll separate out the edge functions with their direct dependencies out (S3 Bucket, CloudFront distribution), and keep the API stack with its Postgres database. But thanks for the tip, this definitely makes life a little easier :+1:

AnthonyWC commented 5 years ago

Just another data point but this bug also existed on 1.40.0.

This old ticket was closed but was never resolved: https://github.com/serverless/serverless/issues/4631

It also manifested in other situation such as: https://github.com/serverless/serverless/issues/6349

wired00 commented 4 years ago

We also have the issue on setting version hash against a tag.

For a quick win, what seems to work is just using versionFunctions: false

provider:
  versionFunctions: false

Has anyone been using that if version is not a requirement?

iDVB commented 3 years ago

Looks like I'm having this same issue. However, in our case it was another plugin that was adding tags that we did not know of.

serverless-plugin-git-variables apparently adds tags unless you disable it.

Their Docs:

Moreover, it adds GIT related environment variables and tags (GIT_COMMIT_SHORT, GIT_COMMIT_LONG, GIT_BRANCH, GIT_IS_DIRTY, GIT_REPOSITORY, GIT_TAGS) for each defined function in the serverless file. You can disable this by adding the following custom variable in your serverless.yml file:

custom:
  exportGitVariables: false

Added that and all is now well!

silvermine / serverless-plugin-cloudfront-lambda-edge