Fetching SSM params: InvalidSignatureException: Signature expired

hamilton-s commented 10 months ago

Describe the bug

Since the end of October, we have seen our Lambda functions intermittently fail because SSM parameters could not be fetched. The error we are seeing looks like the following:

InvalidSignatureException: Signature expired: 20231103T171116Z is now earlier than 20231103T171224Z (20231103T171724Z - 5 min.)
 at throwDefaultError (/var/runtime/node_modules/@aws-sdk/smithy-client/dist-cjs/default-error-handler.js:8:22)
 at /var/runtime/node_modules/@aws-sdk/smithy-client/dist-cjs/default-error-handler.js:18:39
 at de_GetParametersCommandError (/var/runtime/node_modules/@aws-sdk/client-ssm/dist-cjs/protocols/Aws_json1_1.js:4194:20)
 at process.processTicksAndRejections ...
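
For readers decoding the message: a minimal sketch, assuming the first timestamp is the time the request was signed and the timestamp in parentheses is the service's clock when the request arrived. The helper names `parseAmzDate` and `signatureAgeSeconds` are hypothetical, not part of any SDK. It computes how old the signature was on arrival.

```javascript
// Parse a basic-format ISO 8601 timestamp such as 20231103T171116Z into epoch ms.
function parseAmzDate(stamp) {
  const m = /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z$/.exec(stamp);
  if (!m) throw new Error(`Unexpected timestamp: ${stamp}`);
  const [, y, mo, d, h, mi, s] = m.map(Number);
  return Date.UTC(y, mo - 1, d, h, mi, s);
}

// Return the age of the signature in seconds, or null if the message has another shape.
function signatureAgeSeconds(message) {
  const m = /Signature expired: (\d{8}T\d{6}Z) is now earlier than \d{8}T\d{6}Z \((\d{8}T\d{6}Z) - 5 min\.\)/.exec(message);
  if (!m) return null;
  return (parseAmzDate(m[2]) - parseAmzDate(m[1])) / 1000;
}

const message =
  'InvalidSignatureException: Signature expired: 20231103T171116Z is now earlier than 20231103T171224Z (20231103T171724Z - 5 min.)';
console.log(signatureAgeSeconds(message)); // 368
```

Under that reading, the request above was about 6 minutes old when it was checked, just past the 5-minute window shown in the message.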

We have managed to bring our errors down by disabling prefetch and reducing the cacheExpiry - however, we would prefer to keep these options as they were. We're also not sure why the issue has suddenly started happening as the issues didn't seem to coincide with a particular upgrade/code change.

To Reproduce

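import middy from '@middy/core'
import ssm from '@middy/ssm'

// `handler` is the Lambda handler defined elsewhere in the service
export default middy(handler)
  .use(
    ssm({
      fetchData: {
        paramA: `some/path/a`,
        paramB: `some/path/b`,
      },
      cacheExpiry: 600000, // 10 minutes, in milliseconds
      setToContext: true,
    })
  )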

We notice it only fails in around 1-2% of cases in our production environments.

Environments

Node.js: 18
"@middy/ssm": "4.6.5"
"@middy/core": "4.6.5"
"@aws-sdk/client-lambda": "3.425.0"

Additional context

We've noticed this across a range of our services, with different versions of @middy/ssm.

willfarrell commented 10 months ago

Thanks for reporting. You mentioned "We're also not sure why the issue has suddenly started happening as the issues didn't seem to coincide with a particular upgrade/code change.", which makes me think there was a change on the AWS infra side, but I don't see anything in the docs that jumps out as changed. I wonder if there was a change in the SDK (or one of its deps)?

If you reach out to AWS support, could you share their response here?

I'll do some digging as well.

hamilton-s commented 10 months ago

Thanks for your reply!

We've discovered a significant problem in our Observability lambda layer that seems to slow down boot-up times due to instrumenting the aws-sdk. Initially, we thought this was a middy issue because it only occurred with middy, but we've now determined that it's related to aws-sdk being in the dependency tree. We're addressing this with our Observability partner and hope it resolves our problem. If others are also facing this issue, it could remain open, but for now, we're focusing on fixing our lambda layer to see if it resolves the problem. We can close this unless others are experiencing the same issue.

willfarrell commented 10 months ago

From what you described, that sounds like it could easily cause this issue. I'll close for now. If you need to reopen, please do so. Others are welcome to comment if they're also running into this.

HumbleBeck commented 10 months ago

Hello, I've hit the same error on Lambda@Edge recently. My setup looks like this:

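import middy from '@middy/core'
import doNotWaitForEmptyEventLoop from '@middy/do-not-wait-for-empty-event-loop'
import ssm from '@middy/ssm'

// `handler` is the Lambda@Edge handler defined elsewhere
export default middy(handler)
  .use(doNotWaitForEmptyEventLoop())
  .use(ssm({
    fetchData: {
      paramA: "path"
    },
    setToContext: true,
    awsClientOptions: {
      region: process.env.REGION || 'us-east-1'
    }
  }))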

Environment

Node.js: 18
"@middy/ssm": "4.6.5"
"@middy/core": "4.6.5"

INFO    InvalidSignatureException: Signature expired: 20231119T162823Z is now earlier than 20231119T163124Z (20231119T163624Z - 5 min.)
    at throwDefaultError (/var/runtime/node_modules/@aws-sdk/smithy-client/dist-cjs/default-error-handler.js:8:22)
    at /var/runtime/node_modules/@aws-sdk/smithy-client/dist-cjs/default-error-handler.js:18:39
    at de_GetParametersByPathCommandError (/var/runtime/node_modules/@aws-sdk/client-ssm/dist-cjs/protocols/Aws_json1_1.js:4242:20)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async /var/runtime/node_modules/@aws-sdk/middleware-serde/dist-cjs/deserializerMiddleware.js:7:24
    at async /var/runtime/node_modules/@aws-sdk/middleware-signing/dist-cjs/awsAuthMiddleware.js:14:20
    at async /var/runtime/node_modules/@aws-sdk/middleware-retry/dist-cjs/retryMiddleware.js:27:46
    at async /var/runtime/node_modules/@aws-sdk/middleware-logger/dist-cjs/loggerMiddleware.js:7:26
    at async Promise.allSettled (index 0)
    at async to (/var/task/src/edgeGate/handler.js:63:43001) {
  '$fault': 'client',
  '$metadata': {
    httpStatusCode: 400,
    requestId: '55d1f065-3954-4531-9ffe-af7e2e2a8a29',
    extendedRequestId: undefined,
    cfId: undefined,
    attempts: 1,
    totalRetryDelay: 0
  },
  __type: 'InvalidSignatureException'
}

It happens randomly to a small number of requests. I couldn't figure out a pattern yet.

hamilton-s commented 9 months ago

We managed to fix our observability lambda layer issue but we are still experiencing the middy issue. @willfarrell Can we please reopen this issue since others appear to have the same issue in the thread above too?

HumbleBeck commented 9 months ago

My theory on what is happening in our case:

Here we cache a client in memory: https://github.com/middyjs/middy/blob/3dea624830f7f9fb2673657f113a1c72f1c5d28f/packages/ssm/index.js#L164
An AWS signature is valid for 15 minutes, so if you have traffic and your lambda keeps running without cold starts, you reuse your client.
Eventually, the signature expires and your client starts throwing InvalidSignatureException.
What's interesting is that this line https://github.com/middyjs/middy/blob/3dea624830f7f9fb2673657f113a1c72f1c5d28f/packages/ssm/index.js#L125 sets undefined in the cache, and while it keeps throwing an error, all future invocations receive undefined, which completely breaks everything.
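
To illustrate the last point, here is a minimal sketch of the general pattern, not middy's actual code. It assumes a plain in-memory cache; `cache`, `getCachedNaive`, and `getCachedSafe` are hypothetical names. A cache that stores the outcome of a failed fetch keeps handing `undefined` to later invocations, whereas one that stores only successful results refetches on the next call.

```javascript
// Hypothetical in-memory cache shared across invocations of a warm Lambda.
const cache = new Map();

// Problem case: the outcome is stored even when the fetch failed.
async function getCachedNaive(key, fetchFn) {
  if (!cache.has(key)) {
    const value = await fetchFn().catch(() => undefined);
    cache.set(key, value); // a failed fetch leaves `undefined` behind
  }
  return cache.get(key);
}

// Safer variant: only successful results are stored.
async function getCachedSafe(key, fetchFn) {
  if (cache.has(key)) return cache.get(key);
  const value = await fetchFn(); // on failure the error propagates and nothing is cached
  cache.set(key, value);
  return value;
}
```

With the naive variant, one failed fetch affects every later invocation until the entry is cleared. With the safer variant, the invocation that hits the error still fails, but the next one triggers a fresh fetch.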

willfarrell commented 9 months ago

After some digging, I think I have a theory.

Middy is set up with an expiry.
The first request is made, SSM is fetched, the result gets cached, and a timer is set to refresh the cache.
While the lambda has been idle for a while, the timer expires, triggering the refresh. A fetch promise is created but stopped somehow by AWS (the theory part).
The next request comes in >5 min later, and the fetch promise, with a now-expired signature, fails.

How to fix: all middlewares that fetch from AWS services will need to catch InvalidSignatureException and force a retry during the request. I'll have to think about how best to implement this.

Would love to hear whether the above steps make sense to those running into this issue.

Ref: https://repost.aws/knowledge-center/lambda-sdk-signature
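
The fix described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `isExpiredSignature` and `fetchWithSignatureRetry` are hypothetical names, and the check relies on the `__type` field visible in the error dump earlier in this thread (the `name` check is an assumption). `fetchFn` is assumed to build and send a fresh request on every call, so that a retry is signed with the current time.

```javascript
// Hypothetical check for the error shape shown in the logs above.
function isExpiredSignature(error) {
  return (
    error?.__type === 'InvalidSignatureException' ||
    error?.name === 'InvalidSignatureException'
  );
}

// Call fetchFn, retrying up to maxRetries times when the signature has expired.
// Any other error, or an expired signature after the last retry, is rethrown.
async function fetchWithSignatureRetry(fetchFn, maxRetries = 1) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fetchFn();
    } catch (error) {
      if (!isExpiredSignature(error) || attempt >= maxRetries) throw error;
    }
  }
}
```

A caller would wrap the SSM request, for example `fetchWithSignatureRetry(() => client.send(command))`, where `client` and `command` stand in for whatever the middleware already uses.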

willfarrell commented 9 months ago

I pushed a PR, if someone could test irl that would be great.

HumbleBeck commented 9 months ago

I'll try to copy-paste your changes, we are still on v4, and run it for a few days.

willfarrell commented 9 months ago

@HumbleBeck Any feedback on this?

HumbleBeck commented 8 months ago

Hi @willfarrell. While this bug rarely happens to us, I can confirm that the fix works, and it started recovering expired signature calls.

willfarrell commented 8 months ago

Awesome, I'll update the PR to cover all AWS service middleware (just in case) and merge in. Thanks a lot for testing it out.

pranav-chefman commented 4 months ago

Hi, if you are still on version 4, one workaround is to override the retry strategy and pass it to middy.

const middy = require('@middy/core');
const ssm = require('@middy/ssm');
const {ConfiguredRetryStrategy} = require('@smithy/util-retry');

class ClockSkewRetryStrategy extends ConfiguredRetryStrategy {
  constructor(maxAttempts, computeNextBackoffDelay) {
    super(maxAttempts, computeNextBackoffDelay);
  }

  isRetryableError(errorType) {
    return errorType === 'CLIENT_ERROR' || super.isRetryableError(errorType);
  }
}

...

middy()
      .use(
        ssm({
          ...
          awsClientOptions: {
            retryStrategy: new ClockSkewRetryStrategy(3, 500),
          },
          ...
        })
      )
      .before(async (request) => {
        ...
      });

middyjs / middy

Fetching SSM params: InvalidSignatureException: Signature expired #1123