juanjoDiaz / serverless-plugin-warmup

Keep your lambdas warm during winter. ♨
MIT License
1.11k stars 113 forks

Help Wanted: Facing cold start issue #98

Closed ankkho closed 5 years ago

ankkho commented 5 years ago

I have set up this plugin and I am still facing cold starts. My event source is API Gateway.

Note: I am using a VPC, and by default Lambda functions within a VPC are kept warm for 15 minutes.

Following is my warmup config:

  warmup:
    default: true
    folderName: '_warmup'
    cleanFolder: true
    memorySize: 128
    name: 'keep-function-warm'
    role: warmup-role
    timeout: 20
    prewarm: true
    concurrency: 5

How can I resolve this?

marcfielding1 commented 5 years ago

Hey @ankkho do you have any output in your cloudwatch logs for the warming function?

Actually, are you installing this via npm or pointing at the master branch? Looking at it, there are PRs pending to fix "master" and npm hasn't had a release, but the documentation has auto-updated, so I think it's documenting features that aren't in yet. @juanjoDiaz, is that correct?

juanjoDiaz commented 5 years ago

Yes. The concurrency feature hasn't been released yet (I don't have rights to release).

However, the plugin should be working for a single lambda instance.

You could check the CloudWatch logs for errors. It could be that the lambda doesn't have the needed roles. Also, take a look at https://github.com/FidelLimited/serverless-plugin-warmup#gotchas and https://github.com/FidelLimited/serverless-plugin-warmup/issues/13 for more details. Running in a VPC is a bit different: you need a NAT, or the function deployed outside the VPC (not implemented yet).

ankkho commented 5 years ago

@juanjoDiaz The issue is not related to the NAT instance or roles. This plugin is keeping the lambda warm, but I am still facing cold starts.

For example: I have a user service (in a VPC) with this plugin installed, and keep-lambda-function is working as expected with the above plugin config.

Now, when I call the getUserDetails function through API Gateway after approximately 20 minutes, I get a cold start.

@marcfielding1 I have used npm to install this plugin.

@marcfielding1 @juanjoDiaz Wish you guys a Happy New Year!! :)

marcfielding1 commented 5 years ago

Happy new year and owww my head!

So quick question: are you logging both the warmer execution (keep-lambda-function) and, inside getUserDetails, the moment it gets warmed up?

I'm just double checking because I can't think of a way you'd still be getting cold starts if it's actually being warmed up; one of them must be bugging out or not getting executed. Could you stick some debug logging in the function that's supposed to be getting warmed up, a bit like this (if you haven't already):

  if (event.source === 'serverless-plugin-warmup') {
    console.log('WarmUP - Lambda is warm!')
    return callback(null, 'Lambda is warm!')
  }
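
To check that the short-circuit above behaves as intended before deploying, the handler can be invoked locally with fake events. This is only a minimal sketch: the handler name `getUserDetails` is taken from the thread, but its body, the `invokeLocally` helper, and the fake API Gateway event are illustrative assumptions, and the helper only works for handlers that call back synchronously.

```javascript
// Hypothetical handler body; only the event.source check comes from the thread.
const getUserDetails = (event, context, callback) => {
  // Warmup invocations carry this source value, so return early.
  if (event.source === 'serverless-plugin-warmup') {
    console.log('WarmUP - Lambda is warm!')
    return callback(null, 'Lambda is warm!')
  }
  // Placeholder for the real business logic.
  return callback(null, { statusCode: 200, body: JSON.stringify({ ok: true }) })
}

// Assumed helper: invokes a handler with a fake event and returns the
// callback result. Only valid when the handler calls back synchronously.
const invokeLocally = (handler, event) => {
  let result
  handler(event, {}, (err, res) => {
    if (err) throw err
    result = res
  })
  return result
}

// A warmup event should hit the early return; a regular event should not.
console.log(invokeLocally(getUserDetails, { source: 'serverless-plugin-warmup' }))
console.log(invokeLocally(getUserDetails, { httpMethod: 'GET', path: '/user' }))
```

If the first call does not log the warm message, the warmup event is not reaching the check, which would point at the handler rather than the plugin.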

ankkho commented 5 years ago

> So quick question are you logging out both the function execution(keep-lambda-function) and then logging in getUserDetails when it gets warmed up?

I don't call getUserDetails immediately after deployment. But I have previously faced cold starts when the function is called infrequently.

I've used Dashbird for monitoring and alerts. Below is a screenshot of the resource breakdown from one of my lambda functions where I recently faced a cold start.

[image: Dashbird resource breakdown screenshot]

This function took 17 sec!

Also, the above code snippet was already added.

juanjoDiaz commented 5 years ago

If I'm reading your screenshot correctly, it shows 4 calls to the lambda over 1 second and only the last one is getting the cold start.

That would make sense since the concurrency warmup is not released yet. So the version of the plugin that you are using only warms 1 instance of the lambda.

Could that be it?
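
The reasoning here can be illustrated with a toy model. It is not the plugin's code: it only assumes that each Lambda instance serves one request at a time, so a request arriving while every warm instance is busy has to start a new container. The function name `countColdStarts` and the request shape (`start`, `duration` in milliseconds) are made up for the illustration.

```javascript
// Toy model: count how many requests find no free instance and therefore
// trigger a cold start, given a number of pre-warmed instances.
const countColdStarts = (warmInstances, requests) => {
  // busyUntil[i] is the time at which instance i becomes free again.
  const busyUntil = new Array(warmInstances).fill(0)
  let coldStarts = 0
  const ordered = [...requests].sort((a, b) => a.start - b.start)
  for (const { start, duration } of ordered) {
    const free = busyUntil.findIndex((t) => t <= start)
    if (free === -1) {
      // No idle instance: a new container is started and stays in the pool.
      coldStarts += 1
      busyUntil.push(start + duration)
    } else {
      busyUntil[free] = start + duration
    }
  }
  return coldStarts
}

// Two overlapping requests against a single warm instance.
const overlapping = [{ start: 0, duration: 100 }, { start: 50, duration: 100 }]
console.log(countColdStarts(1, overlapping)) // 1
console.log(countColdStarts(2, overlapping)) // 0
```

The model ignores the extra latency of the cold start itself, but it shows why warming a single instance only helps when requests to the same function do not overlap.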

ankkho commented 5 years ago

The first one is an authorizer and the last one is getUserDetails; they both belong to the same service (user). The second and third belong to different services.

> So the version of the plugin that you are using only warms 1 instance of the lambda.

A single instance of each lambda function within a service, right? In that case, my last function should not be facing a cold start.

The cold start occurs in the last function, getUserDetails, which should not happen since the user service is using this plugin to keep all of its lambda functions warm.

marcfielding1 commented 5 years ago

Are you making an HTTPS call in that function?

I don't mind jumping into your aws account if you'd like to create a read only user, my LinkedIn is here

It's really rather impossible to say without checking the execution logs, so if you can't grant read access, I reckon the execution logs for the warmer and getUserDetails would do it. I'd start by logging the event headers (as above) just to prove this function is actually getting warmed up :-)
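
One way to make those logs conclusive is to record, for every invocation, whether the container was freshly started. The sketch below relies on the fact that module-scope code in a Node.js Lambda runs once per container; the names `describeInvocation` and `withColdStartLogging` are hypothetical and not part of the plugin.

```javascript
// Module scope runs once per container, so this flag is true only for the
// first invocation handled by a freshly started instance.
let isColdStart = true

// Hypothetical helper: reports whether this invocation hit a cold container
// and whether it was triggered by the warmup plugin, then clears the flag.
const describeInvocation = (event) => {
  const info = {
    coldStart: isColdStart,
    warmup: event.source === 'serverless-plugin-warmup'
  }
  isColdStart = false
  return info
}

// Hypothetical wrapper: logs one structured line per invocation before
// delegating to the real handler.
const withColdStartLogging = (name, handler) => (event, context, callback) => {
  console.log(JSON.stringify({ fn: name, ...describeInvocation(event) }))
  return handler(event, context, callback)
}
```

Wrapping getUserDetails this way and searching CloudWatch for lines with `"coldStart":true` and `"warmup":false` would show exactly which user-facing requests landed on a cold container.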

Good find on Dashbird by the way, I'm gonna give that a go. Currently I've ended up streaming everything to Elasticsearch and creating Kibana dashboards; Dashbird gives a nice and easy overview.

ankkho commented 5 years ago

@marcfielding1 I am not making any HTTPS call from that function. HTTPS calls and cold starts are not related: if an HTTPS request takes longer than the provided timeout, the lambda function returns a timeout error.

Also, this function is being warmed up.

> Good find on dashbird by the way, I'm gonna give that a go, currently I've ended up streaming everything to Elasticsearch and creating Kibana dashboards, dashbird gives a nice and easy overview..

Don't do all that hard work, use Dashbird. It's pretty good! :)

marcfielding1 commented 5 years ago

HTTPS calls can affect cold starts, and 17 seconds is a very odd number for a cold start. But if you're not making one then that can't be it. The reason I ask is that I remember reading somewhere that, because of the way SSL handshakes work, making an HTTPS call from a lambda can cause a cold start because it's CPU bound. I was trying to find the article just now.

juanjoDiaz commented 5 years ago

Hi @ankkho,

did you find anything else about this?

ankkho commented 5 years ago

Hi @juanjoDiaz,

I have recently stopped working on my serverless project (will resume later). For now I'll close this issue.