juanjoDiaz / serverless-plugin-warmup

Keep your lambdas warm during winter. ♨
MIT License

Feature Request: Allow warming up >1 function instances #24

Closed vladholubiev closed 5 years ago

vladholubiev commented 7 years ago

Thanks for the plugin! Currently the only limitation that stops me from using it instead of a custom solution is that it warms up only 1 function instance, but I want to have ~10 warm instances with VPC connections so it can handle a sudden burst.

What do you think? Is this a unique use case that doesn't belong in this plugin, or could you consider this feature?

goncaloneves commented 7 years ago

Hi @vladholubiev.

Yes, it's something I could work on and add to the plugin, if there is actually a good way to do it.

Have you looked into how this would work?

goncaloneves commented 7 years ago

What I have in mind is blocking the lambda callback for, let's say, 5–10 seconds and invoking n times.

vladholubiev commented 7 years ago

@goncaloneves

If I understand correctly, I'd do something like wrapping lines 236–241 with

_.times(this.warmup.warmersCount, () => invokes.push(lambda.invoke(params) /* ... */));
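
For illustration, a fuller sketch of what I mean (assuming the AWS SDK v2 Lambda client and lodash; warmConcurrently and its parameters are just names for the sketch, not the plugin's actual code):

const _ = require('lodash');
const AWS = require('aws-sdk');

const lambda = new AWS.Lambda();

// Invoke the target function `count` times concurrently instead of once,
// hoping each invocation lands on (and warms) a separate container.
function warmConcurrently(params, count) {
  const invokes = _.times(count, () => lambda.invoke(params).promise());
  return Promise.all(invokes);
}
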
goncaloneves commented 7 years ago

Invoking concurrently does not guarantee new container initialisation. Most probably it will reuse just one container if the lambda exits early.

Unless that container is not immediately available.

So my thinking is: can we hold the lambda warmup callback long enough that the concurrent invokes spin up new containers? In Node.js we would need to hold the event loop.

der-flo commented 7 years ago

I'm also interested in this topic. When Amazon scales up the lambda, does it immediately route requests to the new containers?

The cold-start problem could easily be fixed if we could signal readiness for requests from inside the booted lambda, with Amazon holding off request routing until that signal. I haven't read of such a feature in the AWS docs, so I assume requests are routed to the new container quite a bit earlier, though presumably not until the container (if not necessarily the app / Node itself) is up?

goncaloneves commented 7 years ago

@der-flo I don't think so.

Let's say you have no lambda container initialised: if you invoke 5 times concurrently, you will probably get 5 new containers initialised. But if you invoke again 10 minutes later, you will probably hit only one of those containers. You will only hit a second one with high-volume concurrency, which comes with a higher price tag; all the other containers would be dropped unless you can keep that high volume coming in.

So this raises my question: if we hold the container's callback response to the warmup once it's initialised, could that make AWS spawn new containers? I have no idea. I have started writing some code to test this logic, but I haven't had time to go through the testing yet.

If this is feasible, we have a way of making this work; if not, we need more ideas.

juanjoDiaz commented 7 years ago

A bit late to the party, but I'm curious to know if you got anywhere with this?

AFAIK AWS gives no visibility into lambda concurrency and no guarantees about how containers scale. So it seems almost impossible to me to add this functionality in a solid and predictable way.

I kind of refuse to believe that if you make 5 requests to a non-instantiated lambda you'll get 5 lambdas. Amazon can definitely do better than that.

Regarding your proposal of holding the response in the event loop: I don't think that will have any effect since, regardless of when the callback is executed, the response is most probably sent and the connection closed as soon as possible.

goncaloneves commented 7 years ago

I still haven't done this test yet.

I found these guys offering warmup of multiple lambda containers as a service: https://lambdacult.com/spark/faq

They fetch CloudWatch logs to track usage and use lambda versions with a corresponding alias to switch during warmups.

ryan-roemer commented 6 years ago

I have this exact same use case, and I was thinking that we could delay the completion of the lambda if the warmup task is detected. Somewhat the reverse of the guidance in the README 😛

module.exports.lambdaToWarm = function(event, context, callback) {
  /** Delayed response for WarmUP plugin to force concurrent warmups... */
  if (event.source === 'serverless-plugin-warmup') {
    setTimeout(function () {
      console.log('WarmUP - Lambda is warm!');
      return callback(null, 'Lambda is warm!');
    }, 1 * 1000); // wait 1 sec to force concurrent executions
    return; // don't fall through to the real logic on warmup invocations
  }

  // ... add lambda logic after
};

(I'm picking 1 second because I'm using a VPC and we're seeing cold starts in the 10–14 second latency range, so that should definitely "hit it". Could probably pick more or less.)

The problem with this hack, however, is that all the real traffic might spin up a lot more concurrent Lambdas during that explicit delay period. Then on the periodic refresh, we probably hit the same problem again...

But, I think it would potentially guarantee multiple concurrent executions (with some side effects / costs).

Thoughts? (Happy to PR this up if desired...)

juanjoDiaz commented 6 years ago

Hi @ryan-roemer,

That's a good theoretical idea, except for the fact that you might be overprovisioned by ~10 lambdas most of the time, and the cost might increase a bit.

However, how do you know whether AWS decides to spin up a new lambda or simply waits for your lambda to finish? As I mentioned above, Lambda concurrency is a mystery... Even the tool that was posted above (https://lambdacult.com/spark/faq) to keep multiple lambdas warm has shut down...

The only way of achieving something like this (IIRC the way that Spark used to do it) would be to hammer the API gateway and listen to the logs until you see the number of different Lambda IDs that you want. Not a very nice solution: it requires extra permissions to read the logs, and looking at what happened to Spark, I assume it doesn't quite work. 🙂

ryan-roemer commented 6 years ago

We can probably run some experiments in the CloudWatch / Lambda dashboards to observe a constant, concurrency-numbered warmer function and see if it works out in practice...

I've got some decent introspection hooked up to some projects, so I could run some experiments over a day or so inspecting the logs if we wanted. (Side note: I'm weeks away from parental leave and will likely drop off the internet soon after, so if I somehow don't pick up this thread until later, that's what happened.)

goncaloneves commented 6 years ago

@ryan-roemer any insights from your experiments with holding execution?

ryan-roemer commented 6 years ago

I haven't had time to fork this project to spin up multiple lambdas, but using a tweaked version of this plugin (I had to manually copy and change the code script itself because we use package.artifact and a separate build script), I'm seeing lambdas stay warm on a 10-minute interval (a little more permissive than the default 5).

I'm about to head out on baby leave, so I won't be able to PR a POC, but if someone else puts up a PR, I could pull it down and try it out in my current setup and report back!

goncaloneves commented 6 years ago

Sure @ryan-roemer, I hope to revisit this soon. Thanks for putting in the time. Enjoy your leave.

aldarund commented 6 years ago

There are some solutions for warming up multiple lambdas. I haven't tested this one yet, but it might be good to see how they do it: https://medium.com/thundra/dealing-with-cold-starts-in-aws-lambda-a5e3aa8f532 https://github.com/thundra-io/thundra-lambda-nodejs-warmup

marcfielding1 commented 6 years ago

I can work on this; I already have PoC code that runs off CloudWatch metrics, although I also have an alternative idea which we can look at in a PR. I'll have a look today or tomorrow. :-) Could you assign this to me?

marcfielding1 commented 6 years ago

Container re-use isn't hard to monitor. For example, in our code base I could do the following in my handler:

Excuse the pseudo code, I'm on my mobile.

// Module scope persists across invocations when a container is re-used,
// so this is undefined only on a fresh (cold) container.
let isNewContainer;

// inside my handler
someLoggingFunction(isNewContainer);
isNewContainer = false; // every later invocation sees a re-used container

// loggingFunction
function someLoggingFunction(isNewContainer) {
  if (isNewContainer === undefined) {
    // log it as a new container with a timestamp
  } else {
    // log it as a re-used container
  }
}

Now the magic happens when your functions run: you check how many new containers your app used in the previous minute and multiply your warmup requests by that.

This is a very simple version of what I have; you can get more sophisticated by verifying it really was a new container, by looking at the execution time per function and then checking whether any executions overlapped.

Granted, this perhaps involves Dynamo or "another" method of storage, but with a simple switch you could have single warmup or analytical warmup where it does this stuff. Trust me when I say that if you get this right it'll be hugely popular, especially in financial services.
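
As a purely hypothetical sketch of that scaling step (getNewContainerCount is a made-up placeholder, stubbed out here; a real version would query whatever store the container logging writes to):

// Hypothetical sketch: scale the next warmup's concurrency by how many
// new containers were observed in the previous minute.
async function nextWarmupConcurrency(baseConcurrency) {
  const newContainers = await getNewContainerCount(Date.now() - 60 * 1000);
  return Math.max(baseConcurrency, baseConcurrency * newContainers);
}

// Placeholder: a real implementation would query the storage (e.g. DynamoDB)
// that the container logging above writes to.
async function getNewContainerCount(sinceMs) {
  return 0; // stub
}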

marcfielding1 commented 5 years ago

@goncaloneves @rschweizer I'm about to fork/duplicate this repo as I REALLY need the analytical multi-warmup. Obviously I'd rather contribute here, but the last activity was a couple of months ago, so I'm concerned about whether it's still being maintained?

If you'd like, I can help maintain this project :-) My LinkedIn is here

juanjoDiaz commented 5 years ago

Hi @marcfielding1, is there anything stopping you from creating a PR for your change?

marcfielding1 commented 5 years ago

@juanjoDiaz Well, yes and no. I'd like to discuss the solution first, since it's a pretty big feature, and until now there hadn't been any responses to any issues for nearly 2 months :-) so I wasn't sure if it would be worth it.

juanjoDiaz commented 5 years ago

I think that the idea of having a state variable storing whether the container is new or not is reasonable. But I definitely wouldn't want to have Dynamo or any other external state.

I get a bit confused by "Now the magic happens when your functions run, you check how many new containers your app used in the previous minute and multiply your requests by that." Can you elaborate on that?

Knowing whether a container is new or not doesn't really help to ensure the number of parallel containers that you run: you might have only 2 new containers but 2345653456 containers running already. Maybe we could document that, in order to keep more than one lambda warm, the lambda needs to respond with { containerID: <SOME_UNIQUE_ID> } when called from the warmer. The user can use uuid or whatnot to generate such an ID. Then modify the warmer function to send N parallel requests and retry a few times, increasing that number, until it gets back at least N different container IDs.
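
For illustration only, the lambda side of that handshake could look something like this (assuming the uuid package; none of this exists in the plugin):

const { v4: uuidv4 } = require('uuid');

// Generated once per container at init time, so the same ID is returned
// for as long as this container is re-used.
const containerID = uuidv4();

module.exports.handler = async (event) => {
  if (event.source === 'serverless-plugin-warmup') {
    // Let the warmer count how many distinct containers it reached.
    return { containerID };
  }
  // ... real handler logic
};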

A problem with this is that, if the lambda responds too fast, it might be able to handle all the parallel requests with fewer lambdas than you would expect. But at the same time, blocking the lambda a bit to force a new lambda to warm up makes your API unresponsive every time warmup kicks in.

Wdyt?

marcfielding1 commented 5 years ago

@juanjoDiaz Sorry for the delay here, I've been caught up at work. I'm just going to test out some stuff today and will respond ASAP!

ejwilburn commented 5 years ago

Just added a PR for this: https://github.com/FidelLimited/serverless-plugin-warmup/pull/81

juanjoDiaz commented 5 years ago

Closed by #81

marcfielding1 commented 5 years ago

@goncaloneves Heya, I'm literally just deploying this today. Sorry for the delay in testing; we got blocked on a card payments/checkout issue! Will post back later/tomorrow morning. I might have an additional PR that I can put up for evaluation; I'm just checking out the CloudWatch metrics docs. Basically, it LOOKS like you can say "give me average executions for a specific time period", i.e. over the last week, how busy is my API at X time of day.

marcfielding1 commented 5 years ago

@goncaloneves So the tests for this are going really well; it seems to work very nicely, in as much as our dashboard in Elastic shows the number of cold starts has dropped. We have around 100 testers hitting the APIs at the moment via our iOS and Android apps, so thumbs up for this feature!

The reason I say "dropped" is that we tend to get bursts: when and where special offers appear in our app is down to the person making the offer, so if it's something really popular we get a few cold starts as everyone logs in at once.

Currently, I think we would end up grabbing the warming function (because we're lazy like that), moving it to our src directory, and then having it take an SNS source so we can tell it how many containers to warm up for that edge case above.

So it could be an idea to be able to specify a path to the warming function rather than just having the stock one.

Also, intelligent warm-up would be "nice". It's not really a biggie, as we can tune our cron schedule, but if I can figure out CloudWatch metrics/log analysis it would be very cool to say "Hey, it's 4am, we only want 1 container warm".

Anywho I'll add the "eject" issue and then continue with the metrics thingy on #34?

goncaloneves commented 5 years ago

Great news Marc 🙌

Thanks for pushing tests through.

Yes please continue in #34.

We can release master after I go through tests, then create PRs to work on intelligent warmup and external warmup.

Curious, why SNS and not Lambda invoke?

marcfielding1 commented 5 years ago

@goncaloneves Hrmm, I suppose we could. The way it works, a retailer can just suddenly pick some items and say "Hey, these xbox games/beers/whatever are half price", so when our system applies that offer the idea was to fire off that SNS event. For now it's just going to prod the warmup function; in the future there would likely be multiple subscribers, for instance a Lambda that would post to social media, etc.

SNS was just an example though; what I was aiming at is that the built-in warming function likely won't suffice for more complex architectures.

juanjoDiaz commented 5 years ago

That actually sounds good, and it's easily achievable by simply adding custom triggers to the warmup function.

marcfielding1 commented 5 years ago

Yeah, I mean, what you have is great. All I'm really saying is that in our experiments this cropped up as a case where we might need to get a bit hacky, which is where you tend to think "feature" :-)

goncaloneves commented 5 years ago

Absolutely. We can certainly add an option for other event sources.

When filling feature gaps, the simple ones are usually the most missed.

I will be back online Jan 6th to push on my side.

Cheers Marc and Juanjo.

Happy new year!

ryan-roemer commented 5 years ago

Apologies if this is the wrong place and I'm not completely understanding things -- for https://github.com/FidelLimited/serverless-plugin-warmup/pull/81/files, if we follow the guide and have, say:

... and then I have actually really hot traffic, isn't this going to bottleneck the Lambda during that 25 ms, while real requests / invocations then need to cold start additional Lambdas?

(and thanks for the continued attention and work from everyone on this issue!!!)

ejwilburn commented 5 years ago

Yes, basically. However, if you're using a VPC with the lambda, you can set the schedule to every 15 minutes, so you're only causing that issue for a very short period once every 15 minutes.

You could also test shorter warmup delays. I chose 25ms as a rather arbitrary number that I thought would be long enough to work but short enough not to lock things up for too long, and it worked in testing. I didn't try shorter delays, so it's possible 5 or 10ms would work just as well.
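
For example, a sketch of the guide's delayed handler with the hold time made configurable (the WARMUP_DELAY_MS env var is just an illustration, not a plugin option):

// Illustrative: read the warmup hold time from an env var so shorter
// delays (e.g. 5 or 10ms) can be tested without changing code.
const WARMUP_DELAY_MS = parseInt(process.env.WARMUP_DELAY_MS || '25', 10);

module.exports.handler = (event, context, callback) => {
  if (event.source === 'serverless-plugin-warmup') {
    // Hold the callback briefly so concurrent warmup invokes
    // land on separate containers instead of re-using this one.
    setTimeout(() => callback(null, 'Lambda is warm!'), WARMUP_DELAY_MS);
    return;
  }
  // ... real handler logic
};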

PnutBUTTrWolf commented 5 years ago

It sounds like most individuals who are attempting to warm up several concurrent lambdas are fighting the serverless technology instead of leveraging it where it makes sense. There is NO guarantee that you will warm x number of lambdas without a delay, in which case you guarantee cold starts during the delayed invocations. If your application can't handle an occasional cold start, then maybe look at using a container. Lambda isn't a silver bullet.

juanjoDiaz commented 5 years ago

Hi @micahbhodge,

Yes and no.

It sounds like most individuals that are attempting to warm up several concurrent lambdas are fighting with serverless technology instead of leveraging it where it makes sense.

People are working around the limitations of a proprietary technology using what we know. This approach has been publicly advertised by the AWS team behind Lambda. Calling it "fighting" is a bit of a stretch.

There is NO guarantee that you will warm x number of lambdas without a delay in which case you guarantee cold starts during the delay invocation.

There is not, indeed, unless you build a more "intelligent" solution that uses the CloudWatch logs. However, our experiments show that a short wait like 25ms is enough to warm up multiple containers at once, so it doesn't increase delays much. In any case, this plugin simply makes a best attempt based on the information we have. AWS can of course change things at any time.

If your application can't handle an occasional cold start then maybe look to use a container. Lambda isn't a silver bullet.

There are many reasons NOT to use serverless, such as needing persistent connections, long-running tasks, high-performance computing, etc. But there is absolutely nothing wrong with avoiding cold starts by keeping lambdas warm regardless of actual usage, which is the point of this plugin.

PnutBUTTrWolf commented 5 years ago

Hi Juan,

Thanks for your reply. I don't mean to say that one should never use warmups on Lambda; my team uses Lambda and warmups on each of our functions. My point is that if you are attempting to warm up a specific number of concurrent lambdas, that approach follows a law of diminishing returns. You are guaranteed to occasionally hit a cold start during the warmup windows, which is why I said that if you can't handle an occasional cold start, Lambda might not be the right choice. Anyone pretending they will eliminate cold starts completely is kidding themselves.