teamhephy / workflow

Hephy Workflow - An open source fork of Deis Workflow - The open source PaaS for Kubernetes.

Serverless/FaaS integration - what will it look like? #64

Open kingdonb opened 6 years ago

kingdonb commented 6 years ago

What will it look like, and how do we prioritize it?

(Does it need to come before some of the other important features we're proposing? Does it really need a change to Workflow to support it, or can this be a documentation-only issue where we collect and show all of the novel ways that serverless runtimes can be integrated with our Workflow apps?)

One thing we discussed in #41 is that you might want to use a FaaS to support CI / test runners, with the result of the CI run deciding whether to promote an environment, pipeline-style, from one stage to the next. We are pretty sure this can be accomplished without anything new, except that there is no formal concept of environments (#62) and it's not currently possible to promote a build (#63) without rebuilding, unless you use Docker image deployment.

But in terms of integrating a serverless function, if you have those two things, you can get the rest of this done with a Deploy Hook. It would be good to have a document that explains how.
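To make that concrete, here is a minimal sketch of the shape such a document might describe: a function, written against the Kubeless Python runtime's `handler(event, context)` convention, that receives a POST from CI and promotes the already-built image by creating a build through the Workflow controller API (the same endpoint `deis pull` uses for Docker image deployments, as I understand it, so worth verifying). The payload shape, app name, and environment variables are all illustrative assumptions.

```python
# Sketch only: a Kubeless-style handler that a CI system POSTs to when a
# test run finishes. If the tests passed, it "promotes" the already-built
# Docker image by creating a new build on the target app through the
# Workflow controller API. Payload shape, app name, and env vars are
# illustrative assumptions.
import os

import requests

CONTROLLER = os.environ["WORKFLOW_CONTROLLER"]  # e.g. "http://deis.example.com"
TOKEN = os.environ["WORKFLOW_TOKEN"]            # a Workflow API token

def handler(event, context):
    payload = event["data"]  # Kubeless delivers the HTTP body here
    if payload.get("status") != "passed":
        return "CI did not pass; not promoting"
    image = payload["image"]                         # image CI already built and tested
    app = payload.get("target", "myapp-production")  # hypothetical target app
    resp = requests.post(
        f"{CONTROLLER}/v2/apps/{app}/builds/",
        json={"image": image},
        headers={"Authorization": f"token {TOKEN}"},
    )
    resp.raise_for_status()
    return f"promoted {image} to {app}"
```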

I'm sure there are even more interesting things we could do with serverless runtimes, and changes to Workflow to permit integrating them more deeply are not off the table at this point, I just don't know what it will look like. (My imagination is very poor.)

Just establishing this issue as a place to talk about it. What kind of pub/sub things are we going to do with our serverless runtimes and Workflow?

krancour commented 6 years ago

I'd suggest not hitching this issue to things like CI, environments, promotions, etc. This is its own discrete problem.

Start by asking how a "function" differs from a normal application.

I found this bit from Wikipedia useful:

Serverless computing architectures in which the customer has no direct need to manage resources can also be achieved using Platform as a Service (PaaS) services. These services are, however, typically very different in their implementation architecture, which has some implications for scaling. In most PaaS systems, the system continually runs at least one server process and, even with auto scaling, a number of longer running processes are simply added or removed on the same machine. This means that scalability is a more visible problem to the developer.

In a FaaS system, the functions are expected to start within milliseconds in order to allow handling of individual requests. In PaaS systems, by contrast, there is typically an application thread which keeps running for a long period of time and handles multiple requests. This difference is primarily visible in the pricing, where FaaS services charge per execution time of the function whilst PaaS services charge per running time of the thread in which the server application is running.

Note that auto-scaling (including scaling to zero when under no load) is a huge part of this. The Kubernetes scheduler hasn't solved for that yet. Scheduling is too low-level a function to tackle in Workflow, imo, but... a number of projects that I think you're aware of are already attempting to build FaaS on top of Kubernetes. Perhaps integration with one of those is a workable solution? Although a dependency of that magnitude shouldn't be adopted without serious consideration.

But, again, focus on how a function differs from an application. If you were (hypothetically) to let something like Kubeless or OpenFaaS do the heavy lifting, the difference between an application and a function, from Workflow's perspective, may simply be the deployment mechanism: an application is deployed using a k8s Deployment, but perhaps a function is deployed via a CRD provided by one of those other APIs.
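A sketch of what that distinction might look like from Workflow's side, using the Kubernetes Python client (app/function names, namespace, and function body are made up; the custom resource shape follows Kubeless's v1beta1 Function CRD):

```python
# Sketch: the same "deploy" action expressed as two different Kubernetes
# objects -- a Deployment for an application, a Kubeless Function CRD for
# a function. All names are hypothetical.
from kubernetes import client, config

config.load_kube_config()

# An "application" in Workflow terms: a plain k8s Deployment.
apps = client.AppsV1Api()
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="myapp-web"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "myapp-web"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "myapp-web"}),
            spec=client.V1PodSpec(
                containers=[client.V1Container(name="web", image="myorg/myapp:v42")]
            ),
        ),
    ),
)
apps.create_namespaced_deployment(namespace="myapp", body=deployment)

# A "function": a custom resource handed off to Kubeless.
crds = client.CustomObjectsApi()
function = {
    "apiVersion": "kubeless.io/v1beta1",
    "kind": "Function",
    "metadata": {"name": "myfunc", "namespace": "myapp"},
    "spec": {
        "runtime": "python3.6",
        "handler": "handler.main",
        "function": "def main(event, context):\n    return 'hello'\n",
    },
}
crds.create_namespaced_custom_object(
    group="kubeless.io", version="v1beta1",
    namespace="myapp", plural="functions", body=function,
)
```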

kingdonb commented 6 years ago

That's interesting. I think I saw a demo of Riff ("Riff is for functions") that showed functions scaling down to zero. This past week or two I've been looking at serverless.com and Kubeless, and I noticed, but didn't really take note of, the fact that Kubeless doesn't do this. You're always running at least one function container once you've deployed a function, even if there's no traffic at the gateway.

You can configure autoscaling, but at the baseline, that Ruby function that goes out and makes an HTTP client request is going to cost you a memory footprint of 20MB until you delete it or manually scale it to 0 (rendering it non-functional, ha ha).
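For what it's worth, "manually scale it to 0" is just a scale operation on the Deployment that Kubeless creates for the function, e.g. `kubectl scale deployment myfunc --replicas=0 -n myapp`, or via the Kubernetes Python client (function name and namespace are made up):

```python
# Sketch: scaling a Kubeless function's backing Deployment to zero
# replicas. Kubeless backs each Function with a Deployment of the same
# name; the namespace is an assumption. Until something scales it back
# up, the function is non-functional, as noted above.
from kubernetes import client, config

config.load_kube_config()
client.AppsV1Api().patch_namespaced_deployment_scale(
    name="myfunc",
    namespace="myapp",
    body={"spec": {"replicas": 0}},
)
```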

krancour commented 6 years ago

Ya. Scaling to zero is important because of this bit:

This difference is primarily visible in the pricing, where FaaS services charge per execution time of the function whilst PaaS services charge per running time of the thread in which the server application is running.

Now, granted (let's take Workflow out of this for a moment), if you're running a k8s cluster with n nodes, the cost savings are quite indirect. You're paying for n nodes' worth of resources all the time, whether functions are running or not. The "pricing" advantage of functions on such a cluster is a "cost" advantage in resources: functions that aren't running aren't consuming resources, and those resources are available to the functions that are running. This creates the possibility of having fewer nodes in your cluster to begin with.
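A back-of-the-envelope sketch of that resource math, using the 20MB idle footprint from above (every other number is made up):

```python
# Back-of-the-envelope: memory reserved by always-on functions vs.
# scale-to-zero. All numbers are illustrative assumptions except the
# 20MB idle footprint mentioned earlier in the thread.
idle_mb = 20            # idle footprint per always-on function
functions = 200         # functions deployed on the cluster
active_fraction = 0.05  # share actually serving traffic at any instant

always_on = functions * idle_mb                             # 4000 MB reserved
scale_to_zero = int(functions * active_fraction) * idle_mb  # 200 MB reserved

print(f"always-on functions reserve {always_on} MB")
print(f"scale-to-zero reserves      {scale_to_zero} MB")
```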

So scale to zero is super, super important. It's the thing that makes this all compelling.