typelevel / feral

Feral cats are homeless, feral functions are serverless
Apache License 2.0

Idea: Simple and scalable mechanism for bundling lambdas into services #274

Open djspiewak opened 1 year ago

djspiewak commented 1 year ago

I wonder how feasible it would be to eventually implement a framework which could take Feral-defined lambdas and bundle them together as a persistent microservice (which presumably would run on the JVM). The observation here is that serverless is usually really great when you're a smaller project with lower traffic and less time for devops BS, but as you scale up it starts to get really expensive and it would be nice to have a clean migration path for taking your existing stuff and moving it onto something like EKS.

I feel like Feral is kind of uniquely positioned to offer something useful in this space, long-term (i.e. probably not right now), because the abstraction level is so much higher and because the runtime semantics are so uniform across Native, JS, and JVM. One of the many challenges with this type of thing is the ideal platform for serverless functions is either JS or Native, while the ideal platform for bundled persistent microservices is JVM. In theory, we can move more fluidly between those spaces.

Anyway, it's just a seed of a thought, but maybe something that could turn into something.

(from https://discord.com/channels/632277896739946517/918373380003082250/1023915447885758484)

Baccata commented 1 year ago

The function to take an http route and make a lambda implementation already exists.

As long as engineers apply some reasonable separation of concern, their logic can be easily deployed to lambda (AWS anyway) or an application alike. Granted, separation of concern is not necessarily something that comes easy to engineers, unfortunately.

So, assuming you're not talking about the software aspect, are you talking about the deployment aspect ?

djspiewak commented 1 year ago

> Granted, separation of concern is not necessarily something that comes easy to engineers, unfortunately.

This is kind of where I was thinking. :-) Can we do anything from an API design standpoint which more strongly encourages this? Ideally I'd like to be able to say "well, if you follow these rules, then you can trivially swap between a Lambda deployment and a containerized deployment", and for as many classes of functions as possible.

armanbilge commented 1 year ago

> Ideally I'd like to be able to say "well, if you follow these rules, then you can trivially swap between a Lambda deployment and a containerized deployment", and for as many classes of functions as possible.

Aren't Lambda events specific to ... Lambdas? How do you send them to something that is not a Lambda?

djspiewak commented 1 year ago

It's going to be pretty function type-specific if we can do it at all. In the limit we would need to synthesize something that looks like a lambda event but which came from within the runtime, and then that is passed to the wrapped function. Effectively it would be a limited-case alternative Lambda runtime. Again, the motivation here is just to provide an easy migration path for folks who start on serverless and then later discover the cost model doesn't scale (or who want to avoid cloud vendor lock-in).
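The "limited-case alternative Lambda runtime" idea could be sketched roughly as follows. This is only an illustration under stated assumptions: `Json`, `Handler`, `synthesizeEvent`, and `runAll` are hypothetical names, the event fields are made up rather than taken from any real AWS schema, and nothing here is Feral's actual API.

```scala
object InProcessRuntime {
  // Hypothetical stand-in for a Lambda event payload: raw JSON text.
  type Json = String

  // Hypothetical shape of a wrapped function: consumes an event, may produce a result.
  type Handler = Json => Option[Json]

  // Synthesizes something that looks like a queue-style event from a plain message.
  // The field names are illustrative, not the real AWS event schema.
  def synthesizeEvent(messageId: String, body: String): Json =
    s"""{"Records":[{"messageId":"$messageId","body":"$body"}]}"""

  // A limited-case "runtime": instead of receiving events from the Lambda service,
  // it builds them in-process and passes each one to the wrapped handler.
  def runAll(handler: Handler, messages: List[(String, String)]): List[Option[Json]] =
    messages.map { case (id, body) => handler(synthesizeEvent(id, body)) }
}
```

The wrapped function never learns where the event came from, which is the property a migration path would rely on. How faithfully events can be synthesized would still depend on the function type.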

armanbilge commented 1 year ago

So there is this thing, which is for example what Google is using for their newer serverless offering. (Honestly other things are appealing about their model too, like the fact that you can send multiple events for concurrent processing to a single serverless instance.)

https://cloudevents.io/

Last I checked AWS had specifically not adopted this spec.

Baccata commented 1 year ago

> Again, the motivation here is just to provide an easy migration path for folks who start on serverless and then later discover the cost model doesn't scale (or who want to avoid cloud vendor lock-in).

I think it's more of a documentation problem than a tooling problem. From the high level point of view of an http application (be that lambda or EKS), the flow of data goes roughly like :

RawRequest => Http4sRequest => BusinessInput => Business Logic => BusinessOutput => Http4sResponse => RawResponse

where RawRequest and RawResponse are basically Json in the context of serverless.

Then, what users gain from using Feral is that they only have to care about the following bit, which can also be interpreted by Http4s backends (Ember, Blaze, etc)

Http4sRequest => BusinessInput => Business Logic => BusinessOutput => Http4sResponse

Imho, the reality is that RawRequest/RawResponse are so specific to runtimes and deployments that trying to generalise over them would unavoidably lead to a huge maintenance burden accompanied by runtime inefficiencies. So really, all you want to provide to keep your sanity, as a maintainer, whilst keeping things flexible are functions that take care of the following :

RawRequest => Http4sRequest 
Http4sResponse => RawResponse

And that's already the case. In other words, Feral can be considered as yet another runtime for Http4s, akin to Ember, Blaze, etc, and I think that's what it should be advertised/documented as, really (in the context of AWS' ApiGateway and whatever the equivalent is in Google land).
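To make the split concrete, here is a minimal sketch using plain Scala stand-ins. All of the types and names (`RawRequest`, `HttpRequest`, `Adapter`, `toyAdapter`, and so on) are hypothetical and are not the real http4s or Feral types; the point is only that the routes stay fixed while the two edge functions change per runtime.

```scala
object Pipeline {
  // Hypothetical stand-ins; none of these are the real http4s or Feral types.
  final case class RawRequest(json: String)   // what the platform hands over
  final case class RawResponse(json: String)  // what the platform expects back
  final case class HttpRequest(path: String)  // plays the role of Http4sRequest
  final case class HttpResponse(status: Int, body: String)

  // The portable part the user writes: Http4sRequest => ... => Http4sResponse.
  type Routes = HttpRequest => HttpResponse

  // The runtime-specific edges: RawRequest => Http4sRequest and Http4sResponse => RawResponse.
  final case class Adapter(decode: RawRequest => HttpRequest, encode: HttpResponse => RawResponse) {
    def run(routes: Routes): RawRequest => RawResponse =
      decode.andThen(routes).andThen(encode)
  }

  val routes: Routes = req =>
    if (req.path == "/hello") HttpResponse(200, "hi") else HttpResponse(404, "not found")

  // A toy "lambda-like" adapter that treats the raw payload as just the path.
  val toyAdapter: Adapter =
    Adapter(raw => HttpRequest(raw.json), res => RawResponse(s"${res.status}:${res.body}"))
}
```

Swapping deployments then means supplying a different `Adapter` (or, in practice, a different backend) while `routes` is reused unchanged.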

For Feral maintainers to provide functions for deployment to various infrastructures would take a HUGE amount of effort. Like, businesses are literally built on this very idea. You could probably POC something very simple, but when it comes to deployment, you don't want to encapsulate the near infinite possibilities of the platforms you can deploy to, as

 Small digression/shameless plug

This discussion echoes what Smithy4s solves, at another level. Smithy4s takes care of this layer :

Http4sRequest => BusinessInput 
BusinessOutput => Http4sResponse 

But not just for Http4sRequest, also from CLI arguments, from AWS-specialised Http4sRequests, etc.

djspiewak commented 1 year ago

> And that's already the case. In other words, Feral can be considered as yet another runtime for Http4s, akin to Ember, Blaze, etc, and I think that's what it should be advertised/documented as, really (in the context of AWS' ApiGateway and whatever the equivalent is in Google land).

Generally agree, and perhaps for http lambdas, this really is a documentation problem more than an API/framework problem. What about other types of lambdas though? Is http4s just a special case?

Baccata commented 1 year ago

I think all lambdas are special cases, but the http4s case is probably the most valuable in the wild.

If you decompose the data flow into OSI-ish layers :

BusinessInput/BusinessOutput <=> Domain
Http4sRequest/Http4sResponse <=> Application 
RawRequest/RawResponse <=> Transport 

then the general concept of Lambda is essentially the Transport layer, and each type of lambda (think S3, DynamoDB, ApiGateway, Kinesis) is a different application-level layer. So, assuming this model, what's the responsibility of Feral ?

I think it's roughly the following :

  1. Provide the Transport layer for AWS Lambda and Google Serverless (and possibly whatever's in Azure I guess ?), in the form of an interface that asks for a Json => F[Unit | Json] function from the developer.
  2. Possibly provide the application-level layers for AWS S3/Kinesis/DynamoDB, in the form of Json encoders/decoders associated with the specialised request/response for each context. (But then, in all frankness, as soon as we get Smithy definitions for them, I'll probably code-generate the corresponding data classes directly from the source of truth)
  3. Possibly provide a "deployLambda" build-tool command for AWS / Google Serverless.

And that's about it.
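Items 1 and 2 above could be sketched like this. It is a rough illustration only: `Json`, `Handler`, `Codec`, and `lift` are hypothetical names, `Option[Json]` stands in for `Unit | Json`, and `Try` stands in for whatever effect type `F` would be in practice.

```scala
import scala.util.Try

object TransportSketch {
  type Json = String // hypothetical stand-in for a real JSON type

  // Item 1: the transport asks the developer for a Json => F[Unit | Json] function.
  // Option[Json] stands in for "Unit | Json": None means no response payload.
  type Handler[F[_]] = Json => F[Option[Json]]

  // Item 2: an application-level layer is a decoder/encoder pair for one event type.
  final case class Codec[In, Out](decode: Json => In, encode: Out => Option[Json])

  // Lifts typed logic into the raw handler shape the transport expects.
  // Try is used here only as a simple example of F.
  def lift[In, Out](codec: Codec[In, Out])(logic: In => Out): Handler[Try] =
    json => Try(codec.encode(logic(codec.decode(json))))
}
```

Under this reading, each lambda type (S3, Kinesis, DynamoDB, ApiGateway) would contribute its own `Codec`, while the transport stays the same.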

In the context of a deployed application that runs in ECS, EKS, k8s, wherever, there are as many transport layers as there are types of lambdas. Http is a special case in that it's somewhat ubiquitous in the industry and Typelevel is already maintaining several "transport"-layer (once again, we're talking OSI-ish) implementations to run it, but if you take the example of a Kinesis lambda, for instance, what would Feral do for deployed applications ? Assume a tight-coupling with fs2-aws ? What about DynamoDB ? S3 ?

These questions inform my opinion that Feral should focus on the Lambda/Serverless "transport" layers and a few commonly-used "application layers", and inform users of the pathways for moving away from the Lambda/Serverless layers.