spring-cloud / spring-cloud-function

Support Servlet API and Spring MVC applications/microservices #107

Closed: ceefour closed this issue 7 years ago

ceefour commented 7 years ago

I'd like to propose support for the Servlet API and Spring MVC. Specifically, this requires a series of incremental enhancements, each building on the previous one:

  1. A standard way and best practices to deploy arbitrary event processing (SCF already does this)

    • For example, since startup time is more important in cloud functions than in regular Spring Boot, there could be a guide that describes best practices for reducing startup time
      • Tooling (e.g. spring-cloud-function-maven-plugin) to trade runtime work for build-time work. For example, classpath scanning can be moved to build time, so at runtime there's much less work to do. This may require support in other Spring components.
      • A nice side effect is that these optimizations can be used outside of SCF, in regular Spring projects
    • Guide on how to optimize the "warm" container state, e.g. reusing Hibernate connections/connection pool
    • Guide for restrictions and gotchas that need to be followed (e.g. 50 MB limit, threading, shared stuff, filesystem, etc.)
    • Guide to optimize/preload classpath scanning
  2. SCF to serve/wrap Servlet applications. For example for AWS, there is aws-serverless-java-container-spring made by @SAPessi of AWSLabs

  3. Extend support for serving Spring MVC controllers.

    • This is already possible as a proof of concept (thanks @joeyvmason!): Developing Serverless Applications with Spring MVC and AWS Lambda and serverless-spring-boilerplate. However, it is very far from optimized, e.g. a 20+ second startup time (which is normal for some Spring Boot applications but bad for Lambda functions; the default timeout is 6 seconds, although it can be increased to 60 seconds).
    • This potentially allows Spring to run many Java web frameworks on top of serverless rather than on top of a servlet container. See Zappa for a successful example in the Python world: it can run microservice frameworks like Flask, Bottle, and Pyramid, but also full-stack Django apps, mostly without any code change. Please see Zappa's presentation slides to get a feel for why this is important and such a game changer.
      • Of course the "other" web framework will need to ensure it behaves well in a serverless environment, but this will be much easier if Spring Boot components have already laid the foundation to enable this support smoothly.
    • I also did a proof-of-concept that this can work: https://github.com/ceefour/wicket8serverless

      It's using Kotlin + Spring Boot + Web 1.5.6 and @martin-g's Wicket 8.0.0-M7 + Wicket Bootstrap 2.0.0-M6. These are not slimmed versions of the frameworks, but the entire Spring Boot loader, Spring MVC DispatcherServlet, and Wicket + Wicket Bootstrap installed as a servlet Filter. It's definitely buggy and has lots of gotchas, but the initial work is in https://github.com/ceefour/aws-serverless-java-container/tree/spring-boot and hopefully @SAPessi will accept my pull request. :) The package is 24 MB. It should be possible to reduce it further by configuring shade with minimizeJar, but my initial minimizeJar efforts resulted in class-not-found (CNF) exceptions, so I disabled it.

      Deployed result is here: https://im9ntgimlc.execute-api.us-east-1.amazonaws.com/prod/

      Of course, as expected, the startup time leaves much to be desired (and there's not even Hibernate yet): 27 whole seconds just for startup. However, once it's warm, subsequent requests take only 300 ms, which is still slow in the serverless world but can be optimized.

  4. Tooling to help local development. For example, a @Serverless annotation that, if present, enables checks during local development to detect features that are restricted in a serverless environment, and then logs a WARN or even ERROR message or throws an exception.

  5. Helpful extras to help Spring app developers adapt Spring Boot/MVC apps to SCF's programming model. To illustrate the kind of extras possible, allow me to quote Zappa's feature list:

    • Automatic support for all AWS event sources, allowing you to build robust "hybrid" applications
    • Automatic support for hundreds of pre-compiled and pre-optimized C-extension packages (SciPy, etc.)
    • Support for applications which use multiple cookies simultaneously (i.e., anything with a login)
    • Free, auto-renewing SSL certificates from Let's Encrypt
    • CI integration
    • Automatic support for "global" (multi-region) applications
    • HTTP logging in the Common Log Format
    • Automatic "keep-warm"
    • Intelligent application and variable caching
    • Zero-effort CORS
    • Automatic and intelligent project initialization (zappa init)
    • Multiple environment variable sources (local, S3)
    • Multiple "secure endpoint" authentication methods (API Key, IAM, custom authorizers)
    • Content-Type aliases
    • Highly customizable configurations
    • Managed IAM credentials with the option to supply custom credentials
    • Package optimization
    • VPC-awareness
    • Ability to deploy HTTP-less, event-driven applications
    • Custom error-reporting (ex, to Sentry or Raygun)
    • Remote command invocation (include raw Python)
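
Returning to item 4, here is a minimal sketch of what a local-development check behind a @Serverless annotation might look like. Only the annotation name comes from the proposal; the `ServerlessChecks` class, the `CounterHandler` example, and the single rule it applies (flagging mutable static fields, one kind of "shared stuff" that survives between warm invocations) are assumptions made for illustration, not an existing Spring or SCF feature.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker annotation; only its name appears in the proposal.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Serverless {}

// Example class that opts in to the checks.
@Serverless
class CounterHandler {
    static int requestCount;               // mutable static state: flagged
    static final String NAME = "counter";  // constant: not flagged
}

// Hypothetical checker with one illustrative rule.
final class ServerlessChecks {
    // Returns one warning per non-final static field of an annotated class.
    static List<String> warnings(Class<?> type) {
        List<String> result = new ArrayList<>();
        if (!type.isAnnotationPresent(Serverless.class)) {
            return result; // classes that did not opt in are not checked
        }
        for (Field field : type.getDeclaredFields()) {
            int modifiers = field.getModifiers();
            boolean mutableStatic = Modifier.isStatic(modifiers) && !Modifier.isFinal(modifiers);
            if (mutableStatic && !field.isSynthetic()) {
                result.add("WARN " + type.getSimpleName() + "." + field.getName()
                        + ": mutable static state may leak between warm invocations");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String warning : warnings(CounterHandler.class)) {
            System.out.println(warning);
        }
    }
}
```

A real implementation would need many more rules (threading, filesystem use, package size) and would probably run as a build-time or test-time check rather than through reflection at startup.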

In the past, Java EE (and still to some extent Spring Boot) has been regarded by some people as hard to develop, hard to deploy, hard to maintain (DevOps, scaling, etc.), and expensive to operate (requiring servers etc.). Spring Boot is making it easy to develop. And Kotlin and the upcoming Java 9 also make the entire Spring Boot experience pleasant.

If SCF can support existing Spring Boot apps with minimal change, app developers gain benefits such as easy deployment, automatic scalability, zero maintenance, stability, and less vendor lock-in (i.e. a Spring Boot app can be moved from AWS Lambda to OpenWhisk and vice versa, and can still be deployed on a PaaS like Pivotal Web Services or on regular instances). This will make SCF much more attractive and increase adoption.

I sincerely hope that this enhancement proposal can be considered. You guys rock. Thank you! :)

This is originally from https://github.com/spring-projects/spring-boot/issues/10136, I hope it's okay to repost it here (with some adjustments).

dsyer commented 7 years ago

I think there's a danger here that HTTP does not map to serverless functions very well (e.g. AWS Lambda). If we try to support applications with arbitrary HTTP semantics, users will be disappointed because a function event simply does not have those semantics, even if it might have been originally an API Gateway trigger in Lambda, for instance. How would you map the URL path? Content negotiation? Request parameters? Add that to the fact that, as you have noticed, the full web stacks are expensive and probably needlessly so, since they are essentially redundant in a serverless environment, and I think we should conclude that it is better to expand the model we have in SCFn than try to force-fit HTTP onto serverless. SCFn is event-driven but stackless (so extremely cheap), which maps well to serverless and permits extremely small footprints and quick startup times. If there are non-HTTP features you think are missing, please feel free to open individual issues.

ceefour commented 7 years ago

How would you map the URL path? Content negotiation? Request parameters?

@dsyer All of these and more are available from the API Gateway proxy event. I can understand that without the proxy event it would be difficult (or even impossible?), but once you see that this proxy event is a standardized pattern in the serverless provider, you'll see the potential benefits (both to Spring itself and to Spring users).
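
To make that concrete, here is a minimal sketch of reading the path, the Accept header (for content negotiation), and a request parameter from a proxy-style event. The `ProxyEvent` class is a hypothetical, simplified stand-in written for this example; its field names follow the documented API Gateway proxy payload (`httpMethod`, `path`, `headers`, `queryStringParameters`), but it is not the AWS SDK type, and real events can carry null maps and differently cased header names.

```java
import java.util.Map;

// Hypothetical, simplified stand-in for an API Gateway proxy event.
final class ProxyEvent {
    final String httpMethod;
    final String path;
    final Map<String, String> headers;
    final Map<String, String> queryStringParameters;

    ProxyEvent(String httpMethod, String path,
               Map<String, String> headers,
               Map<String, String> queryStringParameters) {
        this.httpMethod = httpMethod;
        this.path = path;
        this.headers = headers;
        this.queryStringParameters = queryStringParameters;
    }
}

final class ProxyEventDemo {
    // Reads the three things asked about: URL path, content negotiation, parameters.
    static String describe(ProxyEvent event) {
        String accept = event.headers.getOrDefault("Accept", "*/*");
        String name = event.queryStringParameters.getOrDefault("name", "world");
        return event.httpMethod + " " + event.path
                + " accept=" + accept + " name=" + name;
    }

    public static void main(String[] args) {
        ProxyEvent event = new ProxyEvent("GET", "/hello",
                Map.of("Accept", "application/json"),
                Map.of("name", "Dave"));
        System.out.println(describe(event));
    }
}
```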

the full web stacks are expensive and probably needlessly so

I don't think the Servlet API is expensive. I assumed the 20+ second startup time comes from optimizable things like classpath scanning, which happens regardless of whether the Servlet API is used. https://github.com/awslabs/aws-serverless-java-container/tree/master/aws-serverless-java-container-spring and https://github.com/bbilger/jrestless have proven that the Servlet API is not the issue when it comes to startup time. https://www.ccampo.me/java/spring/aws/2016/11/27/spring-aws-lambda.html#comment-3432412561 describes how the commenter managed to reduce startup time by swapping Spring's IoC with Dagger. I don't think Spring is "slow"; it's just that Dagger provides compile-time optimization, and Spring should consider offering that option in the future too (not just for serverless but for any other purpose; some people really want the extra time). Spring innovates, right? You don't have to wait until the next version of the CDI spec comes out.

In fact, startup time for a Spring Boot-like app in a serverless environment is faster than for Spring Boot with embedded Tomcat, since it does not have to run Tomcat, right? :)

There are other subtle things like session ID generation, which takes time on Tomcat (https://stackoverflow.com/questions/28201794/slow-startup-on-tomcat-7-0-57-because-of-securerandom) but should be faster on serverless because we can just take the request ID which is given "for free" by the serverless provider.

arbitrary HTTP semantics users will be disappointed because a function event simply does not have those semantics

I beg to disagree. A proxy function event does have those semantics. It's very well supported by AWS Lambda, well documented, and is very easily configured from the web UI: http://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-set-up-simple-proxy.html

In fact AWS thinks this use case is so common that it's not only possible, but enabled just by ticking a checkbox that also happens to be the first form input.

(BTW, even without ticking the checkbox, the Lambda event is still proxied from HTTP using the default mapping. Ticking it means the entire API Gateway deployment is directed to the Lambda function.)

I have mentioned Zappa above but I feel I need to mention it again. https://www.zappa.io/ does for Python WSGI what this enhancement request does for Java Servlet.

However, bringing Servlet to serverless is just an initial step, and while it already gives benefits, the real benefits come from the unique advantages of a serverless environment. The request ID I mentioned above is one. Another is that, since we're no longer using Tomcat, there's no need to worry about configuring/maintaining/patching Tomcat. Things like SSL are still a challenge in Spring Boot/Tomcat proper, but in a serverless environment this is at worst hands-free and at best configurable and free (https://aws.amazon.com/certificate-manager/). No more worrying about log files, because CloudWatch Logs centralized logging is enabled by default (again, nothing to configure).

Another benefit is Lambda Authorizer, e.g. using http://docs.aws.amazon.com/apigateway/latest/developerguide/apigateway-integrate-with-cognito.html . Since this effectively offloads the heaviest part of auth processing, a serverless Spring Servlet app will be more performant than its full-blown Spring Security twin. (I still love Spring Security though)

Many more benefits of wrapping WSGI/Servlet as a serverless function are described at https://htmlpreview.github.io/?https://raw.githubusercontent.com/Miserlou/Talks/master/serverless-sf/big.quickstart.html (please have a look, it's very good). Although it talks about Zappa (and Python), all of the benefits mentioned there can be realized by Spring (or even improved upon) if Spring decides to do so. Which is what I hope.

Although obviously I'm biased to this Servlet-serverless request (pardon the pun), I believe the reasons I outlined above are objective and can be verified independently. :)

dsyer commented 7 years ago

Thanks for the analysis, and for taking the time to think about all this. There are a lot of assumptions in what you say, some implicit, some not, and it's hard to respond to all of it in as much detail as it probably deserves. Example: classpath scanning is not slow; adding features is. The aim of SCFn is not to provide a servlet API wrapper around serverless. It is to keep the programming model as neutral and framework-free as possible, so that we can cut away the unnecessary features as cleanly as possible in a serverless environment (e.g. like the security example you gave).

API gateway is only one of many sources of serverless events, and we don't want to pollute the core programming with assumptions about the incoming data and mapping it to HTTP. There might have to be special cases where we allow the user to drop the facade and consume events in a more platform-specific way, but we don't want that to be the core programming model.

If you like the servlet APIs and want to write serverless apps that way with Spring I guess you can use the AWS labs project that you linked to. We are more than happy to co-exist with that, as it's really a different approach to the one we are taking (and it is AWS specific).

ceefour commented 7 years ago

classpath scanning is not slow; adding features is.

Thanks for correcting me. Indeed, that is only my assumption. However, I can't see how that is not "slow". If a project has 100 classes and only 5 are annotated, doesn't that mean Spring is wasting time scanning the other 95? Compared to an explicit configuration of the 5 beans, I assume the static one is faster. Forgive me if my assumption is wrong, but I don't understand how the runtime approach can be as fast as the static one. And if it's not slow, then I don't get why Dagger goes to the trouble of compile-time IoC instead of using runtime classpath scanning just like Spring.
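
The distinction being debated can be sketched with a toy model. Everything here is hypothetical: the `Managed` annotation stands in for a stereotype such as @Component, and the reflection loop is only an analogy for runtime discovery, not how Spring actually scans the classpath. The sketch shows that both approaches arrive at the same set of classes; it makes no claim about how much time either takes in a real application.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker annotation standing in for a stereotype like @Component.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Managed {}

@Managed class OrderService {}
@Managed class InvoiceService {}
class PlainHelper {}
class PlainDto {}

final class RegistrationDemo {
    // Runtime discovery: every candidate is inspected, and most are rejected.
    static List<Class<?>> discover(List<Class<?>> candidates) {
        List<Class<?>> found = new ArrayList<>();
        for (Class<?> candidate : candidates) {
            if (candidate.isAnnotationPresent(Managed.class)) {
                found.add(candidate);
            }
        }
        return found;
    }

    // Explicit registration: the list is fixed ahead of time, nothing is inspected.
    static List<Class<?>> explicit() {
        return List.of(OrderService.class, InvoiceService.class);
    }

    public static void main(String[] args) {
        List<Class<?>> all = List.of(
                OrderService.class, PlainHelper.class, InvoiceService.class, PlainDto.class);
        System.out.println("inspected " + all.size() + ", kept " + discover(all).size());
        System.out.println("explicitly registered " + explicit().size());
    }
}
```

Whether the inspection step matters in practice is exactly what is disputed here; only profiling a real application would settle it.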

Anyway, my underlying argument is that while Servlet does have overhead (though I don't have definitive profiling data), it is not significant relative to the other components combined. If an app starts in 20000 ms and 19000 ms is proven to be spent only initializing the Servlet part, then I'm wrong. However, I highly doubt that is the case. I believe initializing Servlet should take as little as 25 ms, and even a worst case of 3000 ms would still be acceptable.

adding features is slow

I agree with that and we have no conflict here. If Hibernate reading table metadata is slow, it's not Spring's fault. And it's not serverless Servlet's fault either! It'll still be slow when "barebones" SCF is used.

API gateway is only one of many sources of serverless events, and we don't want to pollute the core

True. I never expected this to be in SCF core; I mentioned above that this, if accepted, can be a subcomponent of SCF.

you can use the AWS labs project that you linked to. We are more than happy to co-exist with that, as it's really a different approach to the one we are taking

I respect your decision to reject. The AWS labs project is not yet stable; that's why I'm trying to bring the serverless Servlet approach to Spring's attention and suggesting that Spring consider it. I previously suggested this to Spring Boot, got rejected, and was pointed to SCF instead, but well...

Indeed the approach is different. The approach I'm suggesting does not use the SCF programming model; it uses the Spring Web programming model. It is still relevant to SCF because of SCF's adapters for different providers. Instead of function-centric, this is serverless-centric. Instead of coding a function, you code to Servlet and expect to run in a serverless environment. Therefore things like SCF REST are not relevant in this approach, because if you want to do REST, you write a Spring REST controller just like usual; the difference is that @RestController now runs on serverless.

"(and it is AWS specific)": This is factually incorrect. aws-serverless-container is AWS-specific because (1) it is developed by awslabs, obviously, and (2) it uses the AWS SDK. That's implementation. However, the serverless Servlet approach can be implemented on any serverless provider, the same way SCF already has adapters.

Again I encourage you to take a look at Zappa's talk as a proven example of this approach. Zappa does use AWS by default, but that's not a technical restriction, it's just the most popular choice the same way Tomcat is default in Spring Boot.

Zappa's very fast growth and its success with corporations in medicine and banking indicate, to me, that its approach is valid and desirable for some userbase, and this userbase is only getting bigger as serverless technology matures. Zappa can achieve that only because it supports WSGI. In fact, without WSGI support, Zappa becomes more like Serverless.com or Apex, but Python-only. Zappa thrives because its users already have Django/Flask apps that they can "port" to serverless with little effort.

You're certain that this userbase, the people and companies who want to run Servlet apps on serverless, is not important for Spring. Of course I could very much be wrong. Time will tell... I will update this issue if I notice any of your competitors considering this approach. (A few years ago people would have laughed if someone said JBoss could be faster than Tomcat, but now it's fair game... yeah, Undertow is not the old JBoss, but hey, times change.)

I still hope you'll reconsider. :)

markfisher commented 7 years ago

I agree with Dave that this feature does not belong in Spring Cloud Function. Note that the first goal listed in the README is: "Promote the implementation of business logic via functions."

That doesn't necessarily mean it doesn't belong somewhere... either AWS labs or potentially spring-aws: https://github.com/spring-cloud/spring-cloud-aws

I think this is a good example of how the concepts of Serverless and FaaS have been conflated within AWS Lambda (and therefore in most people's minds). Serverless is more general, and a full blown Servlet app with multiple mappings is not a Function.

ceefour commented 7 years ago

@markfisher I'd be happy if it gets accepted by Spring Cloud AWS.

However, saying it is AWS-specific is like saying HTTP/2 is a Jetty-specific protocol, which was kinda true in the past, but with Tomcat 8.5 and the upcoming 9, etc., that is no longer the case.

Servlet app with multiple mappings is not a Function

As in my own implementation above (and also Zappa's), there is only one mapping, and a function that takes an HTTP request as input and returns an HTTP response as output. This behavior can be simulated on less featured serverless providers by using multiple mappings.
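
A minimal sketch of that shape, under stated assumptions: `WebRequest`, `WebResponse`, and `SingleEntryPoint` are hypothetical types invented for this example and are not part of the Servlet API, SCF, or my proof-of-concept code. The platform sees a single request-to-response function; any routing between paths happens inside the application.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical minimal request/response types for illustration only.
final class WebRequest {
    final String method;
    final String path;
    WebRequest(String method, String path) { this.method = method; this.path = path; }
}

final class WebResponse {
    final int status;
    final String body;
    WebResponse(int status, String body) { this.status = status; this.body = body; }
}

final class SingleEntryPoint {
    // Routes live inside the application, keyed by "METHOD path".
    private static final Map<String, Function<WebRequest, WebResponse>> ROUTES = new HashMap<>();
    static {
        ROUTES.put("GET /hello", request -> new WebResponse(200, "hello"));
        ROUTES.put("GET /health", request -> new WebResponse(200, "ok"));
    }

    // The one function the platform invokes for every HTTP request.
    static WebResponse handle(WebRequest request) {
        Function<WebRequest, WebResponse> route =
                ROUTES.get(request.method + " " + request.path);
        return route == null ? new WebResponse(404, "not found") : route.apply(request);
    }

    public static void main(String[] args) {
        Function<WebRequest, WebResponse> function = SingleEntryPoint::handle;
        WebResponse response = function.apply(new WebRequest("GET", "/hello"));
        System.out.println(response.status + " " + response.body);
    }
}
```

In a Servlet-based version, the routing table would be replaced by the dispatcher of whatever web framework is deployed.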

While the choice of serverless providers that support both proxy-style HTTP triggers and the Java platform is currently limited, I'd be very surprised if other providers don't follow AWS's lead. When RDS came out, cloud MySQL was AWS-specific. But now managed MySQL is available anywhere. Same for PostgreSQL. Aurora was MySQL-specific, so I was (pleasantly) surprised when Aurora PostgreSQL was announced.

I could be wrong, and perhaps 5 years from now non-AWS FaaS providers will only support JavaScript and primitive HTTP mapping. But considering the fierce competition among them, I'd be very confused if that's indeed what happens. Again, that's like saying Tomcat will never support HTTP/2, which perhaps Jetty people would rejoice at if it were true...

I'd appreciate reconsideration, although of course I respect your decision to reject. :) Come to think of it, I'm already moving a Spring Boot app to Zappa/Django just because this is not well supported in the Java/Servlet world, and I'm also suggesting that my circle consider Zappa for certain kinds of apps. Actually not Zappa as such, but a serverless webapp approach, for which Zappa is unfortunately the lone mature contender at the moment. Competition is good, and I love Java...errr...Kotlin. Now, as the evidence below suggests, I'm not the only person who thinks this way, even at this "early adopter stage" of serverless. As serverless technology matures in the coming years, there will be more and more of these people...

An unscientific estimate is stars: http://github.com/Miserlou/zappa is currently nearing 5K stars. I'm sure SCF will catch up to that number and beyond, but Zappa is Python, whose userbase is far smaller than Java's. If its star growth rate continues, it's possible Zappa's stars will surpass spring-boot's in 2 years...

I'm not praising Zappa (of course I think it's a great product), but my intention is to highlight this approach's potential, that it's beneficial for you and your users. Plus DZone people are gonna love writing about it. To Spring and beyond .... ! :-)

markfisher commented 7 years ago

Thanks again for providing so much detailed information. It's quite interesting to see the variety of ways developers are approaching "serverless" in the broad sense (i.e. beyond just "functions"). That said, it doesn't change what both Dave and I have expressed above as the goals of the Spring Cloud Function project, and I'm more concerned about the clarity and consistency of the project than the number of stars it gets.

As you said: "Instead of function-centric, this is serverless-centric. Instead of coding a function, you code to Servlet and expect to run in serverless environment."

But Spring Cloud Function is function-centric. Coding a function is what it's all about. We do not want to pollute that model within the scope of the project.
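
For contrast with the Servlet-style approach, here is a minimal sketch of what "coding a function" means, using only `java.util.function.Function` from the JDK. The `BusinessFunctions` class and its two functions are hypothetical examples; how such functions are registered with Spring Cloud Function and bound to a platform is deliberately left out of the sketch.

```java
import java.util.Locale;
import java.util.function.Function;

final class BusinessFunctions {
    // A unit of business logic: no HTTP, no servlet, no platform types.
    static Function<String, String> uppercase() {
        return value -> value.toUpperCase(Locale.ROOT);
    }

    // A second function, to show that functions compose.
    static Function<String, String> reverse() {
        return value -> new StringBuilder(value).reverse().toString();
    }

    public static void main(String[] args) {
        Function<String, String> pipeline = uppercase().andThen(reverse());
        System.out.println(pipeline.apply("hello"));
    }
}
```

The narrow scope is visible in the signature: each function maps one input to one output, and nothing in the code refers to paths, headers, or sessions.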

If you think there's enough interest in a portable (non-AWS-specific) Servlet based model for Serverless providers, then you should consider creating a new serverless-servlet project (the name itself does sound paradoxical). The examples you've shown thus far do build upon AWS-specifics. So, in the meantime, I'd still recommend contributing to the AWS labs project if possible, else bring up this topic within spring-cloud-aws. If there's a home for it in the Spring ecosystem, that might be it.

My personal view is that the Servlet model was specifically designed for a different type of runtime. FaaS on the other hand can be viewed (as it is by Spring Cloud Function) as a form of Inversion of Control that promotes a well-defined development model with a narrow scope that in turn facilitates the Single Responsibility Principle. Simulating a model with a much wider scope within that narrow scope seems counter-productive.

I do think we'll see many of the ideas around "serverless", such as scaling resources to 0 when idle (typically motivated by the billing model), being applied more broadly than FaaS as the trend continues.

I'm going to close this issue, but I look forward to seeing where these ideas might lead.

ceefour commented 7 years ago

@markfisher It's okay, I understand and respect the decision. I opened another, as a last resort, at https://github.com/spring-cloud/spring-cloud-aws/issues/260 ^_^

vab2048 commented 5 years ago

It is unfortunate that spring-cloud-function deemed this an issue for spring-cloud-aws and spring-cloud-aws has deemed this an issue for spring-cloud-function (see https://github.com/spring-cloud/spring-cloud-function/issues/107).

Any update on progress made or directions taken?

dsyer commented 5 years ago

There is no update. Spring Cloud Function supports a function-based programming model, and if you want access to the underlying HTTP concepts (headers etc.) they are available. That seems like a good place to be to me.