serverless / serverless

⚡ Serverless Framework – Effortlessly build apps that auto-scale, incur zero costs when idle, and require minimal maintenance using AWS Lambda and other managed cloud services.
https://serverless.com
MIT License
46.47k stars 5.72k forks source link

Move package to per-function #1777

Closed benjaminabbitt closed 8 years ago

benjaminabbitt commented 8 years ago
Feature Request:

Move the 'package' definition in the YAML to be per-function.

With 'package' as a top level component in the YAML file, all dependencies for all functions are included in the built package. For example, if I have two functions (A and B). Function A depends on library C, then the deployed package for functions A and B both have all of the code for A, B, and C included.

With this proposal, function A would consist of a package with A and C, and function B would be only package B.

Leave package as a top level element for those who don't care to define it on a per-function level.

Benefits:
flomotlik commented 8 years ago

Yup agreed, there was a longer discussion in #1486 on this. I'm not sure how fast we'll get around to implementing this, but I think its a pretty good place for a contribution from someone. I've added the help-wanted label so people know to pick it up. Shouldn't be too hard I assume.

johncmckim commented 8 years ago

@flomotlik I would be happy to help with this. It looks similar to the last one I looked at.

Just so I understand it. There would be a top level default package configuration plus a per function configuration. Would it also make sense to move the top level package config to the default property if this goes ahead?

benjaminabbitt commented 8 years ago

Conceptually, I would like a package to be optionally defined on a per-function basis. I was unaware of a "default" key in the YAML, but it conceptually makes sense to put it there if we are defining a default key.

benjaminabbitt commented 8 years ago

Not to add too much effort, but it may be valuable to take a look at the event.json as well. If it is (as I understand it) intended to provided sample data, then it needs to be per-function, unless it's intended to be an object keyed on function names or something. I haven't seen any documentation any way.

flomotlik commented 8 years ago

@benjaminabbitt you can use the --path option to point to an event.json file. Our docs aren't that good here: https://github.com/serverless/serverless/tree/master/lib/plugins/invoke

@johncmckim Yes it should be a package config parameter at the top and then package in each function. We want to remove the default property at some point, so we'd rather have package directly at the top. Let me know if you have any further questions, happy to help or jump into a quick call

johncmckim commented 8 years ago

@flomotlik thanks. I think I've got a good idea of the issue. I'll make a start and if I get stuck I'll send a WIP PR for a review.

benjaminabbitt commented 8 years ago

@flomotlik thank you, missed that the event.json could be specified.

flomotlik commented 8 years ago

@johncmckim great, I set the issue to in-progress. Thanks for that.

johncmckim commented 8 years ago

Just adding some thoughts around the implementation.

The first step will be changing the deployment process to create an function specific zip file if a function package config is present. As the package plugin is common to providers, this will affect the aws provider and any other providers in development.

For the aws provider, the uploadDeploymentPackage step will have to be changed to upload multiple packages and the cleanup will need to account for there being multiple packages per version.

This could lead to smaller lambda function code. However, it will likely lead to larger (and slower) deployments overall.

johncmckim commented 8 years ago

I am also wondering about how beneficial this would be for nodejs. I can't really comment on python.

Imagine this example.

a.js  -- requires foo.js & baz
b.js -- requires bar.js && qux
lib
  |__ foo.js
  |__ bar.js
node_modules
  |__ baz
  |__ qux
  |__ baz-dependency-a
  |__ baz-dependency-b
  |__ qux-dependency-a
  |__ qux-dependency-b
service: test-sls-proj
provider:
  name: aws
  runtime: nodejs4.3
functions:
  a:
    handler: handler.a
    package:
      exclude:
        - lib
      include:
        - lib/foo.js
  b:
    handler: handler.a
    package:
      exclude:
        - lib
      include:
        - lib/qux.js

Realistically I don't see developers excluding node_modules through config. It is likely that there will be too many dependencies and peer dependencies to go through them all and re-include them. If you did explicitly include specific dependencies + peer dependencies you run the risk of missing one and breaking your package.

If that's the assumption, how much smaller would this actually make your function specific package? To get any real benefit you would need to use some kind of optimizer.

Note, I'm not saying that this issue shouldn't be implemented just bringing up points to consider.

benjaminabbitt commented 8 years ago

I'm viewing it from a different organizational perspective.

service: test-sls-proj
provider:
  name: aws
  runtime: nodejs4.3
functions:
  a:
    handler: a/index.handler
  b:
    handler: b/index.handler
/serverless
/serverless/serverless.yaml
/serverless/a/handler.js
/serverless/a/package.json
/serverless/a/node_modules/...
/serverless/b/handler.js
/serverless/b/package.json
/serverless/b/node_modules/...

With this approach, and node_modules installed on a per-function basis, I think that per-function packages make sense.

pmuens commented 8 years ago

Hey guys, thank you for jumping into this discussion and sharing your thoughts! 👍

For the aws provider, the uploadDeploymentPackage step will have to be changed to upload multiple packages and the cleanup will need to account for there being multiple packages per version.

We had this pattern before and switched to the service zipping by design. The problem with zipping functions is that you need a way to include other, external code as well the function might access. This was previously handled through our "magic handler" concept which works fine for Node.js but is way more complex for Java as a runtime. A single function deployment was introduced with the deployFunction plugin recently where we only push the function code (but still the whole zipped service) to AWS through the SDK rather than CloudFormation.

We need to look into this in more detail as a per function zipping will introduce lots of changes. Maybe @flomotlik can chime in here.

Here's a related issue where we've discussed the packaging process in detail (and came to the "per service" zipping as a solution) https://github.com/serverless/serverless/issues/1486

flomotlik commented 8 years ago

Basically I like the approach of simply making package available in every function as well. From then on users are responsible themselves how to set up their service, what to include or exclude.

I fully agree there with @benjaminabbitt that teams are responsible for structuring their repo in this way and we should add this to the docs. Then its also easy for them to have shared resources in a lib folder in the same level as your serverless.yml and functions that need to have specific stuff get their own folder.

This is definitely an advanced functionality and should only be used on functions that you actually need to make faster. And only after inspecting your functions and really making sure you know why and how to optimise.

My assumption is still that too many people think this optimisation is necessary and makes a huge difference in their infrastructure, when in reality it doesn't and isn't worth the effort on their end.

Also by implementing a simple version (just packaging per function the same way it is now implemented service wide) we can potentially learn more about the different pain points people have.

johncmckim commented 8 years ago

I've made a start on this, had a few hours on a plane. I've just started by refactoring the zipService module. I've splitting the zip and package logic so the zip logic can be re-used.

I'm on holidays this weekend so I'll send a WIP PR next week.

nicka commented 8 years ago

@johncmckim Nice work on this! 🎉

Another cool thing we could potentially achieve from this is formatting a hash based on what's in package.include. This way the archive/version name could stay the same when no files where changed and we skip updating those Lambda's during deployment.

flomotlik commented 8 years ago

@nicka yup, though I fear there are lots of side effects of this we currently don't know yet when we only deploy some functions. I could see at some point adding this as an optional feature and after very very thorough production testing by adventurous users it might become standard.

We would also have to hash the CF template (e.g. what if resources are changing) and all the files that are part of the one package (or basically we'd have to hash the package while setting the creation/modification time of files to a common time, otherwise the Hash will be different). As somebody who build CI/CD systems in the past intelligence that only runs part of the deployment sometimes was always very scary.

johncmckim commented 8 years ago

I have been thinking about the need for a different folder structure. Specifically grouping artefacts and templates into folders on S3. However, it looks like #1898 does this. If so that solves the cleanup issues related deploying individual functions.

flomotlik commented 8 years ago

@johncmckim yup @pmuens implemented a folder structure in #1906 and we'll merge this pretty soon

johncmckim commented 8 years ago

@flomotlik @pmuens awesome that will save me some work. I'll wait until that is merged before continuing with this. I want ensure the new packaging functionality works with those changes.

As the PR is now sls deploy packages either as a whole or individually depending on the flag individually. It also unions the service and function level include and exclude options.

service: test-sls-proj
package:
  individually: true # or false
  exclude:  # exclude lib and test.js for all
    - lib
    - test.js

functions:
  a:
    handler: a.hello
    package:
      include:
        - lib    # re-include lib
      exclude:
        - b.js  # add another exclude
  b:
    handler: b.hello
    package:
      exclude:
        - a.js  # add another exclude

files

|_ lib
  |_ foo.js
|_ a.js
|_ b.js
|_ test.js

test-sls-proj-dev-a.zip

|_ lib
  |_ foo.js
|_ a.js

test-sls-proj-dev-b.zip

|_ b.js

I've used the individually flag for two reasons. The first reason is to allow this functionality to be used by only those who want it. If it turns out to be the best way to package services it could be dropped later. The second reason is so the Azure and OpenWhisk implementations can throw an error if they do not support it.

I have unioned the service and function level include and exclude options as I think it is the DRY'st approach. There could be common items you want to exclude from all i.e. tests and you can always re-include specific items later. An alternative would be to use the function level exclude and include options only.

Let me know what you think.

nfour commented 8 years ago

Currently building a plugin to handle builds in a similar way to the package plugin, but to optimize for nodejs projects. Once this issue gets merged in I should be able to produce artifacts for each function individually, which should do the trick for deployment.

On that note, any thoughts on the exclude/include supporting glob/regex? I've got that implemented currently and find it more flexible and natural.

flomotlik commented 8 years ago

1917 implements this. Thanks @johncmckim for getting this done