Yup agreed, there was a longer discussion in #1486 on this. I'm not sure how fast we'll get around to implementing this, but I think it's a pretty good place for a contribution from someone. I've added the help-wanted label so people know to pick it up. Shouldn't be too hard I assume.
@flomotlik I would be happy to help with this. It looks similar to the last one I looked at.
Just so I understand it: there would be a top-level default package configuration plus a per-function configuration. Would it also make sense to move the top-level package config to the `default` property if this goes ahead?
Conceptually, I would like a package to be optionally defined on a per-function basis. I was unaware of a "default" key in the YAML, but it conceptually makes sense to put it there if we are defining a default key.
Not to add too much effort, but it may be valuable to take a look at the event.json as well. If it is (as I understand it) intended to provide sample data, then it needs to be per-function, unless it's intended to be an object keyed on function names or something. I haven't seen any documentation either way.
@benjaminabbitt you can use the `--path` option to point to an event.json file. Our docs aren't that good here: https://github.com/serverless/serverless/tree/master/lib/plugins/invoke
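For reference, an invocation with a sample event might look like this (assuming a function named `hello` and a local `event.json`; check the invoke docs for the exact flags):

```
serverless invoke --function hello --path event.json
```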
@johncmckim Yes it should be a package config parameter at the top and then package in each function. We want to remove the default property at some point, so we'd rather have package directly at the top. Let me know if you have any further questions, happy to help or jump into a quick call
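Roughly, the configuration could look like this (just a sketch; the exact keys and defaults are still up for discussion, and the service/function names here are only placeholders):

```yaml
service: my-service

package:              # service-wide packaging defaults
  exclude:
    - tests

functions:
  hello:
    handler: handler.hello
    package:          # per-function additions/overrides
      include:
        - lib/helper.js
```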
@flomotlik thanks. I think I've got a good idea of the issue. I'll make a start and if I get stuck I'll send a WIP PR for a review.
@flomotlik thank you, missed that the event.json could be specified.
@johncmckim great, I set the issue to in-progress. Thanks for that.
Just adding some thoughts around the implementation.
The first step will be changing the deployment process to create a function-specific zip file if a function package config is present. As the package plugin is common to providers, this will affect the aws provider and any other providers in development.
For the aws provider, the `uploadDeploymentPackage` step will have to be changed to upload multiple packages, and the cleanup will need to account for there being multiple packages per version.
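As a very rough sketch of what that upload step could become (the bucket layout, parameter names, and the artifact map here are assumptions, not the actual plugin code):

```javascript
// Hypothetical sketch: push one artifact per function instead of a single service zip.
const fs = require('fs');
const AWS = require('aws-sdk');

const s3 = new AWS.S3();

// artifacts: { functionName: '/path/to/functionName.zip', ... }
function uploadFunctionPackages(bucket, prefix, artifacts) {
  const uploads = Object.keys(artifacts).map((name) =>
    s3.upload({
      Bucket: bucket,
      Key: `${prefix}/${name}.zip`,
      Body: fs.createReadStream(artifacts[name]),
    }).promise()
  );
  return Promise.all(uploads);
}
```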
This could lead to smaller lambda function code. However, it will likely lead to larger (and slower) deployments overall.
I am also wondering about how beneficial this would be for nodejs. I can't really comment on python.
Imagine this example.
```
a.js -- requires foo.js & baz
b.js -- requires bar.js & qux

lib
|__ foo.js
|__ bar.js

node_modules
|__ baz
|__ qux
|__ baz-dependency-a
|__ baz-dependency-b
|__ qux-dependency-a
|__ qux-dependency-b
```
```yaml
service: test-sls-proj

provider:
  name: aws
  runtime: nodejs4.3

functions:
  a:
    handler: handler.a
    package:
      exclude:
        - lib
      include:
        - lib/foo.js
  b:
    handler: handler.b
    package:
      exclude:
        - lib
      include:
        - lib/bar.js
```
Realistically I don't see developers excluding `node_modules` through config. It is likely that there will be too many dependencies and peer dependencies to go through them all and re-include them. If you did explicitly include specific dependencies and their peer dependencies, you run the risk of missing one and breaking your package.
If that's the assumption, how much smaller would this actually make your function-specific package? To get any real benefit you would need to use some kind of optimizer.
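For example, a bundler such as browserify can pull in only what a handler actually requires, instead of shipping all of `node_modules` (a hypothetical illustration; double-check the flags against the browserify docs):

```
# Bundle a.js plus only the modules it requires into a single file.
browserify a.js --node --standalone handler --outfile dist/a.js
```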
Note, I'm not saying that this issue shouldn't be implemented; I'm just bringing up points to consider.
I'm viewing it from a different organizational perspective.
```yaml
service: test-sls-proj

provider:
  name: aws
  runtime: nodejs4.3

functions:
  a:
    handler: a/index.handler
  b:
    handler: b/index.handler
```
```
/serverless
/serverless/serverless.yaml
/serverless/a/handler.js
/serverless/a/package.json
/serverless/a/node_modules/...
/serverless/b/handler.js
/serverless/b/package.json
/serverless/b/node_modules/...
```
With this approach, and node_modules installed on a per-function basis, I think that per-function packages make sense.
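For what it's worth, installing dependencies per function with that layout is straightforward (assuming each function folder has its own package.json, as above):

```
cd a && npm install --production && cd ..
cd b && npm install --production && cd ..
```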
Hey guys, thank you for jumping into this discussion and sharing your thoughts! 👍
> For the aws provider, the `uploadDeploymentPackage` step will have to be changed to upload multiple packages and the cleanup will need to account for there being multiple packages per version.
We had this pattern before and switched to service-level zipping by design. The problem with zipping individual functions is that you need a way to also include other, external code that the function might access. This was previously handled through our "magic handler" concept, which works fine for Node.js but is way more complex for a runtime like Java.
Single-function deployment was introduced with the `deployFunction` plugin recently, where we only push the function code (but still the whole zipped service) to AWS through the SDK rather than CloudFormation.
We need to look into this in more detail, as per-function zipping will introduce lots of changes. Maybe @flomotlik can chime in here.
Here's a related issue where we've discussed the packaging process in detail (and came to the "per service" zipping as a solution) https://github.com/serverless/serverless/issues/1486
Basically I like the approach of simply making package available in every function as well. From then on, users themselves are responsible for how they set up their service and what to include or exclude.
I fully agree with @benjaminabbitt there that teams are responsible for structuring their repo in this way, and we should add this to the docs. Then it's also easy for them to have shared resources in a lib folder at the same level as their serverless.yml, while functions that need specific stuff get their own folder.
This is definitely an advanced functionality and should only be used on functions that you actually need to make faster. And only after inspecting your functions and really making sure you know why and how to optimise.
My assumption is still that too many people think this optimisation is necessary and makes a huge difference in their infrastructure, when in reality it doesn't and isn't worth the effort on their end.
Also by implementing a simple version (just packaging per function the same way it is now implemented service wide) we can potentially learn more about the different pain points people have.
I've made a start on this; I had a few hours on a plane. I've started by refactoring the `zipService` module, splitting the `zip` and `package` logic so the zip logic can be re-used.
I'm on holidays this weekend so I'll send a WIP PR next week.
@johncmckim Nice work on this! 🎉
Another cool thing we could potentially achieve from this is generating a hash based on what's in `package.include`. This way the archive/version name could stay the same when no files were changed, and we could skip updating those Lambdas during deployment.
@nicka yup, though I fear there are lots of side effects of this we currently don't know yet when we only deploy some functions. I could see at some point adding this as an optional feature and after very very thorough production testing by adventurous users it might become standard.
We would also have to hash the CF template (e.g. what if resources are changing) and all the files that are part of the one package (or basically we'd have to hash the package while setting the creation/modification time of the files to a common time, otherwise the hash will be different). As somebody who built CI/CD systems in the past, intelligence that only runs part of the deployment was always very scary to me.
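To illustrate the timestamp problem: the hash would have to be computed from file paths and contents in a fixed order, ignoring modification times entirely. A rough sketch of the idea (not the framework's code):

```javascript
// Deterministic package hash: same inputs => same digest, regardless of when files were written.
const crypto = require('crypto');
const fs = require('fs');

function hashPackageContents(filePaths) {
  const hash = crypto.createHash('sha256');
  filePaths
    .slice()
    .sort() // fixed order so the digest doesn't depend on traversal order
    .forEach((file) => {
      hash.update(file);                  // include the path so renames change the digest
      hash.update(fs.readFileSync(file)); // contents only; no mtime/ctime
    });
  return hash.digest('hex');
}
```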
I have been thinking about the need for a different folder structure, specifically grouping artefacts and templates into folders on S3. However, it looks like #1898 does this. If so, that solves the cleanup issues related to deploying individual functions.
@johncmckim yup @pmuens implemented a folder structure in #1906 and we'll merge this pretty soon
@flomotlik @pmuens awesome, that will save me some work. I'll wait until that is merged before continuing with this. I want to ensure the new packaging functionality works with those changes.
As the PR stands now, `sls deploy` packages the service either as a whole or per function, depending on the `individually` flag. It also unions the service- and function-level `include` and `exclude` options. For example:
```yaml
service: test-sls-proj

package:
  individually: true # or false
  exclude: # exclude lib and test.js for all functions
    - lib
    - test.js

functions:
  a:
    handler: a.hello
    package:
      include:
        - lib # re-include lib
      exclude:
        - b.js # add another exclude
  b:
    handler: b.hello
    package:
      exclude:
        - a.js # add another exclude
```
```
files
|_ lib
   |_ foo.js
|_ a.js
|_ b.js
|_ test.js

test-sls-proj-dev-a.zip
|_ lib
   |_ foo.js
|_ a.js

test-sls-proj-dev-b.zip
|_ b.js
```
I've used the `individually` flag for two reasons. The first is to allow this functionality to be used only by those who want it; if it turns out to be the best way to package services, the flag could be dropped later. The second is so the Azure and OpenWhisk implementations can throw an error if they do not support it.
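On the provider side, that check could be as simple as something like this (illustrative only, not the actual PR code):

```javascript
// Hypothetical guard for providers that don't support individual packaging yet.
function validatePackagingOptions(service) {
  if (service.package && service.package.individually) {
    throw new Error('This provider does not support packaging functions individually yet.');
  }
}
```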
I have unioned the service- and function-level `include` and `exclude` options as I think it is the DRYest approach. There could be common items you want to exclude from all functions, e.g. tests, and you can always re-include specific items later. An alternative would be to use only the function-level `exclude` and `include` options.
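The union itself is straightforward, something like this (a sketch, not necessarily the exact implementation in the PR):

```javascript
// Union the service-level and function-level patterns, de-duplicated, order preserved.
function unionPackageOption(serviceLevel, functionLevel) {
  return Array.from(new Set((serviceLevel || []).concat(functionLevel || [])));
}

// unionPackageOption(['lib', 'test.js'], ['b.js']) => ['lib', 'test.js', 'b.js']
```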
Let me know what you think.
Currently building a plugin to handle builds in a similar way to the `package` plugin, but optimized for Node.js projects. Once this issue gets merged in I should be able to produce artifacts for each function individually, which should do the trick for deployment.
On that note, any thoughts on the exclude/include supporting glob/regex? I've got that implemented currently and find it more flexible and natural.
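Glob support could be as simple as filtering files through a matcher like `minimatch` (an illustrative sketch; the framework may well prefer a different matching library):

```javascript
// Check a file against a list of glob patterns.
const minimatch = require('minimatch');

function isExcluded(filePath, excludePatterns) {
  return excludePatterns.some((pattern) => minimatch(filePath, pattern));
}

// isExcluded('lib/foo.test.js', ['**/*.test.js']) === true
```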
Feature Request:
Move the 'package' definition in the YAML to be per-function.
With 'package' as a top-level component in the YAML file, all dependencies for all functions are included in the built package. For example, if I have two functions (A and B) and function A depends on library C, then the deployed packages for both functions A and B include all of the code for A, B, and C.
With this proposal, function A's package would consist of A and C, and function B's package would contain only B.
Leave package as a top level element for those who don't care to define it on a per-function level.
Benefits:
Drawbacks: