pulumi / pulumi-aws

An Amazon Web Services (AWS) Pulumi resource package, providing multi-language access to AWS
Apache License 2.0
447 stars 155 forks source link

Controlling what gets uploaded to Lambda zip file #35

Closed lukehoban closed 6 years ago

lukehoban commented 6 years ago

With #34 we now upload the entire source folder as part of the Lambda zip. Previously, we uploaded the entire node_modules folder.

It is in general necessary to have the full source folder to support arbitrary require('./foo') in code running "on the inside".

However, for many/most Lambdas, they will not use any of the files in the root folder or even the node_modules on the inside. So we are carrying these along even though they won't be used.

Additionally, in local workflows, the working directory may contain source files or sensitive data that should not be uploaded into the runtime environment.

We could solve this with a configuration setting (possibly in Pulumi.yaml) along the lines of https://serverless.com/framework/docs/providers/aws/guide/packaging/.

CyrusNajmabadi commented 6 years ago

Do we think it's ok to include everything by default, and then have user exclude things they don't want uploaded? Or should all inclusion be explicit?

joeduffy commented 6 years ago

This is probably related to https://github.com/pulumi/pulumi-service/issues/122 and https://github.com/pulumi/pulumi-service/issues/22. /cc @ellismg

lukehoban commented 6 years ago

Related to those, but technically different. There’s two phases:

  1. What gets uploaded from dev machine to deployment service
  2. What gets uploaded from deployment service to Lambda/container

This is about the second. It may be we decide to offer one mechanism shared across both, but we may not.

I’d be inclined to upload everything by default, and let folks remove. But we also should look at Serverless and others for inspiration since they’ve gone through this process already.

lukehoban commented 6 years ago

One additional related issue is native modules. When development is done on Mac/Windows, these will not be valid ELF binaries and although they will be uploaded to the Lambda, they will not work.

This is a major known issue with Lambda, and we don't exactly make it worse, but we don't help (and we abstract away that it's Lambda, so it's not as easy for us to make the same excuses as Lambda does).

There are a few options:

  1. If our PPC took only source and ran "npm install" itself - then at least when using a PPC this would work (by us taking on the burden of being the CI)
  2. For local development, instead of directly uploading the folder, we could copy it (without node_modules), and then run npm install inside a Docker container. This would most likely introduce as many or more problems than it solves, but is probably the only way to get this working locally. See https://medium.com/tomincode/serverless-and-npm-native-modules-85cd9a648458 for example.
ericrudder commented 6 years ago

i think, sadly, we are looking at option 2 for the short term ... sort of a semi-known pitfall ... however there are q/a's on the interweb for how to make the best of this.

i do wonder if we should start to archive (or tag) things for blog posts or faqs or qas or something ... there's a bunch of times when i read the "real world knowledge" you are generating and i don't know how to capture.

PULUMI_FAQ

On Thu, Oct 26, 2017 at 10:18 AM, Luke Hoban notifications@github.com wrote:

One additional related issue is native modules. When development is done on Mac/Windows, these will not be valid ELF binaries and although they will be uploaded to the Lambda, they will not work.

This is a major known issue with Lambda, and we don't exactly make it worse, but we don't help (and we abstract away that it's Lambda, so it's not as easy for us to make the same excuses as Lambda does).

There are a few options:

  1. If our PPC took only source and ran "npm install" itself - then at least when using a PPC this would work (by us taking on the burden of being the CI)
  2. For local development, instead of directly uploading the folder, we could copy it (without node_modules), and then run npm install inside a Docker container. This would most likely introduce as many or more problems than it solves, but is probably the only way to get this working locally.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/pulumi/pulumi-aws/issues/35#issuecomment-339736813, or mute the thread https://github.com/notifications/unsubscribe-auth/AH_5wlS2t6s9rjibfRM6GOs6mZ673maBks5swL7fgaJpZM4Pn_zM .

lindydonna commented 6 years ago

@ericrudder We could start a wiki with "known issues" and link to GitHub issues. Later, when we want to release, we can see which issues are still open.

mmdriley commented 6 years ago

This has been a huge pain point in the Learning Machine bringup because any change to the source tree invalidates all Lambda resources in the project.

joeduffy commented 6 years ago

I second @mmdriley's feedback. This adds about a minute of overhead to each LM publish, even when the lambdas weren't essential and certainly didn't change. Sadly, I have few suggestions on what to do here, because it's JavaScript and could require just about anything, other than requiring a developer to explicitly list what to include (or at least giving them the option to prune).

It'd be good to look at Up and Apex to see if they have any clever solution for this.

lindydonna commented 6 years ago

I like Luke's suggestion of having an exclude/include file similar to Serverless Framework: https://serverless.com/framework/docs/providers/aws/guide/packaging/

lukehoban commented 6 years ago

Up uses an .upignore: https://github.com/apex/up/blob/13108baf78dfe7238794e60dda5ea586ee9e672f/docs/04-configuration.md#ignoring-files and https://github.com/apex/up/blob/master/internal/zip/zip.go#L40-L45

So pretty similar to Serverless Framework.

I'm pretty convinced that is the right answer here too.

However, we have a challenge that there are two steps, both of which will eventually want a .pulumiignore:

  1. User machine -> PPC
  2. PPC -> Lambda

The use cases we're discussing are about (2).

I suggest we implement a .pulumiignore now to address (2). We can hold off longer on addressing (1) for pulumi/pulumi-service#122 and pulumi/pulumi-service#22 - and will presumably need to choose a different name for that.

lindydonna commented 6 years ago

Do we currently do any kind of webpack/browserify step before uploading the code?

lukehoban commented 6 years ago

Do we currently do any kind of webpack/browserify step before uploading the code?

No. Those almost always change semantics of require application-specific knowledge/configuration.

[from @joeduffy in private chat] I can imagine wanting to customize per lambda in some cases. For instance, perhaps I have one lambda that does indeed need the full sources in my dir, and 100 lambdas that do not (and so I want to avoid superfluously updating them). It's possible we tell people to just stick them into different subdirs.

I can imagine that too - but I think the 90% case of the pain here is assets which have no business going to Lambda at all. I think we should fix that. Neither Serverless Framework nor Up have yet addressed the customer-per-lambda case either.

Notably - the soltuion to per-lambda customization is probably going to be the same here as for other reasons we might want to customize it: https://github.com/pulumi/pulumi-cloud/issues/168

I think in a model where we allow an explicit cloud.Function(() => {}, { /* args */}) model, we can have one of the args be the specific files to include/exclude.

lindydonna commented 6 years ago

Given that we're already transforming the user's code, why not have an option to do a webpack? In Azure Functions, there's a preview tool that does this and it worked seamlessly for most customers. It had a big improvement on function size.

Agreed about the per-lambda customization case, which we can punt for now.

lukehoban commented 6 years ago

Given that we're already transforming the user's code, why not have an option to do a webpack?

Perhaps - depends on whether it's something we could reliably do on the users behalf. Do you have pointers on this? Or suggestions on how we could add something similar to our workflow (which is not quite 1:1 with direct Azure Functions user workflow).

lindydonna commented 6 years ago

@lukehoban There's probably a lot more complexity here that I don't entirely understand. :) I assumed we could do something similar to this Serverless Framework plugin: https://www.npmjs.com/package/serverless-webpack. It looks like there's some customization involved, so maybe that's a solution.

joeduffy commented 6 years ago

/cc @ellismg

There's definitely a deeper topic brewing here, which is whether we "build" code on behalf of the user and, if so, what extensibility mechanisms we offer. As we make progress on the service, I think we're going to have to do increasingly more of this, as @ellismg and I have discussed before.

In fact, our current cloud.Service Docker builds are arguably a perfect example of this already in action.

I do like, however, that architecturally these capabilities are layered distinctly. On one hand, the Docker builds bug me because they are the sole instance where we've violated this. On the other hand, I love it, because of its beautiful simplicity.

serverless-webpack is definitely very interesting to dig into. Interestingly, one of the first questions from the guy I spoke to today was whether we support webpack for lambdas.

lindydonna commented 6 years ago

FYI, it looks like Serverless Framework does have a feature for packing Lambdas individually, via the individually setting: https://serverless.com/framework/docs/providers/aws/guide/packaging#packaging-functions-separately

The serverless-webpack plugin recommends using that setting when using webpack (for obvious reasons).

joeduffy commented 6 years ago

Also /cc'ing @ellismg , since this ties in closely with https://github.com/pulumi/pulumi-service/issues/22