pimterry / lambda-git

A git binary installed through NPM, for use with AWS Lambda
MIT License
78 stars 13 forks source link

Breaking changes in AWS Linux 2 / Lambda nodejs10.x #14

Open ryanblock opened 5 years ago

ryanblock commented 5 years ago

See also: https://github.com/lambci/git-lambda-layer/issues/13

Changes in Lambda nodejs10.x appear to break this library; specifically, its build of git, which is based on lambci/git-lambda-layer's build (hence the issue above).

I'll continue to monitor progress on the next git distribution for the AWS Linux 2 image that Lambda nodejs10.x uses.

Related, I'm thinking said fix (whether it originates from lambci/git-lambda-layer, or here) may be occasion for a 1.0 bump – open to thoughts on that!

Edit: misunderstood rollout plan/schedule of AWS Linux 2 – while it's a breaking change for this library in Lambda nodejs10.x, it's not going to be forced on non-nodejs10.x users (yet, anyway!).

pimterry commented 5 years ago

I'm thinking said fix (whether it originates from lambci/git-lambda-layer, or here) may be occasion for a 1.0 bump – open to thoughts on that!

Definitely makes sense to me :+1:. There's no specific reason we're not on 1.0 yet, just an initial abundance of caution, bumping sounds like the right idea.

ryanblock commented 5 years ago

I think I partially misunderstood AWS's Lambda container rollout plan, which can be summarized as:

What's not yet clear is whether a git package that bundles all its own dependencies will be compatible with both container environments (and assuming it was, whether it would be a tenable file size).

So, this begs a question of direction. Since there's now effectively a fork in Lambda environments, do we maintain a version for each?

I could see doing a minor bump right away with the latest git currently available in lambci/git-lambda-layer (2.20.0), putting pre 1.0 into bug fixes / maintenance mode; then bumping major for AWS Linux 2 compatibility, due to its a breaking changes for nodejs10.x.

That's tidy, but could be frustrating for anyone installing lambda-git@latest that isn't on nodejs10.x (or future AWS Linux 2 Lambda runtimes).

Fwiw, sounded a bit like mhart was going to build an entirely new layer to solve for this, but layers are just ARNs, so they don't really have the NPM (and soon GH Registry) namespace considerations of this module.

pimterry commented 5 years ago

Can we easily detect the AWS Linux version in use from Node at runtime? By the sound of it, for now the Node version is a clear marker, though it's unclear if that'll always be true.

Given we can, I'd split the package by versions as you suggest, but leave a check in each that warns and exits hard if it's run in the wrong environment, and then leave the npm latest tag pointing at the one we think is most popular for now. And document all that of course.

E.g, assuming in the short-term that AWS Linux 1 is still the common use case, I'd publish:

with the latest dist tag pointing to 1.0.0 for now, presumably, so that's what's installed by default.

At some point later on we can finalize 2.0.0 if everything looks good, and start pointing latest to that once we think most people are using AWS Linux 2 (I suspect you have a far better feel for that than me).

How does that sound? Potentially frustrating for anybody on the wrong side of our latest guess if they don't read the docs, but if they don't read the docs that's always going to be true, and at least clearly printing an error at startup can make it very easy to spot & resolve.

ryanblock commented 5 years ago

News: Node 10 git binary is a go! https://github.com/lambci/git-lambda-layer/issues/13#issuecomment-547484502

What with node8.10 about to go EOL on Lambda, I'm thinking there is no longer much point to updating the legacy version or splitting the package / providing backwards support – within a few months those Lambdas won't be able to be updated anyway.

Seems like the move here is to get the new git image into 2.0.0 RC for nodejs10.x and go from there. Thoughts?

pimterry commented 5 years ago

What with node8.10 about to go EOL on Lambda, I'm thinking there is no longer much point to updating the legacy version

Just for my reference later, some dates from https://docs.aws.amazon.com/lambda/latest/dg/runtime-support-policy.html:

All sounds good - the hard deprecation here makes a convincing case to me too for just focusing on Node 10 as the default asap, along with a simultaneous bump to v1.0.0.

Definitely keeps things simpler too, works for me :+1:

ryanblock commented 5 years ago

Right on. Will get a PR cooking for this this week!

ryanblock commented 5 years ago

Ok! Got a PR going for 2.0.0 here (I just assumed to go to 2.0.0 mainly because I didn't take the time to figure out whether 0.x (which is unstable in semverland) → 1.x would be considered a major the same as 1.x → 2.x). I added optional callback support, improved error handling, a new test suite, GitHub Actions CI config, and, of course, updated git for nodejs10x.

Unfortunately it's not good news. I probably should have tested lambci/git-lambda-layer's layer binaries before I got as far as I did with this PR, because there appear to be some blockers that I'm not so sure we can get around.

So, a quick bit of background: the nodejs8.10's Lambda microcontainer image was/is about 1.2GB, and contained a LOT of what we think of as being "standard" Linux stuff in there. Binaries, tools, libs, etc.

When AWS released nodejs10.x, they took a pretty different approach to the image, and made it super lean. This meant removing a ton of shared Linux libraries, binaries, and stripping it down to the bone – and they got the whole thing down to about 400MB (which is seriously impressive weight loss, imo!).

This is (in theory) a major boon to cold start times and other operations, but unfortunately it also meant that any custom binaries could no longer rely on standard system libraries. No bigs, being the mensch that @mhart is, he built a lot of great tooling (like docker-lambda and more recently yumda) to make it easier to compile binaries for Lambda targets.

However, the assumptions of extracting binaries to the filesystem via userland / Node modules vs via Lambda layers are different. Runtimes do their userland work in Lambda within /tmp, while Layers are contained within /opt. When I extracted the nodejs10.x git package and began testing in real Lambdae, git commands beyond --version were failing to find libs.

Looking into some of the binaries, it appears they may be calling libs from an absolute path that assumed a layer (/opt), and not relatively to their location in the filesystem (which, although it's been a while since I've done Linux lib hacking, I believe is a totally normal phenomena – but please correct me!).

This wasn't an issue before because all the common libs needed to run these binaries existed on the filesystem in nodejs8.10.

Just for completeness, I was curious to see if we could extract to /opt, but it is not userland editable, so this module can't write its contents there:

Uncaught Exception  
{
    "errorType": "Error",
    "errorMessage": "EROFS: read-only file system, open '/opt/._bin'",
    "code": "EROFS",
    "errno": -30,
    "syscall": "open",
    "path": "/opt/._bin",
    "stack": [
        "Error: EROFS: read-only file system, open '/opt/._bin'"
    ]
}

Unless anyone can think of other approaches, it seems like that means we'd need to take on compiling git just for this module in order to make it work. In terms of level of effort vs. wheel reinvention vs. other subject considerations (related to my own use cases), I don't think I'd recommend that, and don't personally want to take on that workload.

I'm a bit bummed out, really thought I could make this happen while still drafting on the good work of the lambci/git-lambda-layer project, but I think the way nodejs10.x works, and the fact that node8.10 is nearly EOL together puts this project at an impasse.

@pimterry wdyt?

pimterry commented 5 years ago

Interesting! Yeah, that sounds like a pain. Can you paste the errors you're seeing here, and the output of ldd $thegitbinary after extraction? That should highlight the issue a little more clearly.

Looking into some of the binaries, it appears they may be calling libs from an absolute path that assumed a layer (/opt), and not relatively to their location in the filesystem (which, although it's been a while since I've done Linux lib hacking, I believe is a totally normal phenomena – but please correct me!).

I think absolute paths are common, but by no means required. You can have relative paths (although they're relative to your working directory, not the binary) or bare names (which will be looked up using LD_LIBRARY_PATH etc). There should be a route through with those.

Unless anyone can think of other approaches

You are right: the 'correct' solution is probably to set up our own pipeline to compile Git for Lambda, for userland instead of layerland, but I agree that's a hassle.

I think there are other options though. For example, a quick google turns up this fun SO question: https://unix.stackexchange.com/questions/448789/fix-hardcoded-dynamic-linking-for-executable.

They're resolving a very similar issue using patchelf, which lets you to rewrite the paths in the binary directly. I think this should look something like:

In theory we could do that fairly easily, although it might just expose a new problem. I don't really have time to dig into this myself right now, but if you've got a minute, do you want to give it a go?

dorian-coygo commented 5 years ago

You guys got this! 👀 this thread 👍

mhart commented 5 years ago

Just a reminder for anyone lurking here that doesn't know – you can get git working in your Lambda by just adding a layer: https://github.com/lambci/git-lambda-layer

If that's not an option for you, then I'd understand why you'd want to continue using this module, but just figured I'd chime in to make sure ppl know.

dorian-coygo commented 5 years ago

Thanks for that @mhart - I will check it out. The layer is new right? (~17 days ago the support was added). I think that is why I used this one a while back. We don't have any restrictions for adding the layer in on our side (just need to spend a few minutes to update the cloudformation). May be a good idea to add that to the README for folks on that repo.

ryanblock commented 5 years ago

@mhart for sure! At the time I bugged @pimterry to undeprecate this package we couldn’t yet make use of layers.

@pimterry re patching the elf and repackaging the binaries, sounds p straightforward and easy to automate (assuming it works), but I’m also tapped out on time right now. Not sure when I’ll be able to get to it, so if someone wants to pick up where I left off on the 2.0 branch (or if you opt to re-deprecate this package) either is fine imo!

mhart commented 5 years ago

@dorian-coygo the layer that supports git on all runtimes except nodejs10.x has been around for a year, but you're right that nodejs10.x support is only 17 days old. Not sure what you mean about adding to the README?

dorian-coygo commented 5 years ago

@mhart - I went down the rabbit hole this morning. @ryanblock is right about how slim AWS Linux 2 is. It doesn't have curl or even tar. I tried forking the layer and adding curl and publishing it to my AWS account, but it looks like there are more issues and headaches (I get a curl: error while loading shared libraries: /opt/lib/libcurl.so.4: file too short). So then I decided to just use the request package and ditch curl. That is when I ran into the tar command not found. Solved that by adding it to the layer, but now I am getting random tar: Error is not recoverable: exiting now (which requires me to either debug my simple request code or some issue with tar.) There isn't a lot of documentation around AWS Linux 2 and not all runtimes use it.

I may just wait to see if this package has more success or just write my lambda in a different language that still uses AWS Linux image.

As for the README, you explain how to use the AWS Layer via the UI. You could add a snippet about how to add it to the CloudFormation such as

Resources:
  <name of lambda>:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: <location of lambda code>
      Layers:
        - !Sub arn:aws:lambda:us-east-1:553035198032:layer:git-lambda2:2
mhart commented 5 years ago

@dorian-coygo if you want any of those binaries (curl, tar, etc), you can use https://github.com/lambci/yumda

Are you having issues with the git-lambda2 layer? If so, feel free to post issues over at that repo

mhart commented 5 years ago

Btw, in terms of explaining how to use it in the UI, I have exactly an example of that in the README here: https://github.com/lambci/git-lambda-layer#getting-started

dorian-coygo commented 4 years ago

Just wanted to give an update that I also moved to layers solution and used the git-lambda-layer as a guide (since it didn't have all the dependencies I needed).

Since AWS slimmed down AWS Linux 2, I don't think the lambda-git would be good enough for most people since you will probably have to build a layer of dependencies anyways.

Steps for others:

  1. Figure out all your dependencies (this was painful trial and error). In my case it was git, curl, tar, gzip
  2. Follow @mhart great guide here: https://github.com/lambci/yumda to build all of them in a docker and then zip it up.
  3. If you want to publish the layer for others, you can follow @mhart publish script
dorian-coygo commented 4 years ago

Btw, in terms of explaining how to use it in the UI, I have exactly an example of that in the README here: https://github.com/lambci/git-lambda-layer#getting-started

FWIW: I was referring to adding a CloudFormation snippet in the readme (in case someone didn't want to do it manually in the UI).