serverless / serverless

⚑ Serverless Framework – Effortlessly build apps that auto-scale, incur zero costs when idle, and require minimal maintenance using AWS Lambda and other managed cloud services.
https://serverless.com
MIT License
46.4k stars 5.7k forks source link

Platform Flag has no effect when running under Github Actions #10982

Open jcampbell05 opened 2 years ago

jcampbell05 commented 2 years ago

Are you certain it's a bug?

Is the issue caused by a plugin?

Are you using the latest version?

Is there an existing issue for this?

Issue description

When specifying a platform for my image it fails to have any effect on Github actions instead being built for linux/amd64 instead of my specified platform.

The image builds fine since it's a multi-arch base image but when ran it fails since the lambda is expecting a arm64 container and gets a amd64.

Service configuration (serverless.yml) content

provider:
  name: aws
  ecr:
    images:
      hdrimage:
        path: ./apps/hdr-merger
        platform: linux/arm64
  region: eu-west-1
functions:
  hdr-merge:
    architecture: arm64
    memorySize: 2048
    timeout: 900
    image:
      name: hdrimage

Command name and used flags

npx sls deploy --stage dev

Command output

N/A

Environment information

Framework Core: 3.14.0 (local)
Plugin: 6.2.1
SDK: 4.3.2
pgrzesik commented 2 years ago

Hello @jcampbell05 - the platform is appended as provided and passed to docker build command as --platform flag - could you share a bit more about the output and the final image that is produced? Does your runner actually support building for arm64?

jcampbell05 commented 2 years ago

So essentially it builds as normal without any issues, only throwing an error that it's the wrong format when ran on lambda or if I force docker to pull down the arm64 base image I get a format error during the build step and spitting out that the host doesn't match the docker base image platform and that "platform" flag wasn't passed in.

As far as I'm aware the default Github runner should support building arm images or it should at least be possible to build using buildx.

I did an experiment where I built the image using my own script and encountered the same issue so t seems like it could be an environment issue and not a problem with serverless framework.

jcampbell05 commented 2 years ago

Okay on inspection the issue was two things;

  1. Docker's Build X and QEMU need to be installed using their Github Actions. Perhaps a note in the guides could be added for those using serverless framework on linux machines as on Windows & Mac Docker Desktop already has this setup for you.
  2. Serverless Framework seems to use the docker build command rather than docker buildx build which from my testing so far ignores the platform flag as it's considered an "experimental" feature at least for Github Action's version of docker. So I had to write a script to manually build the image using buildx and push it to ECR. Ideally it owuld be possible to pass a flag to Serverless to indicate we want to use buildx.

I'm happy to add number 2 if it doesn't exist as a flag ?

jcampbell05 commented 2 years ago

So I have a fix in my fork for serverless. Essentially serverless checks to see if DOCKER_USE_BUILDX is set to a truthy value and if it is then it will use docker buildx build and if not it will continue to use docker build.

This allows it to be opt in for people in my case who need it but continue to use the stable build command for the majority of users.

If this sounds good I can open a PR @pgrzesik

pgrzesik commented 2 years ago

Hello @jcampbell05 - thanks a lot for providing more details and clarifications.

As for the solution - I'm honestly not sure about using an env variable instead of a setting in Serverless configuration e.g. under provider.ecr or specifically for each of the images.

@mnapoli @medikoo what is your opinion on that?

mnapoli commented 2 years ago
image

build --platform seems to be supported since a specific Docker version, is that correct? Does that mean that if you upgrade Docker, then things should work? (is that even possible on GitHub Actions?)

Basically I'm wondering whether we should add an option here. What would it take for things to work correctly by default?

Alternatively, could we (should we?) switch to buildx by default then?

pgrzesik commented 2 years ago

build --platform seems to be supported since a specific Docker version, is that correct? Does that mean that if you upgrade Docker, then things should work? (is that even possible on GitHub Actions?)

That would not be enough - because e.g. if your runner is amd64, then it still might not be possible to build arm64 images and vice versa.

Alternatively, could we (should we?) switch to buildx by default then?

I don't think so as it's a "plugin" that most people will probably not use at all so the command won't even be available.

mnapoli commented 2 years ago

That would not be enough - because e.g. if your runner is amd64, then it still might not be possible to build arm64 images and vice versa.

Wait, are you saying this is specifically for GitHub Actions, or are you saying that docker build does not support at all cross-platform builds? (I wonder what the option is for then)

I'm trying to sum up the problem entirely, before we add new options. I.e. in which scenarios specifically is the current feature limited.

jcampbell05 commented 2 years ago

So I've dug into this issue and looked into all of the Docker documentation and I think I've figured out the issue and solution. Unfortunately Docker hasn't consolidated it in one easy to understand place so I will link through out the comment to the useful resources.

Github Action's default Linux runner uses a AMD processor and Ubuntu 20.04 (https://github.com/actions/virtual-environments) which by default has Docker Moby 20.10.14 and Docker BuildX 0.8.2 installed (https://github.com/actions/virtual-environments/blob/main/images/linux/Ubuntu2004-Readme.md). This version supports the --platform argument.

According to their documentation this platform for the docker client only requests from the docker server a particular architecture when pulling or building an image but doesn't guarantee that it will respect it if the server hasn't been configured to support it (https://docs.docker.com/engine/reference/commandline/build/).

By default Docker Desktop for Mac & Windows supports multiple architectures out of the box because it has been setup to ship with BuildX, QEMU and binfmt_misc to run binaries of any architecture (https://www.docker.com/blog/multi-arch-images/) as well as configured BuildX to be the default builder meaning "docker build" is actually an alias of "docker buildx build".

Linux Docker requires the user to configure it to support this and uses the standard build command by default which only builds images for the architecture of the host system and will ignore the platform flag.

There are three ways to configure it (https://docs.docker.com/buildx/working-with-buildx/) but the easiest way to configure it is as follows:

  1. Install Docker's BuildX plugin which supports building for other architectures.
  2. QEMU to allow the BuildX plugin to build for other architectures by running the binaries for foreign architectures in an emulator.
  3. Either update their tooling to use BuildX or set BuildX to be the default builder so all "docker build" commands are aliased to "docker buildx build"

By default BuildX is installed when Docker is installed via the Deb or RPM packages (https://docs.docker.com/buildx/working-with-buildx/#linux-packages).

From what I can see Github Actions has only achieved number 1 since they leave it up to the user how they want to achieve multi-platform builds. So the user has to perform steps 2 & 3 to build for other architectures themselves (It looks like it hasn't even been aliased).

Docker has helpfully provided Github Actions to install QEMU (https://github.com/docker/setup-qemu-action) which leaves the user to handle step number 3.

Since Serverless currently doesn't support being told to use BuildX we have to set it up as an alias by running "docker buildx install" which should solve the issue and avoid making any changes to the Serverless framework (https://docs.docker.com/engine/reference/commandline/buildx_install/)

However to make this a better developer experience and to work out of the box I would suggest the following tweaks:

  1. The Guides should be updated to explain all of this for those using Serverless to build containers for other platforms on linux. It's not trivial and having a guide with all of this information in one place would have saved me a lot of hours.
  2. When building a container image and I've specified that it should be used for a ARM lambda the Serverless Framework should identify this and either automatically select the correct platform or throw an error warning me that I need to set the platform for the image as ARM.
  3. When I set a platform for an image as a different one to the host then Serverless Framework would ideally detect that the conditions are met to be able to build for that architecture by doing the following:
    • Detect BuildX is installed.
    • Explicitly use BuildX (docker buildx build) to build the image so that the user doesn't need to alias it using docker buildx install. If it isn't installed then the user will get an error telling them to install it.
    • BuildX should hopefully verify at build time if it can build for that architecture since attempting to run a binary without QEMU will throw a format error. But potentially in these situations Serverless Framework could print a warning that it's detected that QEMU isn't available as a hint to the user.
  4. If it's the same platform or rather instruction set (Github Actions uses an AMD processor and AWS Lambda Intel but they both use x86) as the host then Serverless will use "docker build" as before to avoid needing to install BuildX.

Right now because Serverless does no validation and just uses docker build in all situations it fails to build containers for other platforms to the host when BuildX isn't configured to be the default builder and not configured correctly.

The above should require no additional environmental variables or changes to the Serverless configuration file. Instead improving the existing documentation and behaviour by the Framework.

pgrzesik commented 2 years ago

Thanks a lot @jcampbell05 for providing such an extensive overview πŸ™‡

  1. Having this documented would be amazing πŸ‘
  2. Could you clarify how that should work? How should the detection of proper architecture be implemented in your mind?
  3. What would be the exact steps to verify the installation with the least possible overhead?
mnapoli commented 2 years ago

+1, thanks for the details!

Having Serverless Framework automatically install buildx sounds useful right now, but I would be afraid it might be a liability over time? What if buildx is no longer needed in a few months from now (because Docker now support it natively)? Are we (Framework maintainers) in a capacity to 100% follow these changes and own the "buildx" specific logic? Those would be my questions so far.

That's why I'm wondering: would it make sense to not do anything automatically, but instead focus on guiding users through docs and actionable error messages? (which is part of what you suggest)

And +1 on the questions of @pgrzesik above

jcampbell05 commented 2 years ago

2. Could you clarify how that should work? How should the detection of proper architecture be implemented in your mind?

So we can get all the information about the host platform using docker version and it should spit out the host architecture.

Here is an example of what mine prints:

Client:
 Cloud integration: 1.0.17
 Version:           20.10.8
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        3967b7d
 Built:             Fri Jul 30 19:55:20 2021
 OS/Arch:           darwin/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.8
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       75249d8
  Built:            Fri Jul 30 19:52:10 2021
  OS/Arch:          linux/amd64
  Experimental:     false

We can simply detect if OS/Arch: for the Server matches the platform provided by the user in serverless.yml

3. What would be the exact steps to verify the installation with the least possible overhead?

For BuildX docker buildx version is all that would be needed to detect it's installed. If we get a version number then it's installed but if we get a "buildx is not a command" error then it's not installed.

For QEMU we can detect if qemu-img is a valid command using "which qemu-img" as this is a tool installed by QEMU for detecting image formats.

Having Serverless Framework automatically install buildx sounds useful right now, but I would be afraid it might be a liability over time? What if buildx is no longer needed in a few months from now (because Docker now support it natively)? Are we (Framework maintainers) in a capacity to 100% follow these changes and own the "buildx" specific logic? Those would be my questions so far.

So it wouldn't be installed by Serverless but rather detect it and used if available.

It's not clear yet if Docker ever intends to merge it into the main build command as it also as features such as targeting a remote runner to build a container which might make the out of the box build command a bit heavy. It appears that their intention is for buildx to become an "Advanced" builder.

I would certainly be surprised if it needed any special logic in the future as the fact you can alias to it shows to me that they intend for it to be a drop in replacement for existing scripts. So I would expect for them to provide backwards compatible upgrade path we can utilise.

But If we are afraid it might become a liability then just updating the documentation and throwing an error messages when the above conditions are met to the user would be a big improvement on what we have now.

But I don't see that being a huge reduction in liability for the framework developers as you would still need to update the logic for when the error message should be displayed and so potentially there should be a way to skip this error message in case it no longer applies.

On the other hand that might be a premature optimisation and Docker should hopefully provide docker buildx for backwards compatibility if they do one day plan to merge it.

So with all the above considered perhaps it would just be simpler to either only update the docs or in addition add a use_buildx option ?

pgrzesik commented 2 years ago

Hello @jcampbell05 - I think we should keep the Framework inference into the build process minimal and potentially just detect if the host architecture is different and present a warning to user about that - I would be hesitant to throwing any errors or more extensive intervention as it might be intended by user and already handled by e.g. aliasing docker bulild with buildx. That information should provide potential guidance on how to address the issue if at a later point, the user would run into architecture mismatch

jcampbell05 commented 2 years ago

Sounds good, so to summarise:

  1. Update Documentation to explain the above.
  2. Add a warning on build which indicates when build platform and host platform are different with a link to the new documentation.
pgrzesik commented 2 years ago

Hey @jcampbell05 - sorry for not responding sooner, I was off this past week. That summary sounds good to me, what do you think @mnapoli ?

mnapoli commented 2 years ago

Yes that's good to me.

jcampbell05 commented 1 year ago

So following up on this I've found that the way BuildX can be configured to build multi-architecture containers can prevent images automatically being available for Serverless to be able to tag and push them.

When using the standard docker builder this automatically happens which is what serverless currently relies on. But if using Build X's docker container, kubenetes or any other build plugin it does not do this unless you pass the --load parameter. (https://docs.docker.com/build/building/drivers/docker-container/#loading-to-local-image-store)

I had to alias the "docker" command to a custom script which injects this flag since serverless fails to use this flag and there currently is no way to pass your own command line flags for the build command

This script is below

#!/bin/sh
#
# Note:
#
# A Docker shim to workaround fact Serverless doensn't specify "--load" flag to automatically load images into image list when build using 
# another Build X builder plugin other than the default docker builder nor provide anyway for us to provide our own flags. 
#
# See more here https://github.com/serverless/serverless/issues/10982
#
# This shim simply calls the original docker with the flags serverless originally provided alongside our additional flags to fix the building
# issue and any other Github Action optimizations.
#
if [ $1 = "build" ]
then
    /usr/bin/docker $@ --load
else
    /usr/bin/docker $@
fi
rajp33 commented 1 year ago

Hey @jcampbell05 Can you show me how you used that script as an alias in github actions? I have been trying for a while and it doesn't seem like the alias for docker is being carried over to the serverless process.

Edit:

Nevermind figured it out! Ended up copying the shim to a place in PATH before /usr/bin:

copy docker_shim.sh ~/.local/bin/docker

jcampbell05 commented 1 year ago

@rajp33 I used the Github Actions script to add the docker script to the PATH environment variable so that all docker commands invoke it.