Closed: @hausdorff closed this issue 1 year ago
I've put @lukehoban on this and put it in M18, but those should be thought of as suggestions. :)
Adding a bit more context on why this is a (big) issue imho:
I'm working on making pulumi our go-to tool for all kinds of k8s builds and deployments. I opted to create a pulumi stack in each repo / app that should be deployed to k8s. Each stack then contains the docker build AND the k8s manifests, expressed as a single pulumi program. As a result all of our pulumi stacks have at least 1 dockerfile, and one of them actually has 10 dockerfiles.
The issue above causes multiple problems which are detrimental to the pulumi UX, especially in the repo with 10 dockerfiles:

1. `pulumi preview` and `pulumi up` get very slow. This is particularly a problem for `pulumi preview`, which users expect to be reasonably fast. The slowness is caused by the slowness of docker build itself: with `cacheFrom: true`, pulumi always runs a `docker pull <imagename>`, even during preview, which causes docker to attempt to download all layers. It takes a few seconds for docker to conclude that it already has all the layers locally.
2. `docker build` with all layers cached still takes a few milliseconds to seconds per layer, especially if the layers are large.
3. The `diagnostics` section in pulumi produces 3 lines per cached layer to inform the user that the build step was taken from the cache. In the case of our repo with 10 dockerfiles, with ~11 layers each, that means we get more than 330 lines of noisy, not particularly useful diagnostics log output on every preview. The diagnostics output contains hundreds of lines which look like this:
```
docker:image:Image: foo-bar
info: Sending build context to Docker daemon 125.6kB
Step 1/11 : FROM node:8
 ---> 6f62c0cdc461
Step 2/11 : ARG NPM_TOKEN
 ---> Using cache
 ---> c0801f54dc29
Step 3/11 : WORKDIR /app
 ---> Using cache
 ---> c53511307e50
Step 4/11 : COPY package.json package-lock.json ./
 ---> Using cache
 ---> 7313d20117f9
Step 5/11 : RUN echo "//registry.npmjs.org/:_authToken=${NPM_TOKEN}" >> ~/.npmrc && npm install
 ---> Using cache
 ---> 6024564eef28
Step 6/11 : ENV TS_NODE_TRANSPILE_ONLY=true APP__NOCLUSTER=true
 ---> Using cache
 ---> 920a379cc6e3
Step 7/11 : COPY package.json package-lock.json* tsconfig.json* ./
 ---> Using cache
 ---> 2c52eae743bf
Step 8/11 : COPY conf* config* conf/
 ---> Using cache
 ---> b4c2f9ff0437
Step 9/11 : COPY conf* config* config/
 ---> Using cache
 ---> 51f8ada01947
Step 10/11 : COPY src src
 ---> Using cache
 ---> e3056cb71320
Step 11/11 : ENTRYPOINT npm start
 ---> Using cache
 ---> c6bc04e903dc
Successfully built c6bc04e903dc
Successfully tagged gcr.io/foo/bar:latest
```
`pulumi up` is even slower because the `docker push` step also takes a couple of seconds until docker concludes that all layers are already present in the remote registry.

Altogether this single issue has a large negative impact on the pulumi docker / k8s UX and should be prioritized accordingly.
To be clear: I really love pulumi, but personally I consider this particular issue the biggest meh of pulumi compared to other k8s dev tools.
Tools which do a much better job at this (i.e. they avoid rebuilding dockerfiles every time) include: skaffold, forge.sh and possibly (haven't tested myself): draft, devspace. Those tools certainly have a different (and smaller) scope than pulumi, but I'd like to avoid adding additional tools to the mix if pulumi can be tweaked to be fast enough for the inner dev cycle by itself.
Side note: In a perfect world docker itself would be a bit smarter and faster about this, but I don't think that's going to happen anytime soon. Docker 18.06 brings experimental support for BuildKit as a next-gen docker build implementation, which goes in the right direction, but there are still a lot of things to solve (e.g. faster / more intelligent remote caching instead of an explicit docker pull, etc.).
I'm also pasting here an extract of a private conversation I had with @hausdorff a few days ago from which this issue originated:
geekflyer wrote:
While I think pulumi is good in terms of feedback loop, I don't think that this is really pulumi's strength as of now. From my perspective pulumi currently has a reliable deployment model with automatic built-in feedback that is very well suited for automated deployments via CI / CD etc. But for local development / iteration with quick deployment to k8s I think there are better tools out there. To be concrete: skaffold, forge.sh and draft. We currently use forge.sh for the few k8s apps we deploy in production on k8s. forge.sh doesn't have a semantic understanding of k8s objects, but it has a built-in docker build+push > deterministic auto-image-tagging > inject tags into yaml > deploy to k8s workflow. Usually I use it in conjunction with `kail` to see if a pod is up and running and what its logs say. Skaffold is similar to that but even better in some aspects:
- Skaffold has somewhat of an understanding of the k8s artifacts and in turn doesn't necessarily redeploy everything if you just changed a single manifest; it also ensures that pods are starting successfully and shows their logs.
- Skaffold has a built-in file watch mode which auto redeploys on any file change
In general the feedback loop with skaffold is really short, but it’s by far not as flexible as pulumi.
I think some things where pulumi could improve - inspired by skaffold would be:
- some sort of watch mode
- e2e docker build > deploy workflow with automatic image tag management (I can already build and deploy with pulumi in one workflow right now, but the image tag management isn't taken care of automatically yet)
- don’t run docker build at all, if nothing in a dockerfile’s context has changed. Reason: Even if all layers are cached, docker build actually takes a couple of seconds, that’s why skaffold for example avoids repeating that step.
@hausdorff already clarified that 2. is already being partially taken care of by using the `imageName` output of `docker.Image`, which contains the image sha256.
Related (just a subset but interesting): https://github.com/pulumi/pulumi-cloud/issues/183
As someone who frequently demos the Docker build support, often with awkward pauses in the middle, I am a huge fan of pursuing these optimizations :smile: 👍
relates to https://github.com/pulumi/pulumi/issues/2052
Moving out. We don't have a plan for this for M18. One thing we are considering (@hausdorff to fill in more details) is that we may move to a provider model for this, where we can call into the docker APIs directly. As part of that, we're considering an opt-in/opt-out model where we can have a behavior that tries to infer whether a change will happen just by examining the local file system. In other words, if the local file system is unchanged, you will be able to opt into (or out of) a behavior where we assume that means a docker build won't produce anything different.
This has to be something under user control though as this is simply a weak approximation. A docker build may always end up producing a new image, even if nothing on disk changed.
With this, a user could then decide between saying "i always want docker to run, to get the most accurate results" vs "i'm ok assuming that running docker will not change anything if all my files on disk are ok".
With this approach, the cost then just comes down to checking for changes on disk. In general, this is something we already do a good enough job with, since we have to do it for any sort of update; it's how we determine, for example, what to upload for an AWS Lambda. This should be less work overall than what docker does, resulting in a boost for this scenario.
Giving to @hausdorff as he felt he had the best understanding of what to do here, esp. with his existing understanding of how to create a custom provider.
We're going to take on the docker work in m20.
Even the ability to use the `cacheFrom` arg in the docker.Image class would be helpful. It seems like we currently have to choose between the more idiomatic docker.Image resource and the docker.buildAndPushImage function.
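For reference, a minimal sketch of what the requested option could look like on the resource. Later versions of `@pulumi/docker` did grow a `build.cacheFrom` option, but treat the exact shape here as an assumption and check the provider version you're on:

```typescript
import * as docker from "@pulumi/docker";

// Hypothetical sketch, assuming build.cacheFrom is available on
// docker.Image in your @pulumi/docker version.
const image = new docker.Image("foo-bar", {
    imageName: "gcr.io/foo/bar:latest",
    build: {
        context: "./app",
        // Pull the previously pushed image and reuse its layers as cache.
        cacheFrom: true,
    },
});

export const imageName = image.imageName;
```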
Improvement on this front would be really cool :+1: I keep circling back to try Pulumi and every time I do I run into this issue, and remember why I gave up on Pulumi before :smile: Hopefully this improves at some point.
Any updates on this? Building images is painfully slow, while the rest of the functionality we are evaluating is performing well enough. Building a certain image that consists of a number of large layers that rarely change takes about 2 minutes using the docker command line:
In Pulumi it consistently takes more than 12!:
Also a real problem for us - it's generally taking about 15-20 minutes for pulumi up to build our Docker images locally, where just doing a docker build takes 2-3.
I forgot to cross-post in here, but one option is to use the `RegistryImage` resource (see https://github.com/pulumi/pulumi-docker/issues/132#issuecomment-812234110 for more details on using it as a replacement for `Image`).
There's also a use case where it'd be great to pull an image to populate the cache from a different registry than where it's ending up. As an example:
From what I can tell, this is not currently possible? `cacheFrom` only seems to work with the destination registry.
@geekflyer et al., this issue is resolved with the new implementation of the Docker Image resource in v4! See our blog post for more info: https://www.pulumi.com/blog/build-images-50x-faster-docker-v4/
From a user: