microsoft / azure-pipelines-tasks

Tasks for Azure Pipelines
https://aka.ms/tfbuild
MIT License

[Feature request] Docker build should use cache unless asked not to #6439

Closed MirzaSikander closed 5 years ago

MirzaSikander commented 6 years ago

Environment

Issue Description

Locally, if you manually build docker containers over and over again with small changes, the docker daemon uses cache to speed up docker build.

On VSTS, every time a build is started, the docker build goes through the same steps it has gone through previously: the base images are fetched and expanded, apt-get installs all the dependencies, files are copied over, etc. This adds a lot of time to each build; in fact, the majority of our CI/CD build time is wasted on this.

Expectation: If the host being used is the same for each build, it should make use of the cache from previous builds to speed up this step, unless specifically asked not to.
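
For illustration, the local behaviour being described, expressed here as pipeline script steps (the image name and tag are placeholders; --no-cache is the explicit opt-out):

- script: docker build -t myapp:ci .
  displayName: Build (unchanged steps reuse the local layer cache)

- script: docker build --no-cache -t myapp:ci .
  displayName: Build with the layer cache explicitly disabled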

polys commented 6 years ago

Could you at least cache base images from public repositories as a first step?

It's such a waste of time downloading images like microsoft/aspnetcore, microsoft/aspnetcore-build, node, python, nginx etc. again and again.

borkke commented 6 years ago

+1 for this one.

Does anyone know if it is the same on TFS?

polys commented 6 years ago

On Hosted VS2017 agents, some container images are already cached apparently: https://github.com/Microsoft/vsts-image-generation/blob/master/images/win/Vs2017-Server2016-Readme.md#docker-images

linuxchata commented 6 years ago

Is there information about container images cached on Hosted Linux Preview agents?

chrispat commented 6 years ago

The host is not the same for each build; it is a fresh host each time, for security and consistency reasons. We do precache some of those images on the host machines. We can look at caching others, but that cache will always drift from what is currently published, simply due to how often the pool is updated. So it is possible that you build a docker container with a node base and get a different sha256 base when you build in CI from what you get when you build locally.

bryanmacfarlane commented 6 years ago

note that we are going to release a new Ubuntu (released) hosted pool and we're investigating caching a series of more focused containers (node, dotnet core, etc.). We may keep a couple of tags of each. But as @chrisrpatterson noted, your cache hits will be hit or miss for a number of reasons. We're experimenting now.

This issue should move to vsts-image-generation repo since that's where the hosted VM gen and the new docker images work will happen.

andreacassioli commented 6 years ago

This is quite an issue, especially when using CI/CD intensively and with base images such as the ASP.NET SDK or the like (unfortunately even dotnet core 2.1 seems to have a very large SDK image).

@chrisrpatterson: the cache will clearly drift, that's in its nature. The point is that as long as a new image is not published, the one I am using will be cached, as the SHA will be the same (note that people should pick a specific version, not just latest!).

andreacassioli commented 6 years ago

I have noticed that the docker build process always deletes intermediate containers. Why is that? It is not the default Docker behaviour. We have some use cases in which we would like to split a multi-stage build into multiple tasks, but right now there is no way to carry the cache from one stage to the next.

Note that this does not happen when running docker build from a bash task.

harshil93 commented 6 years ago

@andreacassioli I'm not an expert on docker, but the default option in docker build is to remove intermediate containers. Do you want to keep the containers after docker build?

https://docs.docker.com/engine/reference/commandline/build/#options

See the --rm option.
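
If you do want to keep them, a minimal sketch of the underlying flag, using a plain script step rather than the Docker task (the image name is a placeholder):

- script: docker build --rm=false -t myapp:ci .
  displayName: Build, keeping intermediate containers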

andreujuanc commented 6 years ago

I'd love this! Even just for testing purposes: I'm trying different settings while configuring my pipeline, and if every queued build is going to use 5 minutes of my free build time, then I'll run out of minutes in less than a day :(

matt-mahdieh commented 6 years ago

Any progress on this issue? I have been having the same issue on VSTS. Every time I trigger a new build it downloads the images (in my case the .NET SDK), which is huge. Is there any way to cache, similar to what is available in Bitbucket Cloud?

mattnedrich commented 6 years ago

Any update on this?

mattquinn commented 5 years ago

I have a customer interested in this feature. They understand running your own private build agent can solve this, but the tradeoff of then having to manage that infrastructure isn't compelling.

The idea of being able to leverage the cache in a hosted build agent is compelling - quicker builds, without managing infrastructure - and could help get their builds down from tens of minutes to ~1-2 mins, which on their CI/CD "always deploying" plans would be a massive win.

AndreyBespamyatnov commented 5 years ago

I have the same issue: downloading the MSSQL image every time takes ~20 min. Thanks

p10tyr commented 5 years ago

And now, with the limit of 1 parallel build, a simple app takes minutes to build in Docker CI because it has to download everything every time... Pleeeeaaaseeeee! I don't want to wait hours for a build to complete and release... We already pay for extra parallel jobs, but this is getting a bit out of hand now.

nelisbijl commented 5 years ago

I have a customer interested in this feature. They understand running your own private build agent can solve this, but the tradeoff of then having to manage that infrastructure isn't compelling.

The idea of being able to leverage the cache in a hosted build agent is compelling - quicker builds, without managing infrastructure - and could help get their builds down from tens of minutes to ~1-2 mins, which on their CI/CD "always deploying" plans would be a massive win.

I managed to create a self-hosted agent from https://github.com/Microsoft/azure-pipelines-image-generation using packer. Although the documentation fails to explain how to create a VM from the generated vhd and template file, I got one up and running, installed the agent software and configured it.

However, this agent still fails to cache the costly 'npm install' step in my Dockerfile. If I run the same docker build command on that machine, in the folder created by the pipeline agent, it does use the cached docker image!! Something must be different between using the command line and using the Azure Pipelines agent, but I haven't found it yet.

UPDATE: Found that it is caused by the job's option 'Allow scripts to access the OAuth token'. Its value is passed to docker build using --build-arg ACCESS_TOKEN=$(System.AccessToken). After the Dockerfile statement ARG ACCESS_TOKEN, every RUN command is re-executed and no longer uses the cache:

2018-12-15T12:35:24.1151714Z Step 2/6 : COPY .npmrc .npmrc
2018-12-15T12:35:24.1169984Z ---> Using cache
2018-12-15T12:35:24.1170265Z ---> 0d5e206a8336
2018-12-15T12:35:24.1170385Z Step 3/6 : COPY package*.json ./
2018-12-15T12:35:24.1180777Z ---> Using cache
2018-12-15T12:35:24.1181102Z ---> 21eb833bb3c8
2018-12-15T12:35:24.1181370Z Step 4/6 : RUN ls -altrR .
2018-12-15T12:35:24.1186686Z ---> Using cache
2018-12-15T12:35:24.1187008Z ---> 895655fcdf21
2018-12-15T12:35:24.1187084Z Step 5/6 : ARG ACCESS_TOKEN
2018-12-15T12:35:24.1193136Z ---> Using cache
2018-12-15T12:35:24.1193388Z ---> a3a4895ab808
2018-12-15T12:35:24.1193684Z Step 6/6 : RUN ls -altrR .
2018-12-15T12:35:24.3915668Z ---> Running in 52887c15fe39
2018-12-15T12:35:25.4198528Z .:
2018-12-15T12:35:25.4198815Z total 384
2018-12-15T12:35:25.4199606Z drwxr-xr-x 1 root root   4096 Nov 20 08:30 ..
2018-12-15T12:35:25.4200313Z -rw-r--r-- 1 root root   1401 Dec 15 12:24 package.json
2018-12-15T12:35:25.4200657Z -rw-r--r-- 1 root root 373122 Dec 15 12:24 package-lock.json
2018-12-15T12:35:25.4201062Z -rw-r--r-- 1 root root    309 Dec 15 12:24 .npmrc
2018-12-15T12:35:25.4201349Z drwxr-xr-x 1 root root   4096 Dec 15 12:25 .

The second ls statement is re-executed! Note that $ACCESS_TOKEN is not used anywhere.

FYI: I use the access token in my .npmrc file to access our private npm registry

I don't know whether System.AccessToken is different on every run (it echoes as ***) or whether its value simply cannot be cached by Docker. Specifying any other value (e.g. --build-arg ACCESS_TOKEN=AT) makes the caching work as expected.

Anyone any idea?

UPDATE 2

Guess what? There is a difference between:

docker build . --build-arg ACCESS_TOKEN=$(System.AccessToken) ...

and

ACCESS_TOKEN=$(System.AccessToken) docker build . --build-arg ACCESS_TOKEN=$ACCESS_TOKEN ...

The second has no problem using docker's cache after the ARG ACCESS_TOKEN statement. Great performance improvement!

UPDATE 3: After the holidays I came to the conclusion that what I stated in UPDATE 2 is not correct.

You cannot specify an environment variable at the start of a command line. You have to use the dedicated environment section for that (either in the designer or in YAML); otherwise the environment variable will simply not be set (empty string value). The reason it worked for me was that the ACCESS_TOKEN build arg wasn't actually used.

As $(System.AccessToken) seems to be different on every run, you should not use it in your Docker build if you want to benefit from caching.

However, I somehow needed to authenticate against my private npm registry. You can use an NpmAuthenticate step for that: it updates the .npmrc files with an authentication token. You have to specify credentials; when left blank, the OAuth authentication of the runner is used, which results in a different .npmrc file on every run. Instead you should use an npm service connection. The drawback is that it needs a PAT (personal access token), which can be valid for no longer than 12 months, so you will have to remember to update it. The good news is that it results in a constant .npmrc, so Docker can and will use its cache.
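
A rough YAML sketch of that approach; the service connection and image names are illustrative, and the important part is that no access token is passed as a build-arg, so the layer cache stays valid:

- task: npmAuthenticate@0
  displayName: Write registry credentials into .npmrc
  inputs:
    workingFile: .npmrc
    customEndpoint: 'my-npm-connection'  # illustrative npm service connection backed by a PAT

- script: docker build -t myapp:$(Build.BuildNumber) .
  displayName: Build without an ACCESS_TOKEN build-arg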

willemodendaal commented 5 years ago

Hello from 2019. I guess this is still an issue... pity. Would really like to use VSTS hosted agents, but not being able to cache container images makes the build super slow. 20+ minutes instead of 1 minute.

For reference, this is what my dockerfile looks like. Nothing fancy.

FROM microsoft/aspnet:4.7.2-windowsservercore-ltsc2016
ARG source
WORKDIR /inetpub/wwwroot
COPY . .

cpumanaz commented 5 years ago

I am experiencing the same, using the "FROM microsoft/aspnet:4.7.2-windowsservercore-ltsc2016" source image. It was mentioned that not every hash is cached and there is drift. Is it documented which specific tag(s) are cached on the 'Hosted VS2017' agent, so we can choose those source images?

This is advertised as a feature at the link below, but I do not see it working as advertised. https://docs.microsoft.com/en-us/azure/devops/pipelines/languages/docker

cpumanaz commented 5 years ago

I'm reporting back. The reason many might have issues is the :latest tag: the pre-loaded images are tagged latest, so if you were taught that using the latest tag is bad and you pin specific versions, you will not see the performance gains. By adding a docker image list task, I saw which images were pre-loaded.

==============================================================================
Task         : Docker
Description  : Build, tag, push, or run Docker images, or run a Docker command. Task can be used with Docker or Azure Container registry.
Version      : 1.1.27
Author       : Microsoft Corporation
Help         : [More Information](https://go.microsoft.com/fwlink/?linkid=848006)
==============================================================================
[command]"C:\Program Files\Docker\docker.exe" image list
REPOSITORY                    TAG                 IMAGE ID            CREATED             SIZE
microsoft/dotnet-framework    latest              ec599075a73c        7 weeks ago         13.2GB
microsoft/windowsservercore   latest              ea9f7aa13d03        7 weeks ago         11GB
microsoft/aspnet              latest              ddabd3e10c02        3 months ago        13.7GB
microsoft/nanoserver          latest              4c872414bf9d        4 months ago        1.17GB
microsoft/aspnetcore-build    1.0-2.0             5d8be0910d37        6 months ago        3.99GB
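
For anyone wanting to run the same check, a minimal sketch of such a listing step, written as a plain script step rather than the Docker task shown in the log above:

- script: docker image list
  displayName: List pre-cached Docker images
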
andreujuanc commented 5 years ago

That's interesting. At least that's a good start. The problem is that many depend on specific versions, though, and that might still be an issue :(

sebnyberg commented 5 years ago

I keep coming back to this thread to check whether anything is happening on the topic.

As our CI and release pipelines have turned into a sequence of docker and helm commands, the only long-running issue is this one. Spending (wasting) almost 90% of build time downloading images and packages, due to the lack of support for layer caching, is disappointing to say the least.

Considering that we no longer need the Azure integration features, we don't depend on Azure DevOps anymore. Changing all our pipelines over to a competitor, however, is also quite a task, so I wish this would get higher priority.

damienwebdev commented 5 years ago

@bryanmacfarlane @chrispat would this be solvable by https://github.com/Microsoft/azure-pipelines-yaml/pull/113 ?

BenWalters commented 5 years ago

@chrispat @bryanmacfarlane, do you have any news on this? It's been being discussed for a long time now. Even a timeline would be nice.

jarednlivingston commented 5 years ago

@chrispat @bryanmacfarlane, do you have any news on this? It's been being discussed for a long time now. Even a timeline would be nice.

It looks like it might be coming in June 2019: https://github.com/microsoft/azure-pipelines-yaml/pull/113#issuecomment-493113099

oak-tree commented 5 years ago

Hey! Docker cache is a must-have feature for DevOps. We are losing so much time rebuilding our image for every single release... :( According to the last comment at https://github.com/microsoft/azure-pipelines-yaml/pull/113#issuecomment-505890542, https://github.com/microsoft/azure-pipelines-yaml/pull/113 is not going to fix this issue. Any idea when DevOps plans to release this feature?

jfranki commented 5 years ago

We have a Dockerfile with a base image of more than 2GB (MATLAB Runtime) stored in an Azure Container Registry. So yeah... that one has to be built locally on every update until this is sorted out.

BenWalters commented 5 years ago

This feature request has been open 18 months now with the last update over 1 year ago!

@bryanmacfarlane & @chrispat could we have an update please, as microsoft/azure-pipelines-yaml#113 doesn't look like it'll solve this.

bryanmacfarlane commented 5 years ago

You are correct that the caching feature won't help you here. It's also not a task issue.

Hosted pools re-image and give you a fresh machine every time. Private / custom agents on your own VM are the only way to get incremental builds, images, etc.

There is a feature which is the best of both worlds.

https://github.com/microsoft/azure-pipelines-agent/blob/master/docs/design/byos.md That would give you incremental cached containers with the convenience of hosted pools (spun up and down for you).

You should probably create a feature request here as that's a large feature / service addition (not an issue with a task). Get some votes.

https://developercommunity.visualstudio.com/spaces/21/visual-studio-team-services.html?type=idea

@thejoebourneidentity FYI as his team is driving that feature (they own hosted pools).

I'm inclined to close this issue here because it's not a task issue and it just won't get traction here.

BenWalters commented 5 years ago

Hi @bryanmacfarlane, thanks for the update, although I'm a little surprised that this answer hasn't been given sooner! Guess we are at the back of the queue again with this request 👎

bryanmacfarlane commented 5 years ago

Time hasn't been wasted on the queue because of this issue. It has been actively proposed, already designed per the link above, and the feature has been postponed by the product team.

Closing as this isn't a task issue.

I've forwarded this link to the team ( @thejoebourneidentity ) with the feature and also sent them a mail.

gldraphael commented 5 years ago

Relevant link: https://developercommunity.visualstudio.com/idea/365799/improve-hosted-build-agent-performance-with-build.html

https://developercommunity.visualstudio.com/comments/515033/view.html

thejoebourneidentity commented 5 years ago

Thanks @bryanmacfarlane. Hey folks, Bryan is right. True E2E docker layer caching won't be achieved in our hosted pools since we refresh machines between jobs. The postponed 'BYOS' feature as it is called on the agent repo is absolutely on our radar and will give you maximum caching flexibility.

In the meantime, if there are base container images you would like to see us include in our hosted pool images, please feel free to request them over on our image gen repo: https://github.com/microsoft/azure-pipelines-image-generation

robertoandrade commented 5 years ago

I was able to get this working using the Cache task and some conditional manual docker save/load commands after it, to dump the desired docker image (and its layers) into a tarball that the Cache task would then upload, and download on the next execution, where docker load would put it back into the local registry. The time it takes the Cache task to do its thing, combined with the respective docker save/load operations, pretty much matched downloading the base image/layers from a public registry, so I'm not seeing much benefit in doing it.

In an earlier attempt I spent a significant amount of time trying to cache the contents of C:\ProgramData\Docker (where the deconstructed images are stored), and even changed the docker daemon to point to another location, but that was unsuccessful given the permissions the container sets on the files/folders there.

When running benchmark tests locally in a Windows VM on my Mac, it seems to download and extract files much faster than the Azure Pipelines agents, so I'm thinking the key bottleneck is I/O and perhaps networking.
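
A sketch of the Cache task plus docker save/load approach described above, with illustrative cache key, path and image names (assuming a Linux agent for brevity):

- task: Cache@2
  inputs:
    key: 'docker-image | "$(Agent.OS)" | Dockerfile'
    path: $(Pipeline.Workspace)/docker-cache
    cacheHitVar: DOCKER_CACHE_HIT

- script: docker load -i $(Pipeline.Workspace)/docker-cache/base.tar
  displayName: Restore cached base image
  condition: eq(variables.DOCKER_CACHE_HIT, 'true')

- script: |
    mkdir -p $(Pipeline.Workspace)/docker-cache
    docker save -o $(Pipeline.Workspace)/docker-cache/base.tar mybase:tag
  displayName: Save base image for the next run (uploaded by the Cache task post-job)
  condition: ne(variables.DOCKER_CACHE_HIT, 'true')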

RobertoPrevato commented 4 years ago

In July 2019 I published files to build Docker images for private Azure DevOps agents. One of them has Docker, so it supports Docker workloads. If you can run a container on a machine, this is a possible solution to benefit from caching and speed up builds.

https://github.com/RobertoPrevato/AzureDevOps-agents

See: https://github.com/RobertoPrevato/AzureDevOps-agents/tree/master/ubuntu18.04-docker

letmaik commented 4 years ago

@bryanmacfarlane @thejoebourneidentity I think the ask here is not to re-use VMs but to still have a fresh VM each time (that's my use case anyway), with the option of specifying that Docker layers should be cached across runs, similarly to how regular folders can be cached. For that, BYOS wouldn't bring any benefit. What's the technical limitation to getting this going? Is it simply that Docker doesn't easily allow restoring a local cache? Has anyone looked deeply into this already?

letmaik commented 4 years ago

Follow-up: Even though it is not as simple as a "checkbox", I was successful in saving/restoring a Docker build cache by using Docker's --cache-from feature. Note that this requires pushing the built images (along with the cache) to a registry, which may not always be available. In another issue, a prototype was done that saves the cache to a folder using the Azure Pipelines cache functionality; however, that approach seems much more involved and also requires installing BuildKit separately.

adrian-skybaker commented 4 years ago

The postponed 'BYOS' feature as it is called on the agent repo is absolutely on our radar and will give you maximum caching flexibility.

Relying on this feature seems overkill for a simple, transparent docker cache, which is offered by other CI/CD platforms. The pipeline caching feature seems a better place for this. Caching source docker images is conceptually not very different to caching maven dependencies.

keesschollaart81 commented 4 years ago

Pulling :latest and then pointing --cache-from at it, like in the example below, seems to work, or am I missing something? It's true that during the build the agent has to download the image; the argument for doing this would be that the clients downloading the image later only have to download the updated layers, instead of every layer for every release, right?

- task: Docker@2
  inputs:
    containerRegistry: '$(ContainerRegistryName)'
    command: 'login'

- script: "docker pull $(ACR_ADDRESS)/$(REPOSITORY):latest"
  displayName: Pull latest for layer caching
  continueOnError: true # for first build, no cache

- task: Docker@2
  displayName: build
  inputs:
    containerRegistry: '$(ContainerRegistryName)'
    repository: '$(REPOSITORY)'
    command: 'build'
    Dockerfile: './dockerfile'
    buildContext: '$(BUILDCONTEXT)'
    arguments: '--cache-from=$(ACR_ADDRESS)/$(REPOSITORY):latest' 
    tags: |
      $(Build.BuildNumber)
      latest

- task: Docker@2
  displayName: "push"
  inputs:
    command: push
    containerRegistry: "$(ContainerRegistryName)"
    repository: $(REPOSITORY) 
    tags: |
      $(Build.BuildNumber)
      latest