microsoft / azure-pipelines-agent


Working directory needs more efficient re-use or cleanup #1506

Open · bergmeister opened this issue 6 years ago

bergmeister commented 6 years ago

Agent Version and Platform

VSTS Type and Version

VSTS but agent is on-premise

What's not working?

We created new agent machines in Azure. For fast builds, the working directory is on the D: drive, which is only around 30 GB. We have a big monolithic repository with multiple build definitions, and the agent appears to keep a separate checkout on disk for each build definition and branch, even though they all point to the same repository. Git was designed for fast branch switching, so I don't see why separate checkouts are necessary. As a result, the agent runs out of disk space after a few hours of builds. Adding cleanup steps to every build, as proposed in #708, is not a reasonable solution. Therefore the agent either needs a setting to clean up the working directory afterwards, or it needs to be more efficient about re-using the same repository across branches and build definitions.
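For reference, one knob that exists today is the job-level workspace cleanup; a minimal sketch is below. It wipes the workspace at the start of a run, which is exactly the kind of per-build workaround the issue argues should not be necessary, and it does nothing about each definition keeping its own clone:

```yaml
jobs:
  - job: build
    workspace:
      clean: all   # wipe the entire pipeline workspace before this job runs
    steps:
      - checkout: self
      - script: ./build.sh
```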

xantari commented 4 years ago

This GitHub issue seems to center primarily on builds. Is there a similar cleanup backlog item for pipeline releases, which take the build artifact and deploy it to the server?

Right now we have to add a step at the end of each release to clean those up (the screenshot is not reproduced here).
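As an illustration of that kind of step, a YAML sketch using the built-in DeleteFiles task might look roughly like this (the folder is an assumption; point it wherever the release artifacts actually land):

```yaml
# Illustrative cleanup step at the end of a release/deployment job.
- task: DeleteFiles@1
  displayName: Clean up downloaded artifacts
  condition: always()   # run even if earlier steps failed
  inputs:
    SourceFolder: $(System.DefaultWorkingDirectory)
    Contents: '**'
```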

ChristianStadelmann commented 3 years ago

An example of missing disk cleanup is described in #3168.

github-actions[bot] commented 3 years ago

This issue has had no activity in 180 days. Please comment if it is not actually stale

bergmeister commented 3 years ago

Not stale, but it has been ignored for years despite being the most upvoted issue!

Gilesey commented 3 years ago

The issue is still as relevant now as it was then, and it applies to both Git and VSTS users. Often central IT controls hundreds of VMs and provides the minimum capacity to each; if I want more disk space, I have to go through a process to order it, escalate the order, and get sign-off. So yes, when a build agent is configured to run only one job at a time (e.g. a CI build plus a full clean build overnight, or branch A for one request and branch B for another), it makes sense for it to use the same folder. Whether that is a scorched-earth clean/get or a cleverer incremental get, making the best use of space is absolutely desirable. At least give us the option.

echalone commented 3 years ago

Hi, I think I've already programmed a solution for this (including unit tests), since we need it too, and I've opened two pull requests for two different features.

This pull request would (on self-hosted agents) allow repositories to be placed not just in the build directory but also in the work directory, allowing self-hosted agents to reuse repositories between build pipelines: https://github.com/microsoft/azure-pipelines-agent/pull/3475. The option has to be explicitly enabled in the .agent settings file on self-hosted agents, so as not to pose a security risk on public agents. There is also a unit test verifying that the agent continues to throw an error if somebody tries this on an agent where it wasn't explicitly enabled. Repositories above the work-directory level, or with an absolute path, continue to be disallowed even when the new "AllowWorkDirectoryRepositories" option is set (those unit tests are also included).
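Purely to illustrate the idea (the path and option name below are taken from the description above and are hypothetical, not confirmed against the PR):

```yaml
steps:
  # Hypothetical usage of PR #3475: the checkout path climbs out of the
  # per-pipeline build directory so multiple build definitions share one clone.
  # Per the description above, this would only be allowed on a self-hosted agent
  # whose .agent settings file enables "AllowWorkDirectoryRepositories".
  - checkout: self
    path: ../shared/big-monorepo   # resolves under the agent work directory
```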

And this second pull request would allow setting the default working directory (not to be confused with the agent work directory from the previous pull request) to the checkout path of a chosen repository in a multi-checkout scenario, as part of the checkout step: https://github.com/microsoft/azure-pipelines-agent/pull/3479. This would make it possible to call scripts and use files in the desired repository of a multi-checkout scenario without relative paths or build variables pointing to the correct repository: just declare in the YAML checkout steps which repository should be the working directory for all build steps and you're done. Unit tests are of course included again.
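For context, this is roughly what the multi-checkout status quo looks like today, where each script step has to be pointed into the right sub-folder explicitly (repository names are hypothetical); the PR would instead make the chosen repository's checkout path the default working directory for subsequent steps:

```yaml
resources:
  repositories:
    - repository: tools          # hypothetical second repository
      type: git
      name: MyProject/tools

steps:
  - checkout: self
  - checkout: tools

  # With multiple checkout steps, each repo lands in its own sub-folder of
  # $(Build.SourcesDirectory), so every step must be pointed there explicitly.
  - script: ./build.sh
    workingDirectory: $(Build.SourcesDirectory)/tools
```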

I also have a third pull request that fixes the primary/self repository detection (there is some undesired behaviour in specific scenarios), which would be good to have fixed for the work-directory feature, and that also fixes some unit-test localization problems (five unit tests don't work correctly on some non-English systems): https://github.com/microsoft/azure-pipelines-agent/pull/3473

mjthurlkill commented 3 years ago

This looks like a bug in Azure DevOps. For pipelines that check out a small repo, the agent appears to almost always reuse the work folder and only sync the changes since the last run of one of the pipelines. For pipelines that check out a large repo, it appears to almost always re-clone the repo into a new work folder instead of just syncing the changes since the last run.

By small repos I mean smaller than around 15 GB; we have repos at or below that size that seem to reliably reuse the work folder. By large repos I mean larger than around 15 GB; we have repos above that size that generally don't reuse the work folder. I haven't studied the permutations deeply, but pipelines using large repos that sync to the same commit seem to reuse the work folder more reliably. Push a new change to the same branch, or run for a different branch, and the checkout step for a large repo more often than not clones into a new work folder.

The docs say the work folder should be reused, but that isn't happening in cases like larger repos.

I am thinking of trying checkout: none and writing my own step to do the checkout - maybe git fetch, git checkout $(Build.SourceBranch), git merge, then git checkout $(Build.SourceVersion). But I'm not sure how Azure DevOps will assign a work folder and whether it will give me a new one anyway, and I DO want a different work folder if one pipeline is using one repo and another pipeline is using a different one.
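A minimal sketch of what such a manual checkout step could look like, assuming a self-hosted agent where a previous run already cloned the repo into the sources directory (credential handling and error cases omitted):

```yaml
steps:
  - checkout: none   # skip the built-in checkout entirely

  - script: |
      set -e
      cd "$(Build.SourcesDirectory)"
      git fetch origin                               # incremental: only new objects
      git checkout --force "$(Build.SourceVersion)"  # detached HEAD at the triggering commit
    displayName: Manual checkout reusing the existing clone
```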

This sounds like a bug that needs to be fixed in Azure DevOps itself, and I may not have enough control to implement a workaround like this.

anatolybolshakov commented 2 years ago

Hi everyone! As a quick update, we are planning to review the PRs above soon - thanks for the contributions! @mjthurlkill the agent re-uses an already existing repo for the same build definition, DevOps collection and repo - could you please share logs (in debug mode) for a pipeline where checkout does not re-use an existing large repository? Please also make sure you mask any sensitive data.
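For anyone gathering those logs: debug logging can be switched on per run by setting the system.debug variable (or by ticking "Enable system diagnostics" when queueing a run manually):

```yaml
variables:
  system.debug: true   # produces verbose diagnostic logs for every step
```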

PaulVrugt commented 2 years ago

We would love to have this feature too. Please don't forget support for scale set agents - it might be tricky to set agent settings in that scenario, since the agent is installed automatically by Azure DevOps when provisioning instances.

github-actions[bot] commented 1 year ago

This issue has had no activity in 180 days. Please comment if it is not actually stale

PaulVrugt commented 1 year ago

Not stale, just lack of response from Microsoft

echalone commented 1 year ago

> Not stale, just lack of response from Microsoft

we need a bot for this ^^

github-actions[bot] commented 1 year ago

This issue has had no activity in 180 days. Please comment if it is not actually stale

ChristianStadelmann commented 1 year ago

> Not stale, just lack of response from Microsoft

Same.

> we need a bot for this ^^

Definitely.

EugenMayer commented 1 year ago

I implemented https://github.com/EugenMayer/azure-agent-self-hosted-toolkit, which fixes all of those issues: cleanup and pollution of the work directory (for the next job). Check the project README.

balchen commented 1 year ago

I read the README. I couldn't see how these tools solve the issue at hand, which is building several pipelines from the same git repo while avoiding multiple checkouts of the repo on disk.

EugenMayer commented 1 year ago

It fixes the cleanup part: with --once, after each job the agent disconnects (ensuring no new job is started), the work directory is cleaned up, and the agent reconnects. This takes about 5 seconds before the agent is available again.

What this does not fix ('all' was therefore the wrong word) is making the checkout itself more efficient. That is out of scope, and to be honest, re-using the work directory on an agent based on a hunch that 'a job has probably run here before' only works in environments where an agent runs exactly one specific job for one pipeline. IMHO a super-specific (I would say uncommon) case.

balchen commented 1 year ago

OK, so not "all of the issues", but the issue of cleaning up the workdir after each build. Which is a solution, but definitely a second choice when the repo is 30 GB (as stated in the original issue) and you need to check out 30 GB for every single build, even on the same pipeline.

In regards to your second paragraph, it seems a number of people want this, so it can't be that super-specific and uncommon.

EugenMayer commented 1 year ago

> In regards to your second paragraph, it seems a number of people want this, so it can't be that super-specific and uncommon.

To be honest, there is IMHO not a single CI/CD solution that can do what you're asking for here: Travis, CircleCI, GitLab, Bitbucket Cloud, Bamboo, GoCD, Concourse, Buildkite, to name a few. I'm not even sure this could be custom-crafted in Jenkins without doing exactly the same thing you would do in Azure Pipelines anyway.

So if this is a common issue, it is a huge gap in all of those toolkits. In fact, an agent runner that is generic (ephemeral) on the one hand, but at the same time suddenly 'knows there is a local folder it can reuse', is just not something that can be introduced in a sane manner.

There are caches for that purpose - but of course they will still fetch 30 GB anyway.

So to be honest, no offense, but waiting for MS to implement this feature is, I assume, a bad bet.

What you most probably want is a step that downloads your repo from a static server running locally on the agent host. Of course, that doesn't make sense if you need the history. But if you really have 30 GB of assets in a git repo and also need the history... well, the issue goes deeper, I guess.

Don't get me wrong: if the feature happens, I'm happy for you all. But if you place bets on it, I would assume the odds are very bad - especially considering that the current CI/CD space doesn't seem to care about something like this (yet), as far as I can see.

balchen commented 1 year ago

I see. Thank you for telling us that. Since you obviously have no interest in this feature, how about just staying away from it?

PaulVrugt commented 1 year ago

Well at least the above discussion made sure the stale label was removed

echalone commented 1 year ago

Also, I can only reiterate (see my earlier comment above with PRs #3475, #3479 and #3473) that I think I've actually already programmed the solution for this, and the pull requests are still active. But sadly it's taking Microsoft a really long time to review pull requests for this software :/ those pull requests are now 2 years old... I'm still keeping them up to date and hope that one day the feature(s) and fixes in my pull requests will be included in the agent.

6heads commented 9 months ago

> Hi everyone! As a quick update, we are planning to review the PRs above soon - thanks for the contributions! @mjthurlkill the agent re-uses an already existing repo for the same build definition, DevOps collection and repo - could you please share logs (in debug mode) for a pipeline where checkout does not re-use an existing large repository? Please also make sure you mask any sensitive data.

Your reply is now almost two years old. Was the PR that extensive?

echalone commented 9 months ago

> Your reply is now almost two years old. Was the PR that extensive?

My friend, he's not even working for Microsoft any more 😆 I've had a few PRs waiting for review for about 2-3 years now; so far they've managed to merge one ^^ thankfully it was the most important one.

jrnewton commented 7 months ago

@kirill-ivlev any update on this issue and the related PRs?

ADD-ACS commented 6 months ago

You can use #4423.

jrnewton commented 6 months ago

@ADD-ACS - not sure how that issue is related. My take - this issue is about cleanup of the work directory while #4423 is about changing the location of the work directory.

balchen commented 6 months ago

> @ADD-ACS - not sure how that issue is related. My take - this issue is about cleanup of the work directory while #4423 is about changing the location of the work directory.

The motivation for this issue is avoiding multiple checkouts of the same, very large repository -- typically one per pipeline. Either re-use of the repo or a different clean-up mechanism was suggested as a way to solve it.

The standard checkout goes to a per-pipeline directory under _work (e.g. _work/1/s, _work/2/s, ...). #4423 allows us to change this to a static location directly under _work, effectively forcing re-use of the repo between pipelines. That would provide a solution to the original issue in many circumstances.
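If #4423 behaves as described above, usage might look roughly like this (the path value is illustrative only and not confirmed against the actual change; clean: false is the existing checkout option that keeps the clone between runs):

```yaml
steps:
  # Illustrative sketch only: a fixed checkout location shared across pipelines
  # instead of the default per-definition _work/<N>/s directory.
  - checkout: self
    path: ../big-monorepo   # hypothetical static location directly under _work
    clean: false            # keep the clone so later runs only fetch new commits
```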

mjthurlkill commented 3 months ago

You can tell how closely I have been following this issue... :-( I'll try to find time to get some logs. However, I'm using a different solution now: a template that uses partial clone/fetch (git fetch --filter=blob:none) plus sparse-checkout. I should turn this into an extension, but it would be nicer if it were part of the checkout command. A partial fetch gives about the same benefit as a shallow fetch but isn't as problematic, and partial fetch + sparse-checkout together provide tremendous benefits. There is still a source directory per pipeline, but it becomes much less of a problem.

The main drawback is that you need to specify the directories to include in the sparse-checkout. If you are setting a CI trigger on the appropriate directories of the pipeline dependencies, you basically need to specify those directories here as well. (It would be nice to have a system variable containing the directories specified for the trigger, or to be able to set those directories from a variable, though for triggering they probably need to be static.)

(There is the SelectiveCheckout extension in the marketplace, but it isn't quite there yet: it uses shallow instead of partial clone, which causes problems. In my template I handle the cases where the sources directory is new/empty, shallow, partial or full, and sparse or not, because different branches may have used different methods.)
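A minimal sketch of that approach (not the actual template; the sparse directories are hypothetical, credential handling is omitted, and it assumes a git version recent enough to have the sparse-checkout command):

```yaml
steps:
  - checkout: none   # the script below manages the clone itself

  - script: |
      set -e
      cd "$(Build.SourcesDirectory)"
      if [ ! -d .git ]; then
        git init .
        git remote add origin "$(Build.Repository.Uri)"
      fi
      # cone-mode sparse checkout limited to the directories this pipeline needs
      git sparse-checkout init --cone
      git sparse-checkout set src/serviceA tools/build
      # blobless partial fetch: full history, file contents downloaded on demand
      git fetch --filter=blob:none origin "$(Build.SourceBranch)"
      git checkout --force "$(Build.SourceVersion)"
    displayName: Partial fetch + sparse checkout
```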

The normal clone size for one of my repos is 30 GB. For some pipelines the partial/sparse size is 5-10 GB; for some it is under 100 MB. This really speeds up the checkout, e.g. 10 seconds for the small ones or 3 minutes for the large ones versus 30 minutes for a full clone. Besides the speed, it greatly reduces the footprint on the agent server: sources directories consume anywhere from under 1 GB to 10 GB instead of 30 GB each. Also, I'm not sure we would want to, but given the reduced size it would now be possible to use the hosted agents instead of self-hosted agents for at least some of our pipelines, with reasonable checkout times. I haven't done anything for multiple repos yet, but will have to look at that.

I'll review the thread about repo directory reuse described above. That still may be the ideal solution for my needs. The first clone into that directory will be slow, but after that it should be fast. However, what I have right now is pretty good.