microsoft / AL-Go

The plug-and-play DevOps solution for Business Central app development on GitHub

Self-hosted runner is running out of memory #781

Open · nemanja899 opened this issue 10 months ago

nemanja899 commented 10 months ago

As per the title: when configuring a self-hosted runner in the AL-Go settings, my virtual machine in Azure is running out of memory.

Anyway, this is a snippet of my AL-Go settings: [screenshot of AL-Go settings]. It almost always downloads another image for the container, even though I specified where to look in the AL-Go settings. My VM ends up with 10+ images with the same repository name and with <none> as the name. Please help. Thank you

jonaswre commented 10 months ago

Hi, what VM size have you configured?

And do you mean memory or storage?

If you meant storage, then my guess is that 60 days is too long; I think 3 days is the default. Caching every minor or new revision for 60 days is quite storage-intensive.

nemanja899 commented 10 months ago

Yes, that is correct, it's storage and it's around 150 GB. Hmm, but why is it caching minor and revision, or did you mean build and revision? I thought it only caches Major.Minor and not Build.Revision; that could maybe be an option in the AL-Go settings.

Just to add, the storage is full after only one or two days, and I have had to continuously go to the VM and delete pulled images.

jonaswre commented 10 months ago

You will basically cache every image that was updated since your last build.

Your build will try to acquire the latest image; if it's not found on disk, it will download it. The cleanup will then delete all images that are old (not sure whether by creation or usage date, I would need to look at the code, but I'm currently on my phone).

Sometimes new images are released multiple times a day.

We just left cacheKeepDays at the default value. 60 days is definitely too long.
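
For context, cacheKeepDays lives in the repository's AL-Go settings file (e.g. .github/AL-Go-Settings.json). A minimal sketch with a short cache window could look like this - the values are only illustrative, "my" is just a sample cache image name and 3 mirrors the default mentioned above:

```json
{
  "cacheImageName": "my",
  "cacheKeepDays": 3
}
```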

nemanja899 commented 10 months ago

Then what's the point of running self-hosted if the image is downloaded multiple times a day? On Azure DevOps pipelines I didn't have that problem; in one year, images were downloaded just a couple of times.

freddydk commented 10 months ago

Having 10 images built in one day seems wrong (unless you are targeting many countries). As always, including the build log will help us see what's happening and why.

AL-Go caches images using BcContainerHelper, which is probably exactly what you did in Azure DevOps. NextMinor and NextMajor are typically NOT cached, since these workflows are not run on a daily basis and the artifacts for these are almost certain to change. Which artifacts are you building against (what is the value of the artifact setting)? By default that is latest, but it can be set to weekly or a specific version if you like (https://aka.ms/algosettings#artifact)

jonaswre commented 10 months ago

I didn't mean they are downloaded multiple times a day each day. I meant sometimes they are released very frequently. Especially after major releases.

This is just something I realized from my own experience. Maybe the times I noticed it were just outliers.

If I were you I would start at the default value and increase until you find a good balance between redownloading and storage.

Do you have the 150 GB in one or two partitions?

And one more thing... there was a bug regarding cleanup in an old version, so make sure your AL-Go is up to date.

nemanja899 commented 10 months ago

This happens all the time: [screenshot of the docker image list]. I had to delete them all on the VM. Also, every time there is a different tag, a new image is downloaded. Notice that there is only one image version without <none>; compare it to the new one in the screenshot below: [screenshot]. Almost every time there is a new tag, a new image is pulled.

Just to add, only the w1 country is used.

I meant that BcContainerHelper could cache images based only on the Major.Minor part of the image tag and ignore Build.Revision. The artifact setting is set to the default. AL-Go is up to date.

jonaswre commented 10 months ago

@freddy correct me if I'm wrong, but the "none" images are only layers. They don't take up additional space, right?

Can you send a screenshot of how much space is actually being used?

How many runners are on your VM, and do you have any particularly large repos?

In the earlier days I encountered lots of people storing .app files in their repos.

Have you ever done that?

And it's correct to download the image if there is a new tag.
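
Regarding the question about how much space is actually being used: the untagged <none> entries do show up in Docker's own disk accounting, and you can inspect and clean them up with the standard Docker CLI on the runner (nothing AL-Go specific):

```powershell
# Breakdown of disk usage by images, containers, volumes and build cache
docker system df

# List only the dangling (untagged <none>) images
docker images --filter "dangling=true"

# Remove dangling images that are no longer referenced by any container
docker image prune -f
```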

nemanja899 commented 10 months ago

I deleted the images; now I only have the one that you can see in the latest picture. The "none" images do take storage; they fill up the disk in a day.

And yesterday I had multiple images with the "currentversion" repository name; I had to go to the VM to delete them as well.

It's not that large a repo, and the .app file is not stored in the repo, but produced as an artifact by the AL-Go build action.

jonaswre commented 10 months ago

I think you need to explain further in which situations these images are created.

@freddydk could it be that building a new image (my) causes these layers to "dangle"? Maybe setting it manually causes an issue? We only use the default values.

nemanja899 commented 10 months ago

Images are created with CI/CD and Pull Requests flows.

jonaswre commented 10 months ago

And which artifact do you use in the pipelines?

Can you provide the following values after your cleanup and then after one and two runs?

- Total space
- Total free space
- Space taken by the artifacts
- Space taken by the Docker images
- Space taken by your work directories

And after that, maybe try removing cacheImageName and cacheKeepDays and record the same values.
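
A rough sketch for collecting those numbers on a Windows runner - the folder paths are assumptions and depend on where the runner and the BcContainerHelper artifact cache actually live on your VM:

```powershell
# Total and free space on the system drive
Get-PSDrive C | Select-Object Used, Free

# Space taken by Docker images, containers and build cache
docker system df

# Space taken by a folder, e.g. the runner work directory or the artifact cache
$path = 'C:\actions-runner\_work'   # assumption: default runner work folder
$bytes = (Get-ChildItem $path -Recurse -File -ErrorAction SilentlyContinue |
          Measure-Object -Property Length -Sum).Sum
'{0:N1} GB' -f ($bytes / 1GB)
```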

nemanja899 commented 10 months ago

Total space: 127 GB (ignore the temp storage). Total free space after removing images: 65.5 GB; there are two images left, as shown in the latest screenshot. Now, after running CI/CD: [screenshot]

Again, a new image is pulled.

[screenshot]

As you can see, around 13 GB is taken.

Artifacts size: [screenshot]

I think the problem is with the tag versions.

freddydk commented 10 months ago

I would add some more disk space and set cacheKeepDays to 3.

Or use the artifact setting to use the same BC artifacts for a longer time (daily, weekly, or until you change it).

jonaswre commented 10 months ago

But if you rerun twice right after one another, the size on disk shouldn't change.

I guess Microsoft is fixing a lot currently.

It's by design: a new tagged version should be downloaded. If you don't want that, you can set the artifact to a specific version.

@freddydk if Microsoft publishes so many new images, maybe AL-Go could get a max disk usage percentage setting, or some sort of logic to only keep one minor version on disk. This would at least not overflow smaller runners.

But that's also something you could script on such a runner.
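
For example, a cleanup on the runner itself could be as simple as pruning what Docker no longer uses; a sketch with the plain Docker CLI, where the 72-hour window is only an example value:

```powershell
# Remove dangling <none> images left behind by image builds
docker image prune -f

# Remove all unused images older than 72 hours (tagged or not)
docker image prune -a -f --filter "until=72h"
```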

freddydk commented 10 months ago

Yeah, maybe we could do something in this area, to better work with smaller runners - but in the end, adding more disk space and setting KeepDays to 3 will cache all images that have been used for the last 3 days.

jonaswre commented 10 months ago

I would say that in the long term it might be good to add some sort of "max disk usage" or "only keep x versions of one minor" setting. This would isolate runners from the frequency at which Microsoft publishes new versions. In theory, Microsoft could currently overwhelm any runner if they just increase the frequency enough. We currently have around 200 GB of storage for our runner; that means 20 new releases in 3 days would fill it. Unlikely, but still possible. And it should be fairly simple to append something to the cleanup.

freddydk commented 10 months ago

It would be easier to just change the default artifact to daily instead of latest - this ensures that you will only grab one set of artifacts per day.

jonaswre commented 10 months ago

> It would be easier to just change the default artifact to daily instead of latest - this ensures that you will only grab one set of artifacts per day.

Is this already possible?

This would still only solve it if you accept getting one artifact a day. People might still want to get the latest but only cache one version.

But maybe people who need the latest but run small runners need to write their own code to clean it up, because you would want that cleanup in general on a small runner, no matter the configuration in the repos. It would be cool if the cleanup job looked for a specific file on the server and executed it. This would allow you to define cleanup scripts per runner/VM, not per repo.
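
Until something like that exists, a per-runner (rather than per-repo) cleanup can be wired up directly on the VM, for instance as a daily Windows scheduled task. A sketch - the script path is a hypothetical location for a cleanup script like the one above:

```powershell
# Run a cleanup script on the runner VM every night at 03:00
$action  = New-ScheduledTaskAction -Execute 'powershell.exe' `
    -Argument '-NoProfile -File C:\runner-maintenance\cleanup-docker.ps1'  # hypothetical path
$trigger = New-ScheduledTaskTrigger -Daily -At 3am
Register-ScheduledTask -TaskName 'DockerImageCleanup' -Action $action -Trigger $trigger `
    -User 'SYSTEM' -RunLevel Highest
```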

freddydk commented 10 months ago

By setting the artifact to "////daily" or "////weekly", you will minimize the number of new artifacts downloaded. I would say that in most cases a daily build is fine, and probably the weekly one as well.
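
In the repository settings file, that would look roughly like this (a sketch; see https://aka.ms/algosettings#artifact for the full syntax of the artifact setting):

```json
{
  "artifact": "////weekly"
}
```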