microsoft / navcontainerhelper

Official Microsoft repository for BcContainerHelper, a PowerShell module, which makes it easier to work with Business Central Containers on Docker.
MIT License

Docker Host performance issues after starting to use PowerShell7? #3482

Closed · jwikman closed 5 months ago

jwikman commented 5 months ago

I'm not sure this is related to BcContainerHelper, but I'm putting it out here in case someone else experiences similar issues and we can figure out the cause together.

Describe the issue

Lately we've started to experience performance issues with our pipeline build server. Pipelines are getting slower and slower, to the point that my colleagues start complaining and we restart the server. After a restart, the server is as fast as ever, but after a few days to a week it starts to get slow again. It got even worse last week, which makes me think this is related to the BC24 containers, since they are running much more frequently after the wave.

We started to see this behavior a couple of months ago, and I believe it started after we began running pipelines on BC24. In conjunction with that, we also installed .NET 8 and started to use PowerShell 7 in the pipelines, so they might also be part of the cause.

Symptoms

The symptom we see is that every PowerShell task in the pipelines "idles" for some time before generating the PowerShell script. In the example below there are almost 4 minutes between the task starting and the script starting. Right after a restart there is no delay, and after a week the delay can be up to 10 minutes. With 20 different PowerShell tasks in a pipeline, this adds up to a lot of waiting...

2024-04-10T11:44:36.9557848Z ##[section]Starting: Create Build Container
2024-04-10T11:44:37.1721251Z ==============================================================================
2024-04-10T11:44:37.1721401Z Task         : PowerShell
2024-04-10T11:44:37.1721469Z Description  : Run a PowerShell script on Linux, macOS, or Windows
2024-04-10T11:44:37.1721588Z Version      : 2.237.5
2024-04-10T11:44:37.1721654Z Author       : Microsoft Corporation
2024-04-10T11:44:37.1721733Z Help         : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/powershell
2024-04-10T11:44:37.1721862Z ==============================================================================
2024-04-10T11:48:12.3345494Z Generating script.
2024-04-10T11:48:12.3680887Z ========================== Starting Command Output ===========================
2024-04-10T11:48:12.3930122Z ##[command]"C:\Program Files\PowerShell\7\pwsh.exe" -NoLogo -NoProfile -NonInteractive -ExecutionPolicy Unrestricted -Command ". 'C:\Agent_AS015_03\_work\_temp\278905e0-b287-45ae-aba4-2ab7780ddf78.ps1'"
2024-04-10T11:48:13.1932853Z BcContainerHelper version 6.0.11

When this happened today, I noticed that the WmiPrvSE.exe process was heavy on the CPU, with 50 or more WMIC.exe processes running (started by the Build Agent). I don't know if it is related to the behavior we've seen lately, but it might be relevant. (screenshot attached)

I see that BcContainerHelper uses Get-CIMInstance to get free physical memory; could that be causing the WMIC processes above?

Additional context

Has anyone else experienced similar performance degradation lately? Any suggestions on how to troubleshoot?
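
If anyone wants to check for the same symptom on their build host, here is a minimal sketch (my own, not from BcContainerHelper or the agent) that lists the running WMIC.exe processes together with their parent process:

# List WMIC.exe processes and map each one to its parent process,
# to see whether they were started by the build agent.
Get-CimInstance Win32_Process -Filter "Name = 'WMIC.exe'" | ForEach-Object {
    $parent = Get-CimInstance Win32_Process -Filter "ProcessId = $($_.ParentProcessId)"
    [PSCustomObject]@{
        Pid        = $_.ProcessId
        ParentPid  = $_.ParentProcessId
        ParentName = $parent.Name
        Created    = $_.CreationDate
    }
} | Format-Table -AutoSize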

freddydk commented 5 months ago

Are containers running Process or Hyper-V isolation? Are you running all BC related cmdlets inside containers or on the agent?

Maybe related: https://www.yammer.com/dynamicsnavdev/threads/2746068688338944

freddydk commented 5 months ago

I don't think Get-CIMInstance causes these. If you run

wmic OS get freephysicalmemory,totalvisiblememorysize /value

then that is a command line which returns two values. I use Get-CIMInstance to get a CIM instance and extract values from it (OS version number, memory and more).
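
For comparison, what BcContainerHelper does is closer to this (a simplified sketch, not the exact code from the module):

# Query the Win32_OperatingSystem CIM instance once and read values from it,
# instead of spawning a WMIC.exe process. The memory values are reported in KB.
$os = Get-CimInstance -ClassName Win32_OperatingSystem
$freeGb  = [math]::Round($os.FreePhysicalMemory / 1MB, 2)
$totalGb = [math]::Round($os.TotalVisibleMemorySize / 1MB, 2)
Write-Host "Free memory: $freeGb GB of $totalGb GB (OS version $($os.Version))"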

jwikman commented 5 months ago

Are containers running Process or Hyper-V isolation?

Using process isolation

Are you running all BC related cmdlets inside containers or on the agent?

After the issues with the PS5 to PS7 bridge (https://www.yammer.com/dynamicsnavdev/threads/2687029664907264), I refactored our pipelines to not load the BC cmdlets on the host anymore. So, all BC related cmdlets should be run inside the containers (via BcContainerHelper)
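
A pipeline step that needs BC cmdlets now runs them in the container, roughly like this (simplified example, the container name is made up):

# Run the BC management cmdlets inside the container instead of importing them on the host.
Invoke-ScriptInBcContainer -containerName 'bcbuild' -scriptblock {
    Get-NAVServerInstance | Select-Object ServerInstance, State
}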

Maybe related: https://www.yammer.com/dynamicsnavdev/threads/2746068688338944

I'm not sure about this; when our pipelines start to slow down (or start to idle in all PowerShell tasks), there is still 50+ GB of RAM free on the host.

jwikman commented 5 months ago

I don't think Get-CIMInstance causes these. If you run

wmic OS get freephysicalmemory,totalvisiblememorysize /value

then that is a command line which returns two values. I use Get-CIMInstance to get a CIM instance and extract values from it (OS version number, memory and more).

Yes, I tried that on my local computer as well and did not see any WMIC processes left running. But I wanted to add the information anyway, in case it is related in some strange way...

freddydk commented 5 months ago

In order to isolate processes 100%, could you try Hyper-V? Using process isolation, you will still see processes from the container in the host's list of processes.

jwikman commented 5 months ago

Good thinking, I'll try that!

I think most pipeline containers run with 8GB RAM, so it should hopefully work. Let's see šŸ˜ƒ
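
For reference, the change is basically just the isolation parameter when the build container is created, along these lines (simplified, the values are examples only):

# Create the build container with Hyper-V isolation and an explicit memory limit,
# instead of process isolation.
$artifactUrl = Get-BcArtifactUrl -type Sandbox -country w1 -select Latest
New-BcContainer -accept_eula -containerName 'bcbuild' -artifactUrl $artifactUrl -isolation hyperv -memoryLimit 8G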

jwikman commented 5 months ago

Update: I installed Hyper-V and changed all pipelines to use that instead of process isolation last night. Performance in the containers is pretty much the same as before.

One change of behavior that I see: WMIC processes are still created and 50+ run in parallel, but now they are all terminated every now and then... When all of those are running, I see a CPU spike in WmiPrvSE.exe, but that might simply be caused by a lot of WMIC processes querying for available memory at the same time...

I'm starting to believe that all of this is related to the PowerShell v2 task combined with pwsh: true, which we started using when we switched to PowerShell 7. I also think we updated our Azure Pipeline agents at the same time. That version (v3.236.1) is still the latest version.

So right now we just need to wait and see if the old behavior, with the host getting slow after a few days, still exists. If everything is fine in a week or two, we're just happy. If not, we will start testing other versions of the Azure Pipeline agents to see if the behavior changes.

I'll close this issue for now, and will get back here if things start to point towards BCCH šŸ˜‰

jwikman commented 5 months ago

FYI

I found out what is causing the WMIC processes: https://github.com/microsoft/azure-pipelines-agent/blob/b34a9c376bf689d70092981eaa40d3e19327b11a/src/Agent.Worker/ResourceMetricsManager.cs#L255

I'll ask in that repo why it starts that many processes, and whether it's a bug or by design.

freddydk commented 5 months ago

Thanks for the info :-)