microsoft / Windows-Containers

Welcome to our Windows Containers GitHub community! Ask questions, report bugs, and suggest features -- let's work together.
MIT License

Process Isolation is very slow as compared to HyperV Containers on Server 2019 #459

Open saraf-akshay opened 9 months ago

saraf-akshay commented 9 months ago

**Describe the bug**

Slowness in cloning source when running multiple containers simultaneously in process isolation.

| Isolation Mode | Time in Git clone | Containers running in parallel | Comments |
| --- | --- | --- | --- |
| Process | 9 mins | 1 | |
| HyperV | 8.5 mins | 1 | |
| Process | 21 mins | 10 | <-- This is the problem |
| HyperV | 11 mins | 10 | |

As the number of containers on the server increases, container performance degrades significantly, but only in process isolation. I am not worried about minor performance differences. The same degradation occurs when I compile in these containers using nmake.
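To put numbers on the table above, here is a small sketch that computes the slowdown ratios from the reported clone times. It uses only the four measurements from the table; the function names are made up for illustration.

```python
# Clone times in minutes, copied from the table above (Server 2019).
clone_minutes = {
    ("process", 1): 9.0,
    ("hyperv", 1): 8.5,
    ("process", 10): 21.0,
    ("hyperv", 10): 11.0,
}


def slowdown(isolation: str, containers: int) -> float:
    """Ratio of the N-container clone time to the single-container time in the same mode."""
    return clone_minutes[(isolation, containers)] / clone_minutes[(isolation, 1)]


def process_vs_hyperv(containers: int) -> float:
    """Ratio of process-isolation time to HyperV time at the same container count."""
    return clone_minutes[("process", containers)] / clone_minutes[("hyperv", containers)]


print(round(slowdown("process", 10), 2))   # 2.33
print(round(slowdown("hyperv", 10), 2))    # 1.29
print(round(process_vs_hyperv(1), 2))      # 1.06
print(round(process_vs_hyperv(10), 2))     # 1.91
```

With a single container the two modes are within about 6% of each other; with 10 containers process isolation takes roughly 1.9x as long as HyperV.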

These 10 containers I mentioned above are triggered by a Jenkins pipeline using Kubernetes. Here is the yaml code I used:

apiVersion: v1
kind: Pod
spec:
  tolerations:
  - effect: NoSchedule
    key: custom/build-hosts
    operator: Exists
  containers:
  - name: jnlp
    image: <image link redacted>
    command:
    - powershell
    args:
    - cp -R C:\\privconf\\*  C:\\Users\\ContainerAdministrator;
    - C:\\jenkinsscript\\jenkins.ps1
    resources:
      limits:
        cpu: 12
        memory: 16Gi
      requests:
        cpu: 12
        memory: 16Gi
    env:
    - name: MY_POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: MY_HOST_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    volumeMounts:
    - mountPath: /privconf
      name: credential-volume
    - mountPath: /gitcache
      name: cache-volume
    - mountPath: /jenkinsscript
      name: jenkins-script
  volumes:
  - hostPath:
      path: D:/agentconf
      type: ""
    name: credential-volume
  - hostPath:
      path: D:/agentcache
      type: ""
    name: cache-volume
  - configMap:
      defaultMode: 420
      name: jenkins-script
    name: jenkins-script
  nodeSelector:
    custom/fcds: test_akshay

The HyperV Data was gathered using Docker Swarm, as K8S doesn't support HyperV Isolation.

dockerSwarm {
    label "docker-agent"
    image "<image link redacted>"
    limitsNanoCPUs 12000000000
    limitsMemoryBytes 17179860384
    reservationsNanoCPUs 12000000000
    reservationsMemoryBytes 17179860384
}

The physical host that I ran it on is a bare-metal server with 104 physical cores (208 logical cores with Hyper-Threading enabled).
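As a quick sanity check of the CPU budget, here is a sketch using only the numbers quoted in this issue. Memory is left out because the host's total RAM is not stated, and the variable names are illustrative.

```python
LOGICAL_CORES = 208                 # host logical cores, from the description above
NANO_CPUS_PER_CONTAINER = 12_000_000_000  # limitsNanoCPUs in the Swarm config
CONTAINERS = 10                     # containers started in parallel

# Docker expresses CPU limits in units of 1e-9 CPUs, so this equals "cpu: 12" in the pod spec.
cpus_per_container = NANO_CPUS_PER_CONTAINER / 1_000_000_000
requested = CONTAINERS * cpus_per_container

print(cpus_per_container)                    # 12.0
print(requested)                             # 120.0
print(requested <= LOGICAL_CORES)            # True
print(f"{requested / LOGICAL_CORES:.0%}")    # 58%
```

So 10 containers at 12 CPUs each request 120 of the 208 logical cores, which is below the host's CPU capacity.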

**To Reproduce**

Please trigger 10 parallel containers on the same host at the exact same time, all cloning the exact same repository. That should be enough to reproduce the issue.
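For anyone who wants to try this without Jenkins, Kubernetes, or Swarm, the reproduction could be approximated with plain `docker run`. This is only a sketch under assumptions, not the setup used for the numbers above: `IMAGE` and `REPO_URL` are hypothetical placeholders, and the image is assumed to have git installed. The `--isolation`, `--cpus`, and `--memory` flags are standard `docker run` options.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

IMAGE = "example/git-build-image"              # hypothetical placeholder image
REPO_URL = "https://example.com/big-repo.git"  # hypothetical placeholder repository


def docker_command(isolation: str, cpus: int = 12, memory: str = "16g") -> List[str]:
    """Build a `docker run` command that clones REPO_URL inside a fresh container."""
    return [
        "docker", "run", "--rm",
        f"--isolation={isolation}",  # "process" or "hyperv"
        f"--cpus={cpus}",
        f"--memory={memory}",
        IMAGE,
        "git", "clone", REPO_URL, "C:\\src",
    ]


def run_clone(isolation: str) -> float:
    """Run one container to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(docker_command(isolation), check=True)
    return time.perf_counter() - start


def time_parallel(n: int, job: Callable[[], float]) -> List[float]:
    """Start n jobs at roughly the same time and collect each job's result."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(job) for _ in range(n)]
        return [f.result() for f in futures]


def main() -> None:
    """Compare 1 vs 10 parallel clones in both isolation modes (requires Docker on Windows)."""
    for isolation in ("process", "hyperv"):
        for n in (1, 10):
            times = time_parallel(n, lambda: run_clone(isolation))
            print(f"{isolation} x{n}: slowest clone {max(times) / 60:.1f} min")
```

Calling `main()` on a Windows container host runs the four cases from the table. `time_parallel` takes the job as a callable so the same harness can time an nmake build instead of a clone.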

**Expected behavior**

The expectation is for Process Isolation to perform on par with or better than HyperV Isolation.

**Configuration:**

Server: Docker Engine - Community
 Engine:
  Version: 25.0.0
  API version: 1.44 (minimum version 1.24)
  Go version: go1.21.6
  Git commit: 615dfdf
  Built: Thu Jan 18 17:09:34 2024
  OS/Arch: windows/amd64
  Experimental: false



**Additional context**

I have verified that there is no resource over-provisioning. Windows Defender is disabled, and all my processes (including git and git-lfs) and the directories where source code is checked out are on the exclusion list, as described here: https://github.com/microsoft/Windows-Containers/issues/149
I have also verified that I have the Defender fix that was released here: https://github.com/microsoft/Windows-Containers/issues/345

fady-azmy-msft commented 9 months ago

Hey @saraf-akshay, could you share what you're seeing with Windows Server 2022 process isolation?

We no longer ship OS-level fixes for Windows Server 2019 because it is out of mainstream support (we only address security fixes): https://learn.microsoft.com/en-us/lifecycle/products/windows-server-2019

saraf-akshay commented 9 months ago

@fady-azmy-msft : Thanks for your response. I'm working on preparing a server with Server 2022. It might take a couple days. I'll keep you posted.

saraf-akshay commented 9 months ago

@fady-azmy-msft, @ntrappe-msft: There is still slowness.

Server 2022 is a lot better than Server 2019. Server 2019 was 2x slower, whereas Server 2022 is 1.25x slower in Process Isolation compared to HyperV Isolation when I run 10 containers in parallel on a host (essentially trying to run the host at full capacity) with the CPU and memory restrictions shown in the yaml file in my first comment.

nickcva commented 9 months ago

Here is what I have experienced with process isolation compared to Hyper-V isolation. I have seen cascading container failures, and even containers that crash and can never recover; they have to be redeployed. Performance on my SHIR containers is night-and-day better now with Hyper-V isolation.

Host: running 2019 DC. Container: 2019 core latest.

https://github.com/Azure/Azure-Data-Factory-Integration-Runtime-in-Windows-Container/issues/7

saraf-akshay commented 7 months ago

Hello @Howard-Haiyang-Hao @fady-azmy-msft @ntrappe-msft
Just checking in. Any update on this?

microsoft-github-policy-service[bot] commented 6 months ago

This issue has been open for 30 days with no updates. @Howard-Haiyang-Hao, please provide an update or close this issue.

nickcva commented 5 months ago

I'm now running 80+ SHIR containers with Hyper-V isolation successfully, with little to no issues. Without Hyper-V isolation, the most I could run was about 25, and that also caused issues where a container would completely corrupt itself at random. Please make a Linux-compatible SHIR application for ADF / Synapse!

doctorpangloss commented 4 months ago

can you run this without using host paths?

sbingham-MET commented 3 weeks ago

This is still very much an issue for AKS users, and it was raised to MSFT support (#2309150040010155) back in October of 2023. There is a similar finding here: https://forums.docker.com/t/docker-slower-to-copy-files-and-run-compiler-in-server-2019-than-windows-10/113938/2

AKS does not support Hyper-V isolation, only process isolation. Hyper-V support is on their roadmap, but with no date: https://github.com/Azure/AKS/issues/1792

After 4 months of back and forth with enterprise support, MSFT support eventually told us to try Linux containers, since there was no resolution in sight. That is unfortunate when you have to support applications that are Windows-dependent.