moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0

Windows Docker `docker info` shows only 64 CPUs for servers with 128 CPUs #35296

Open georgyturevich opened 6 years ago

georgyturevich commented 6 years ago

Hello all,

I tested it on two machines with 128 CPUs - AWS x1.32xlarge and Azure M128S. docker info shows CPUs: 64 in both cases.

I suspect this can also affect container performance when CPU limiting flags (--cpus/--cpu-percent) are set, and possibly in other cases.

Output of docker version:

Client:
 Version:      17.10.0-ce
 API version:  1.33
 Go version:   go1.8.3
 Git commit:   f4ffd25
 Built:        Tue Oct 17 19:00:02 2017
 OS/Arch:      windows/amd64

Server:
 Version:      17.10.0-ce
 API version:  1.33 (minimum version 1.24)
 Go version:   go1.8.3
 Git commit:   f4ffd25
 Built:        Tue Oct 17 19:09:12 2017
 OS/Arch:      windows/amd64
 Experimental: false

Output of docker info for AWS x1.32xlarge:

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 17.10.0-ce
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: ics l2bridge l2tunnel nat null overlay transparent
 Log: awslogs etwlogs fluentd json-file logentries splunk syslog
Swarm: inactive
Default Isolation: process
Kernel Version: 10.0 14393 (14393.1770.amd64fre.rs1_release.170917-1700)
Operating System: Windows Server 2016 Datacenter
OSType: windows
Architecture: x86_64
CPUs: 64
Total Memory: 1.906TiB
Name: EC2AMAZ-99PBCJ4
ID: VZIV:QABT:63VB:KUOW:4FOR:BTTB:QULZ:U7KS:CIZK:QWTU:63SI:65EG
Docker Root Dir: C:\ProgramData\docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: -1
 Goroutines: 22
 System Time: 2017-10-25T14:21:49.7138603Z
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Output of docker info for Azure M128S:

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 2
Server Version: 17.10.0-ce
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: ics l2bridge l2tunnel nat null overlay transparent
 Log: awslogs etwlogs fluentd json-file logentries splunk syslog
Swarm: inactive
Default Isolation: process
Kernel Version: 10.0 14393 (14393.1715.amd64fre.rs1_release_inmarket.170906-1810)
Operating System: Windows Server 2016 Datacenter
OSType: windows
Architecture: x86_64
CPUs: 64
Total Memory: 2TiB
Name: dw-big
ID: VZUZ:EK2S:UUL4:EUGI:OILR:IXOC:CCXJ:BLHZ:P4JS:2ON6:E7YS:4RLJ
Docker Root Dir: C:\ProgramData\docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: -1
 Goroutines: 22
 System Time: 2017-10-25T13:22:49.357094Z
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Thanks!

friism commented 6 years ago

This is a golang/Windows limitation:

cc @jstarks @patricklang

georgyturevich commented 6 years ago

@friism Hi Michael,

Thanks for sharing the finding!

I have a few questions:

  1. Is there a way to prioritize fixing this issue? (e.g. .NET sorts it out somehow)

  2. If I understand correctly, this value is returned by the sysinfo.NumCPU() function. So most likely we also have a bug in setting CPU limits with the --cpus option, which is calculated here: https://github.com/moby/moby/blob/3ba1dda1914fa7d380d9d3220c3b158a41f90cba/daemon/oci_windows.go#L268

According to the formula cpuMaximum = uint16(c.HostConfig.NanoCPUs / int64(sysinfo.NumCPU()) / (1e9 / 10000))

it will allocate twice as many CPUs as requested on servers with 128 CPUs. I'm not sure how this can be solved. Maybe by introducing a separate dockerd option, such as limit_numcpu, to be used in the formula above.

It would be interesting to hear the opinion of Darren (@darrenstahlmsft), as he has already worked on a --cpus issue.

Thanks!

darstahl commented 6 years ago

Interesting. I don't think we want to introduce another parameter just to fix a bug unless we absolutely have to. I'll look into seeing if I can get NumCPU from another source that correctly reports on machines with over 64 CPUs though.

We will need to be careful though, as NumCPU returns the number of CPUs available to Go, which is needed for some calculations, and the other source would be for platform calls and returned API values only.

friism commented 6 years ago

@darrenstahlmsft yeah, I think you should probably try to get this addressed in golang:

darstahl commented 6 years ago

Getting this supported in Go is a much larger change than fixing it in Moby alone. I think NumCPU is expected to return the number of CPUs available to Go, but Go would need (likely major) runtime changes to support more than 64 CPUs for execution (and, as a result, make it valid to return more than 64 CPUs here). The platform already supports more than 64 CPUs today (I'm pretty sure, at least), and I also see no reason that the Docker API should limit the returned CPUs based on Go's process limits.

Eventually this should be fixed in Go, but I think we can safely fix this in Moby without actually allowing dockerd.exe access to more than 64 CPUs (just containers started by dockerd.exe would have access to these CPUs).

That's my current understanding of the problem at least. I would need more time to look at it to confirm.

337m commented 10 months ago

I also have an issue where the container randomly starts with either 8 CPUs or 64 CPUs, as reported by C++ std::thread::hardware_concurrency().

See #46885