moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0

Unwarranted error "Range of CPUs is from 0.01 to 8.00, as there are only 8 CPUs available" #40303

Open VynDragon opened 4 years ago

VynDragon commented 4 years ago

Hi, I manage a Docker swarm with heterogeneous nodes, and today I started getting a new error that prevents some services from starting instances on nodes with fewer CPU cores.

Steps to reproduce the issue:

  1. Have a node with 8 cores
  2. Set --limit-cpu to 12 on a service
  3. Try to run the service on that node
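The steps above come down to simple arithmetic. This sketch mirrors the daemon check quoted later in this thread: a --limit-cpu value is carried as nano-CPUs (1 CPU = 1e9), and anything above the host's CPU count is rejected. checkCPULimit is a hypothetical name, and the CPU count is passed in instead of being read from sysinfo.NumCPU().

```go
package main

import "fmt"

// nanoPerCPU is the NanoCPUs scale: 1 CPU = 1e9 nano-CPUs.
const nanoPerCPU = 1e9

// checkCPULimit is a hypothetical stand-in for the quoted daemon check;
// numCPU replaces sysinfo.NumCPU() so the behavior can be tried per node.
func checkCPULimit(nanoCPUs int64, numCPU int) error {
	if nanoCPUs < 0 || nanoCPUs > int64(numCPU)*nanoPerCPU {
		return fmt.Errorf("Range of CPUs is from 0.01 to %d.00, as there are only %d CPUs available", numCPU, numCPU)
	}
	return nil
}

func main() {
	limit := int64(12 * nanoPerCPU) // --limit-cpu 12

	// On the 32-CPU manager the limit passes the check.
	fmt.Println(checkCPULimit(limit, 32))
	// On the 8-CPU worker the same limit is rejected.
	fmt.Println(checkCPULimit(limit, 8))
}
```

The same service spec therefore succeeds or fails depending only on which node the task lands on.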

Describe the results you received: the error "Range of CPUs is from 0.01 to 8.00, as there are only 8 CPUs available" is returned and the instance does not start.

Describe the results you expected: no error, and the instance starts. The node has 8 CPUs, fewer than the limit of 12, so the limit can never be exceeded.

Additional information you deem important (e.g. issue happens only occasionally): when the "limit-cpu" restriction is removed, the instances start normally on that node.

Output of docker version:

docker version server:

Client: Docker Engine - Community
 Version:           19.03.1
 API version:       1.40
 Go version:        go1.12.5
 Git commit:        74b1e89e8a
 Built:             Thu Jul 25 21:21:35 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.1
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.5
  Git commit:       74b1e89e8a
  Built:            Thu Jul 25 21:20:09 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

docker version node:

Client:
 Version:           19.03.5-ce
 API version:       1.40
 Go version:        go1.13.4
 Git commit:        633a0ea838
 Built:             Fri Nov 15 03:19:09 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.5-ce
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.4
  Git commit:       633a0ea838
  Built:            Fri Nov 15 03:17:51 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.3.2.m
  GitCommit:        d50db0a42053864a270f648048f9a8b4f24eced3.m
 runc:
  Version:          1.0.0-rc9
  GitCommit:        d736ef14f0288d6993a1845745d6756cfc9ddd5a
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info: server

Client:
 Debug Mode: false

Server:
 Containers: 9
  Running: 7
  Paused: 0
  Stopped: 2
 Images: 520
 Server Version: 19.03.1
 Storage Driver: aufs
  Root Dir: /var/lib/docker/aufs
  Backing Filesystem: extfs
  Dirs: 627
  Dirperm1 Supported: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: zeenic2wt2gnvnlgpals2llbv
  Is Manager: true
  ClusterID: mmee22mjudztn5ss2ejptr6o9
  Managers: 1
  Nodes: 17
  Default Address Pool: 10.0.0.0/8  
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 2
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: ***.***.5.4
  Manager Addresses:
   ***.***.5.4:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.4.0-151-generic
 Operating System: Ubuntu 16.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 32
 Total Memory: 62.8GiB
 Name: **********
 ID: S7GW:Q6CX:GXGM:UPJ4:6P6V:HUZC:5SUG:ISCP:JMRG:VMXG:4AFO:3ZRI
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support
WARNING: the aufs storage-driver is deprecated, and will be removed in a future release.

Output of docker info: node (after removing the CPU limit)

Client:
 Debug Mode: false

Server:
 Containers: 4
  Running: 4
  Paused: 0
  Stopped: 0
 Images: 4
 Server Version: 19.03.5-ce
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: kay0h91cq0x6elkn62r70ombn
  Is Manager: false
  Node Address: ***.***.5.191
  Manager Addresses:
   ***.***.5.4:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: d50db0a42053864a270f648048f9a8b4f24eced3.m
 runc version: d736ef14f0288d6993a1845745d6756cfc9ddd5a
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.4.2-arch1-1
 Operating System: Arch Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.33GiB
 Name: ***********
 ID: JARA:KVL5:TA5J:WWKQ:4CLQ:N72X:HZQQ:5QKE:TAIF:K2VH:47ZT:IURG
 Docker Root Dir: /data/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.): All physical.

thaJeztah commented 4 years ago

I'm not sure I understand; you're setting the limit to a number higher than the number of available CPUs; isn't it expected to get an error in that case?

VynDragon commented 4 years ago

Well, no: it's a limit, not a requirement, and I don't expect a limit to throw an error when it can be applied. Currently it seems to behave as if there were a condition like "if cpu_count > limit and cpu_available < limit", which doesn't make sense: it effectively sets both a requirement and a limit, and the requirement is never stated explicitly. It's also quite a handicap when using nodes with different core counts. I'd only expect this error with something like --reserve-cpu 9 here, where it would make perfect sense.

VynDragon commented 4 years ago

Note also that this seems to be new behavior; I didn't encounter it before updating recently.

michaeltinsley commented 4 years ago

To add to this, memory limits and CPU limits behave differently.

If the specified CPU limit is greater than the CPU count, the container refuses to start. However, if the specified memory limit is greater than the available memory, the container starts just fine, although the limit can never be reached.
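One way to make CPU limits behave like the memory limits described here is to accept an oversized value and, at most, report a warning. The daemon check quoted at the end of this thread already returns a warnings slice next to its error, so this sketch follows that shape. checkLimit is a hypothetical helper and its warning text is illustrative only.

```go
package main

import "fmt"

// checkLimit is a hypothetical helper applying one rule to any resource:
// negative limits are errors; limits above what the host has are accepted
// with a warning, since they simply can never be reached.
func checkLimit(name string, limit, available float64) (warnings []string, err error) {
	if limit < 0 {
		return nil, fmt.Errorf("%s limit must be non-negative, got %g", name, limit)
	}
	if limit > available {
		warnings = append(warnings, fmt.Sprintf(
			"%s limit %g exceeds the %g available; it can never be reached", name, limit, available))
	}
	return warnings, nil
}

func main() {
	// CPU and memory now follow the same rule on the 8-CPU, 15.33 GiB node.
	w, err := checkLimit("CPU", 12, 8)
	fmt.Println(w, err)
	w, err = checkLimit("memory (GiB)", 32, 15.33)
	fmt.Println(w, err)
}
```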

I agree with @VynDragon that the expected behaviour should be that of the memory limit, whereby a software limit greater than the hardware limit can be set.

nepella commented 3 years ago

I'm not sure I understand; you're setting the limit to a number higher than the number of available CPUs; isn't it expected to get an error in that case?

@thaJeztah: No. If I configure a service to use no more than twelve CPUs, and I deploy it to a machine having only eight CPUs, I would not expect a problem, since the service obviously will not be using more than twelve CPUs. (My use case is that I want to limit a particular service to use no more than two CPUs in production while being able to test it on a single-core VM.)

I think the relevant code is at daemon/daemon_unix.go, lines 520–522:

if resources.NanoCPUs < 0 || resources.NanoCPUs > int64(sysinfo.NumCPU())*1e9 {
    return warnings, fmt.Errorf("Range of CPUs is from 0.01 to %d.00, as there are only %d CPUs available", sysinfo.NumCPU(), sysinfo.NumCPU())
}

This was added by 846baf1fd3 ("Add --cpus flag to control cpu resources"). @yongtang: Could the check resources.NanoCPUs > int64(sysinfo.NumCPU()) * 1e9 be removed?
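To see why dropping the upper bound looks harmless, consider how a NanoCPUs value becomes a CFS bandwidth quota. This sketch assumes the 100 ms (100000 µs) period the daemon uses when converting NanoCPUs; quotaUs and validateNanoCPUs are hypothetical names. CFS bandwidth control allows a quota larger than the period (that is how multi-CPU limits are expressed), so on an 8-CPU node a 12-CPU quota is more CPU time than the node can supply and is simply never exhausted.

```go
package main

import (
	"errors"
	"fmt"
)

// cfsPeriodUs is the assumed CFS period: 100 ms, in microseconds.
const cfsPeriodUs int64 = 100000

// quotaUs converts nano-CPUs (1 CPU = 1e9) into a CFS quota per period.
func quotaUs(nanoCPUs int64) int64 {
	return nanoCPUs * cfsPeriodUs / 1e9
}

// validateNanoCPUs is a hypothetical relaxed check: it keeps the lower
// bound from the quoted daemon code and drops the host CPU comparison.
func validateNanoCPUs(nanoCPUs int64) error {
	if nanoCPUs < 0 {
		return errors.New("NanoCPUs must be non-negative")
	}
	return nil
}

func main() {
	const nodeCPUs = 8
	limit := int64(12e9) // --limit-cpu 12

	if err := validateNanoCPUs(limit); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	// Total CPU time the node can supply per period, across all its CPUs.
	capacity := nodeCPUs * cfsPeriodUs
	fmt.Printf("quota %d us/period, node capacity %d us/period, limit reachable: %t\n",
		quotaUs(limit), capacity, quotaUs(limit) <= capacity)
}
```

The container would then run unthrottled on the smaller node and be capped at 12 CPUs only where more cores exist, which is the behavior requested above.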