moby / swarmkit

A toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.
Apache License 2.0

Scheduling performance - CPU usage #2763

Open felixsittenauer opened 6 years ago

felixsittenauer commented 6 years ago

I did a performance test to measure the scheduling performance of Docker Swarm. For this purpose I measured the time it takes to schedule and start 1000 containers on 100 worker nodes. A cluster of 3 manager nodes was used.
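For reproducibility, a measurement like this can be sketched with the Docker CLI. This is only a hypothetical harness (the service name `sched-test` and the image are placeholders, not part of the original experiment): create a service with 1000 replicas and poll until all replicas report as running.

```shell
#!/bin/sh
# Hypothetical sketch of the scheduling-latency measurement.
# Run against a Swarm manager; service name and image are assumptions.
start=$(date +%s)

docker service create --name sched-test --replicas 1000 \
    --detach nginx:alpine

# Poll until the service reports all 1000 replicas as running.
while [ "$(docker service ls --filter name=sched-test \
        --format '{{.Replicas}}')" != "1000/1000" ]; do
    sleep 1
done

end=$(date +%s)
echo "schedule+start latency: $((end - start)) s"
```

The `--detach` flag makes `docker service create` return immediately, so the polling loop (rather than the CLI's built-in progress output) defines when the measurement stops.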

The graphs show the CPU usage of the 3 manager nodes and one worker node during the scheduling process. Time 0 is when the scheduling action was started.

In the first graph, no service had ever been created or scheduled before (fresh cluster).

[Screenshot 2018-10-15 16:47:25: CPU usage on the fresh cluster]

In the second graph, the experiment had already been repeated several times.

[Screenshot 2018-10-15 16:50:24: CPU usage after repeated runs]

While all 1000 containers were scheduled and started in under 2.5 seconds, the CPU usage is higher during scheduling and remains above 150% even 60 seconds after scheduling has finished.

What is going on here? Why does the fresh cluster have lower CPU usage?

RamjiVE commented 5 years ago

Do we have an update??

rei-ifesca commented 5 years ago

This also happens with a 3 manager and 3 worker cluster and about 50 containers inside a single stack.

RamjiVE commented 5 years ago

We have 3 manager and 8 worker nodes and are still facing high CPU usage, and tasks are not being scheduled because of the CPU consumption!

mvandermade commented 4 years ago

Are you scheduling the tasks on the manager nodes (are they drained)? Maybe they have RAM issues?
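For anyone checking this: draining a manager so it only does orchestration (and never runs tasks itself) is a one-liner per node. A minimal sketch, assuming placeholder node names `manager-1` through `manager-3`:

```shell
#!/bin/sh
# Drain each manager so the scheduler never places tasks on it.
# Node names are placeholders; list real ones with `docker node ls`.
for node in manager-1 manager-2 manager-3; do
    docker node update --availability drain "$node"
done

# Verify: the AVAILABILITY column should read "Drain" for the managers.
docker node ls
```

Draining managers is also a quick way to rule out the "managers competing with their own tasks" explanation for the high CPU numbers in the graphs above.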

felixsittenauer commented 4 years ago

The experiment was done on AWS: the 3 manager nodes were Ubuntu 16.04 m5.xlarge instances with 4 vCPUs (3.1 GHz, Xeon Platinum 8000) and 16 GB RAM. The 100 worker nodes were Ubuntu 16.04 m5.large instances with 2 vCPUs (3.1 GHz, Xeon Platinum 8000) and 8 GB RAM. Manager and worker nodes were connected through an AWS Virtual Private Cloud (VPC) with up to 10 Gbit/s of bandwidth. The metrics were collected by Elastic Metricbeat. During the experiment the manager nodes consumed about 690 MB and the worker nodes about 680 MB of memory.

mvandermade commented 4 years ago

And just to validate my doubts: did you drain the managers? (By default they also accept tasks.) Did you also see the same CPU behaviour on the workers?