moby / libnetwork

networking for containers
Apache License 2.0

Performance issue with persisting driver endpoint to store #1374

Closed. coolljt0725 closed this issue 7 years ago.

coolljt0725 commented 8 years ago

To support container live restore, we persist the driver endpoint to the store, which is a good approach for each network driver. But persisting the endpoint to the store causes a performance issue: it takes more time to run a container, and the situation is worse in parallel. Here are some test results using https://github.com/crosbymichael/docker-stress.
The stress.json is

$ cat stress.json
[
    {"name":"busybox", "args": ["true"]}
]

docker 1.11.2 with live restore (we backported the live restore patch)

 ./stress --containers 50
INFO[0339] ran 52 containers in 339.05 seconds (0.15 container per sec. or 6.52 sec. per container)
INFO[0339] failures 0

 ./stress --containers 50 -c 10
ERRO[0024] docker: Error response from daemon: failed to create endpoint fervent_noyce on network bridge: failed to save bridge endpoint f3d7dbe to store: failed to update bridge store for object type *bridge.bridgeEndpoint: timeout.
  error=exit status 125
ERRO[0035] docker: Error response from daemon: failed to create endpoint lonely_jepsen on network bridge: failed to save bridge endpoint cb6b394 to store: failed to update bridge store for object type *bridge.bridgeEndpoint: timeout.
  error=exit status 125
INFO[0330] ran 61 containers in 330.20 seconds (0.18 container per sec. or 5.41 sec. per container)
INFO[0330] failures 1

$ time docker run -ti busybox true

real    0m2.720s
user    0m0.017s
sys     0m0.019s
$ time docker run -ti busybox true

real    0m6.190s
user    0m0.015s
sys     0m0.018s
$ time docker run -ti busybox true

real    0m2.837s
user    0m0.014s
sys     0m0.015s
$ time docker run -ti busybox true

real    0m3.864s
user    0m0.017s
sys     0m0.015s
$ time docker run -ti busybox true

real    0m3.694s
user    0m0.016s
sys     0m0.011s
$ time docker run -ti busybox true

real    0m2.747s
user    0m0.015s
sys     0m0.018s

If we increase the number of concurrent stress workers, there are a lot of timeout errors:

./stress -c 50
ERRO[0133] docker: Error response from daemon: failed to create endpoint dreamy_keller on network bridge: failed to save bridge endpoint 9b9f68e to store: failed to update bridge store for object type *bridge.bridgeEndpoint: timeout.
  error=exit status 125
ERRO[0133] docker: Error response from daemon: failed to create endpoint ecstatic_boyd on network bridge: failed to save bridge endpoint b9db3b1 to store: failed to update bridge store for object type *bridge.bridgeEndpoint: timeout.
  error=exit status 125
ERRO[0137] docker: Error response from daemon: failed to update store for object type *libnetwork.endpoint: timeout.
  error=exit status 125
ERRO[0153] docker: Error response from daemon: failed to create endpoint hungry_mclean on network bridge: failed to save bridge endpoint 9678724 to store: failed to update bridge store for object type *bridge.bridgeEndpoint: timeout.
  error=exit status 125
ERRO[0166] docker: Error response from daemon: failed to create endpoint suspicious_leakey on network bridge: failed to save bridge endpoint 39b33ea to store: failed to update bridge store for object type *bridge.bridgeEndpoint: timeout.
  error=exit status 125

docker 1.11.2 without live restore

 ./stress --containers 50
INFO[0228] ran 52 containers in 228.99 seconds (0.23 container per sec. or 4.40 sec. per container)
INFO[0228] failures 0

 ./stress --containers 50 -c 10
INFO[0238] ran 61 containers in 238.24 seconds (0.26 container per sec. or 3.91 sec. per container)
INFO[0238] failures 0

$ time docker run -ti busybox true

real    0m2.379s
user    0m0.014s
sys     0m0.015s
$ time docker run -ti busybox true

real    0m2.387s
user    0m0.022s
sys     0m0.010s
$ time docker run -ti busybox true

real    0m2.584s
user    0m0.020s
sys     0m0.013s

There is a significant performance decrease with live restore. Persisting the driver endpoint to the store also introduces some consistency issues, so I suggest we reconsider the endpoint re-construction approach: if we can reconstruct the endpoint, we can avoid persisting it. I still think the less we persist to the store, the better. @aboch @mavenugo WDYT?

danbeaulieu commented 8 years ago

Also being affected by this on docker 1.12.1.

The time from when a host is provisioned until we start seeing errors appears to be very small. From the logs, it DOES look like it recovers, but it can get back into the bad state and stay there.

@coolljt0725 interested in the stress app/script. Is it open source?

aboch commented 8 years ago

@coolljt0725 A couple of weeks ago I ran your test with docker-stress, and also increased the parallelism to 100 and more with 50/100/1000 containers.

I was not able to reproduce the problem.

I believe it has to do with the disk drive speed (I am running on a pretty empty SSD), its overload, and maybe disk space and fragmentation. Somebody has in fact reported that the stale data issue is easily reproducible on old spinning hard drives.

At this moment we have not heard enough such complaints, and given the issue seems related to slow disks, it does not seem a strong enough reason for a pervasive code rework. Also, there will likely be issues with the other approach as well, a few of which were already raised in the https://github.com/docker/libnetwork/pull/1135 comments, so in my opinion we should hold off on this and revisit later.

vincer commented 8 years ago

We're running into this issue a lot. We have an image with a VOLUME /foo, where there's 1-2GB of data in /foo that needs to be copied from the image into the volume at container creation time. I think that's causing us to hit this issue quickly, even though we are running on bare metal hosts with SSDs in a RAID0 config. I can easily repro with our image and docker-stress.

Our workaround for now is to not make that directory a volume; however, we incur a runtime performance penalty as a result. Is there any other workaround for this issue, or any progress on fixing it?

coolljt0725 commented 8 years ago

@aboch This happens on some of our servers; I'm sure this has something to do with the disk drive speed. For now, our workaround is to increase the transientTimeout of boltdb. I hope libnetwork could set the transientTimeout rather than just use the default timeout. WDYT?

aboch commented 8 years ago

@coolljt0725 After this issue, when we looked at the current timeout, we thought 10 seconds was already large. But I agree, we can increase it, because it is clear something wrong is happening down the chain that does not seem to be under our control. We might as well increase the timeout to a minute, as long as that makes things better. Were you able to verify that the bigger timeout improved things?

Given that libnetwork does not create the bolt interface with the persistent connection option, it can't currently set the transient timeout. I think it is fine to just change the default in the libkv project, as you are doing in your PR.

aboch commented 8 years ago

@coolljt0725 What about instead changing libkv to set BoltDB.timeout = options.ConnectionTimeout when options.PersistConnection == false && options.ConnectionTimeout != 0? In the end, that is the timeout used by bolt.Open().

Then we can control it via libnetwork in https://github.com/docker/libnetwork/blob/release/v0.8/datastore/datastore.go#L136.

coolljt0725 commented 8 years ago

@aboch We increased the timeout to 2 minutes on our servers and this problem never happened again. This is still just a workaround, and there is a small loss in container start time, but it's not a big deal at the moment.

coolljt0725 commented 8 years ago

What about instead changing libkv to set BoltDB.timeout = options.ConnectionTimeout when options.PersistConnection == false && options.ConnectionTimeout != 0? In the end, that is the timeout used by bolt.Open(). Then we can control it via libnetwork in https://github.com/docker/libnetwork/blob/release/v0.8/datastore/datastore.go#L136.

That's a good idea. :+1:

aboch commented 7 years ago

Closed by #1546