me-box / databox

Databox container manager and dashboard server
MIT License
94 stars 25 forks source link

amd64 vs arm64v8 Databox startup time #297

Open Toshbrown opened 5 years ago

Toshbrown commented 5 years ago

Comparison of databox startup time and some benchmarks. The amd64 platform was docker for mac on a MBP (2.7 GHz Intel Core i7) vs arm64v8 a raspberry Pi model 3b+, using @jptmoore 's Alpine image with swap and havaged installed. These are not scientific but I hope it will show us where to focus our efforts to improve performance on Arm64v8.

TLDR: First install of databox on a Pi os 6x slower than on docker for mac! subsequent restarts are 3x slower.

1) We should move to EllipticP256 to improve https certificate generation time 2) Image size and layer count are very important on Pi's all efforts should be made to reduce the size of arm images.

0.5.1 startup time fresh install

test setup:

# To get to a clean state run
docker rm  $(docker ps -a -q)
docker rmi $(docker images -q) -f
docker volume rm container-manager-certs
docker volume rm container-manager-core-store

# start databox using
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock -v -t databoxsystems/databox:0.5.1 /databox start -sslHostName $(hostname)

amd64

Click to see command output ``` [INFO]2018/11/05 10:41:38 Starting Databox 0.5.1 [INFO]2018/11/05 10:41:38 Pulling Image databoxsystems/container-manager-amd64:0.5.1 [INFO]2018/11/05 10:41:41 Done pulling Image databoxsystems/container-manager-amd64:0.5.1 [INFO]2018/11/05 10:41:41 Outputting logs container-manager : [INFO]2018/11/05 10:41:43 GenRootCA called container-manager : [INFO]2018/11/05 10:41:44 Making cert ./certs/databox-network.pem container-manager : [INFO]2018/11/05 10:41:44 Making cert ./certs/export-service.pem container-manager : [INFO]2018/11/05 10:41:44 Making cert ./certs/arbiter.pem container-manager : [INFO]2018/11/05 10:41:44 Making cert ./certs/app-server.pem container-manager : [INFO]2018/11/05 10:41:44 ContainerManager Started container-manager : [INFO]2018/11/05 10:41:44 Starting databox-network container-manager : [INFO]2018/11/05 10:41:44 Pulling Image databoxsystems/core-network-amd64:0.5.1 container-manager : [INFO]2018/11/05 10:41:49 Done pulling Image databoxsystems/core-network-amd64:0.5.1 container-manager : [INFO]2018/11/05 10:41:51 [startCoreNetwork] done starting databox-network container-manager : [INFO]2018/11/05 10:41:51 Pulling Image databoxsystems/core-network-relay-amd64:0.5.1 container-manager : [INFO]2018/11/05 10:41:54 Done pulling Image databoxsystems/core-network-relay-amd64:0.5.1 container-manager : [INFO]2018/11/05 10:41:55 [updateContainerManager] called container-manager : [INFO]2018/11/05 10:41:55 [updateContainerManager] updating dns server container-manager : [INFO]2018/11/05 10:41:55 Restarting the Container Manager container-manager : [INFO]2018/11/05 10:41:59 ContainerManager Started container-manager : [INFO]2018/11/05 10:41:59 [updateContainerManager] called container-manager : [INFO]2018/11/05 10:41:59 [updateContainerManager] updating dns server container-manager : [INFO]2018/11/05 10:41:59 Pulling Image databoxsystems/arbiter-amd64:0.5.1 container-manager : [INFO]2018/11/05 10:42:03 Done pulling Image databoxsystems/arbiter-amd64:0.5.1 container-manager : [INFO]2018/11/05 10:42:12 Pulling Image databoxsystems/export-service-amd64:0.5.1 container-manager : [INFO]2018/11/05 10:42:12 Pulling Image databoxsystems/core-store-amd64:0.5.1 container-manager : [INFO]2018/11/05 10:42:15 Done pulling Image databoxsystems/export-service-amd64:0.5.1 container-manager : [INFO]2018/11/05 10:42:16 Done pulling Image databoxsystems/core-store-amd64:0.5.1 container-manager : [INFO]2018/11/05 10:42:22 Password=ihNO1B9QR1v9s2ojoDxvU48DdpXFs/Mc container-manager : [INFO]2018/11/05 10:42:22 Container Manager Ready and waiting container-manager : [INFO]2018/11/05 10:42:22 Restarting saved apps and drivers container-manager : [INFO]2018/11/05 10:42:22 Installing app-store container-manager : [INFO]2018/11/05 10:42:22 API: registering ServiceStatus container-manager : [INFO]2018/11/05 10:42:22 Installing core-ui container-manager : [INFO]2018/11/05 10:42:29 Pulling Image databoxsystems/driver-app-store-amd64:0.5.1 container-manager : [INFO]2018/11/05 10:42:30 Pulling Image databoxsystems/core-ui-amd64:0.5.1 container-manager : [INFO]2018/11/05 10:42:32 Done pulling Image databoxsystems/driver-app-store-amd64:0.5.1 container-manager : [INFO]2018/11/05 10:42:32 Done pulling Image databoxsystems/core-ui-amd64:0.5.1 container-manager : [INFO]2018/11/05 10:42:33 Successfully installed core-ui container-manager : [INFO]2018/11/05 10:42:39 Successfully installed app-store ```

Total time 56 seconds

arm64

Click to see command output ``` container-manager : [INFO]2018/11/05 10:49:15 GenRootCA called container-manager : [INFO]2018/11/05 10:49:59 Making cert ./certs/databox-network.pem container-manager : [INFO]2018/11/05 10:50:13 Making cert ./certs/export-service.pem container-manager : [INFO]2018/11/05 10:50:37 Making cert ./certs/arbiter.pem container-manager : [INFO]2018/11/05 10:51:37 Making cert ./certs/app-server.pem container-manager : [INFO]2018/11/05 10:52:02 ContainerManager Started container-manager : [INFO]2018/11/05 10:52:02 Starting databox-network container-manager : [INFO]2018/11/05 10:52:08 [startCoreNetwork] done starting databox-network container-manager : [INFO]2018/11/05 10:52:11 [updateContainerManager] called container-manager : [INFO]2018/11/05 10:52:11 [updateContainerManager] updating dns server container-manager : [INFO]2018/11/05 10:52:11 Restarting the Container Manager container-manager : [INFO]2018/11/05 10:52:18 ContainerManager Started container-manager : [INFO]2018/11/05 10:52:18 [updateContainerManager] called container-manager : [INFO]2018/11/05 10:52:18 [updateContainerManager] updating dns server container-manager : [INFO]2018/11/05 10:52:43 Pulling Image databoxsystems/export-service-arm64v8:0.5.1 container-manager : [INFO]2018/11/05 10:53:06 Done pulling Image databoxsystems/export-service-arm64v8:0.5.1 container-manager : [INFO]2018/11/05 10:53:08 Pulling Image databoxsystems/core-store-arm64v8:0.5.1 container-manager : [INFO]2018/11/05 10:53:50 Done pulling Image databoxsystems/core-store-arm64v8:0.5.1 container-manager : [INFO]2018/11/05 10:54:01 Password=4pAfzD57q3Uzzy0nGzBfrEz01JGGzjc4 container-manager : [INFO]2018/11/05 10:54:01 API: registering ServiceStatus container-manager : [INFO]2018/11/05 10:54:01 Installing core-ui container-manager : [INFO]2018/11/05 10:54:01 Container Manager Ready and waiting container-manager : [INFO]2018/11/05 10:54:01 Restarting saved apps and drivers container-manager : [INFO]2018/11/05 10:54:01 Installing app-store container-manager : [INFO]2018/11/05 10:54:09 Pulling Image databoxsystems/driver-app-store-arm64v8:0.5.1 container-manager : [INFO]2018/11/05 10:54:12 Pulling Image databoxsystems/core-ui-arm64v8:0.5.1 container-manager : [INFO]2018/11/05 10:54:22 Done pulling Image databoxsystems/core-ui-arm64v8:0.5.1 container-manager : [INFO]2018/11/05 10:54:24 Done pulling Image databoxsystems/driver-app-store-arm64v8:0.5.1 container-manager : [INFO]2018/11/05 10:54:57 Successfully installed core-ui container-manager : [INFO]2018/11/05 10:55:28 Successfully installed app-store ```

Total time 373 seconds (arbiter and core network were pre-pulled to avoid race condition)

Creating the 4 HTTPS certs for the core components takes 83 seconds on arm64v8 vs under 1 second on amd64. HTTPS certs are also generated for each driver and app installed so improving this would be a big win.

0.5.1 startup time after first setup

With driver-os-monitor and app-os-monitor started at startup.

databox started once to populate images how long does it take

amd64

Click to see command output ``` container-manager : [INFO]2018/11/05 10:03:52 ContainerManager Started container-manager : [INFO]2018/11/05 10:03:52 Starting databox-network container-manager : [INFO]2018/11/05 10:03:55 [startCoreNetwork] done starting databox-network container-manager : [INFO]2018/11/05 10:03:55 [updateContainerManager] called container-manager : [INFO]2018/11/05 10:03:55 [updateContainerManager] updating dns server container-manager : [INFO]2018/11/05 10:03:55 Restarting the Container Manager container-manager : [INFO]2018/11/05 10:04:00 ContainerManager Started container-manager : [INFO]2018/11/05 10:04:00 [updateContainerManager] called container-manager : [INFO]2018/11/05 10:04:00 [updateContainerManager] updating dns server container-manager : [INFO]2018/11/05 10:04:15 Password=6k+U0A4xebuQaiQPu1oiU8iT60rpJ1jd container-manager : [INFO]2018/11/05 10:04:15 Container Manager Ready and waiting container-manager : [INFO]2018/11/05 10:04:15 Restarting saved apps and drivers container-manager : [INFO]2018/11/05 10:04:15 Installing app-store container-manager : [INFO]2018/11/05 10:04:15 API: registering ServiceStatus container-manager : [INFO]2018/11/05 10:04:15 Installing core-ui container-manager : [INFO]2018/11/05 10:04:15 Installing driver-twitter container-manager : [INFO]2018/11/05 10:04:15 Installing driver-os-monitor container-manager : [INFO]2018/11/05 10:04:22 Successfully installed core-ui container-manager : [INFO]2018/11/05 10:04:29 Successfully installed app-store container-manager : [INFO]2018/11/05 10:04:30 Successfully installed driver-os-monitor container-manager : [INFO]2018/11/05 10:04:31 Successfully installed driver-twitter container-manager : [INFO]2018/11/05 10:04:31 Installing app-os-monitor container-manager : [INFO]2018/11/05 10:04:38 Successfully installed app-os-monitor ```

Total time 46 seconds

arm64v8

Click to see command output ``` container-manager : [INFO]2018/11/05 10:03:49 ContainerManager Started container-manager : [INFO]2018/11/05 10:03:49 Starting databox-network container-manager : [INFO]2018/11/05 10:03:56 [startCoreNetwork] done starting databox-network container-manager : [INFO]2018/11/05 10:04:01 [updateContainerManager] called container-manager : [INFO]2018/11/05 10:04:01 [updateContainerManager] updating dns server container-manager : [INFO]2018/11/05 10:04:01 Restarting the Container Manager container-manager : [INFO]2018/11/05 10:04:08 ContainerManager Started container-manager : [INFO]2018/11/05 10:04:08 [updateContainerManager] called container-manager : [INFO]2018/11/05 10:04:08 [updateContainerManager] updating dns server container-manager : [INFO]2018/11/05 10:04:36 Password=3ajK8qiLkLAyRa1UBMruwHG5KUB1BdAM container-manager : [INFO]2018/11/05 10:04:36 Container Manager Ready and waiting container-manager : [INFO]2018/11/05 10:04:36 Restarting saved apps and drivers container-manager : [INFO]2018/11/05 10:04:36 Installing app-store container-manager : [INFO]2018/11/05 10:04:36 Installing core-ui container-manager : [INFO]2018/11/05 10:04:36 API: registering ServiceStatus container-manager : [INFO]2018/11/05 10:04:36 Installing driver-twitter container-manager : [INFO]2018/11/05 10:04:36 Installing driver-os-monitor container-manager : [INFO]2018/11/05 10:05:07 Successfully installed core-ui container-manager : [INFO]2018/11/05 10:05:51 Successfully installed driver-twitter container-manager : [INFO]2018/11/05 10:05:55 Successfully installed driver-os-monitor container-manager : [INFO]2018/11/05 10:05:55 Installing app-os-monitor container-manager : [INFO]2018/11/05 10:06:20 Successfully installed app-store container-manager : [INFO]2018/11/05 10:06:40 Successfully installed app-os-monitor ```

Total time 171 seconds

A few benchmarks

A couple of Benchmark results (using golang benchmarking support) looking at https cert generation and a Listing and scanning images.

amd64

BenchmarkGenerateKeyRSA2024-8                 10         186284433 ns/op
BenchmarkGenerateKeyEllipticP256-8        100000             17432 ns/op
BenchmarkGenCert-8                            10         178309417 ns/op
BenchmarkGenerateArbiterToken-8          2000000               662 ns/op
BenchmarkImageExists-8                      1000           1524519 ns/op
PASS

arm64v8

BenchmarkGenerateKeyRSA2024-4                  1    9424823855 ns/op
BenchmarkGenerateKeyEllipticP256-4           300       5442465 ns/op
BenchmarkGenCert-4                             1    15617880924 ns/op
BenchmarkGenerateArbiterToken-4           300000          5563 ns/op
BenchmarkImageExists-4                      1000       2062057 ns/op
PASS

GenerateKeyRSA is the slow part of the HTTPs cert generation. This is done for every installed app and driver. Arm platforms are heavily affected by the slow performance of math/bignatmontgomery and math/bigaddMulVVW used in creating RSA keys. GenerateKeyRSA takes 0.18 seconds on amd64 and 9.42 seconds on arm64v8. Moving to EllipticP256 for HTTPS keys would offer a large time-saving.

Pulling images

Using gnu time to look at pulling image form docker hub. A more detailed analysis is tricky as most of the work so done by dockerd and not the docker command.

Images pulled:

Observations: reported download speed by the docker cmd was approximately equal on both platforms at 17 Mbs

darwin amd64

Command being timed: "docker pull databoxsystems/driver-os-monitor-amd64:0.5.1"
User time (seconds): 0.08
System time (seconds): 0.03
Percent of CPU this job got: 1%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.95

Command being timed: "docker pull databoxsystems/core-store-amd64:0.5.1"
User time (seconds): 0.08
System time (seconds): 0.03
Percent of CPU this job got: 2%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.74

linux arm64

Command being timed: "docker pull databoxsystems/driver-os-monitor-amd64:0.5.1"
User time (seconds): 0.99
System time (seconds): 0.25
Percent of CPU this job got: 1%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1m 17.28s

Command being timed: "docker pull databoxsystems/core-store-amd64:0.5.1"
User time (seconds): 0.90
System time (seconds): 0.18
Percent of CPU this job got: 4%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0m 23.15s

The image pull test on arm64v8 is much slower. These figures don't show the CPU usage of dockerd, but the docker client reports slower layer extraction throughput. If this is CPU or IO limited is difficult to observe, but it is clear that reducing image size and layer count will have a significant impact on startup and app/driver install times on arm64v8.

docker images on the amd64 gives

REPOSITORY                                TAG                                SIZE
databoxsystems/driver-app-store-amd64     0.5.0                                15.9MB
databoxsystems/core-ui-amd64              0.5.1                                17MB
databoxsystems/container-manager-amd64    0.5.1                                15.9MB
databoxsystems/arbiter-amd64              0.5.1                                15.7MB
databoxsystems/export-service-amd64       0.5.1                                18.1MB
databoxsystems/core-store-amd64           0.5.1                                35MB
databoxsystems/core-network-relay-amd64   0.5.1                                21.6MB
databoxsystems/core-network-amd64         0.5.1                                26.9MB
databoxsystems/databox                    0.5.1                                14.9MB

docker images on the arm64v8 gives

REPOSITORY                                  TAG                                   SIZE
databoxsystems/container-manager-arm64v8    0.5.1                                  17.2MB
databoxsystems/arbiter-arm64v8              0.5.1                                  51.8MB
databoxsystems/driver-app-store-arm64v8     0.5.1                                  17.5MB
databoxsystems/core-ui-arm64v8              0.5.1                                  15.3MB
databoxsystems/export-service-arm64v8       0.5.1                                  53.2MB
databoxsystems/core-store-arm64v8           0.5.1                                  61MB
databoxsystems/core-network-arm64v8         0.5.1                                  55.7MB
databoxsystems/core-network-relay-arm64v8   0.5.1                                  21.6MB
databoxsystems/databox                      0.5.1                                  14.1MB

From this its clear that some of the arm64v8 images are larger than necessary. This has a negative impact on startup time due to the slow image extraction step on arm64v8. For example:

databox-pi:~/databox# time docker pull databoxsystems/export-service-arm64v8:0.5.1
0.5.1: Pulling from databoxsystems/export-service-arm64v8
5fa8d229fadb: Pull complete
9404d3d8e2ec: Pull complete
afd5148f5f0e: Pull complete
de8d221615e5: Pull complete
c8610f7158a1: Pull complete
bc73ba19bf5b: Pull complete
8ff6f2a17b26: Pull complete
56b17f044482: Pull complete
939665277cd1: Pull complete
ab359e140d12: Pull complete
4b2959a40422: Pull complete
9e9533d593e8: Pull complete
154d5438bc86: Pull complete
6a2a22d110f8: Pull complete
c94ef17e67c1: Pull complete
Digest: sha256:738ff9fed02797317be4f48e468e5049dc9868281ea86b3332bf2ebabde6ada4
Status: Downloaded newer image for databoxsystems/export-service-arm64v8:0.5.1
real    1m 1.04s
user    0m 1.05s
sys 0m 0.36s
databox-pi:~/databox# time docker pull databoxsystems/export-service-amd64:0.5.1
0.5.1: Pulling from databoxsystems/export-service-amd64
3489d1c4660e: Pull complete
3b6831344250: Pull complete
c793c739016d: Pull complete
f17ea6370f54: Pull complete
69673a27bd46: Pull complete
Digest: sha256:db079ccb275a71f3e621ab789666502bf852611d0a95cf52ff327ff9cff08024
Status: Downloaded newer image for databoxsystems/export-service-amd64:0.5.1
real    0m 20.31s
user    0m 0.91s
sys 0m 0.18s

The difference between pulling the export-service-arm64v8 (53.2MB) and databoxsystems/export-service-amd64 (18.1MB) on a raspberry Pi is 39 seconds!! As far as I am aware there is no tecnical reason for the differance in image size.

jptmoore commented 5 years ago

The ARM build uses a base ocaml image across all the ocaml components which is large but this should not change often so could be included into the linux image to help reduce the pull time on this.

Toshbrown commented 5 years ago

It is not pulling the build image, just the final build artefact. Which are bigger than the x86 ones by as much as 3x.

your using resin/aarch64-alpine:3.5 as the base which is 30Mb compared to the 4Mb of arm64v8/alpine:3.5

That change alone will save a significant amount of time!!

You could release an image with the core components already pulled but then you would need to keep it up-to-date, for each release.

Toshbrown commented 5 years ago

I converted the container manager over to use ECC rather than RSA but it looks like the core-network only supports RSA 👎 it uses this (https://github.com/mirleft/ocaml-tls/blob/master/lwt/x509_lwt.ml#L49) to parse the certs

The error I'm getting is this:

2018-11-09 12:54:40 +00:00: WRN [service] while initializing tls: (Failure
  "Private key (/run/secrets/DATABOX_NETWORK.pem): X509: failed to parse private key"), roll with http

I also think it may be a wider issue with OCaml support for ECC https://github.com/mirleft/ocaml-tls/issues/362 but this issue is old so things could have moved on.

jptmoore commented 5 years ago

It is not pulling the build image, just the final build artefact. Which are bigger than the x86 ones by as much as 3x.

your using resin/aarch64-alpine:3.5 as the base which is 30Mb compared to the 4Mb of arm64v8/alpine:3.5

That change alone will save a significant amount of time!!

You could release an image with the core components already pulled but then you would need to keep it up-to-date, for each release.

Yes, I have noticed this also but not sure the reason behind the larger images on ARM. We should probably look at building our own longer term if we have suitable build machines.

Also, I would like to look at simplifying the builds by getting rid of the dependency on older ocaml packages required by ezirmin as currently this is PITA to support across a base ocaml image.

jptmoore commented 5 years ago

I converted the container manager over to use ECC rather than RSA but it looks like the core-network only supports RSA 👎 it uses this (https://github.com/mirleft/ocaml-tls/blob/master/lwt/x509_lwt.ml#L49) to parse the certs

The error I'm getting is this:

2018-11-09 12:54:40 +00:00: WRN [service] while initializing tls: (Failure
  "Private key (/run/secrets/DATABOX_NETWORK.pem): X509: failed to parse private key"), roll with http

I also think it may be a wider issue with OCaml support for ECC mirleft/ocaml-tls#362 but this issue is old so things could have moved on.

If we decide to continue with the current design of having a dedicated Export Service it might be worth moving it to Zest which does not have this issue.