sodafoundation / installer

provides easy installation and basic deployment based on specific configurations for SODA Projects
Apache License 2.0

Gelato fails on CentOS/Suse #124

Closed noelmcloughlin closed 5 years ago

noelmcloughlin commented 5 years ago

Describe the bug: The docker-compose deployment of gelato works on Ubuntu but fails on SuSE/CentOS. Has anyone seen this problem, or does anyone have suggestions for resolving it?

To Reproduce: This behaviour was seen while testing #13 but can probably be replicated with the Ansible installer.

Expected behavior: docker compose of gelato should work on other operating systems, not just Ubuntu.

Additional context

[root@localhost salt]# journalctl -u opensds-multi-cloud --follow
-- Logs begin at Tue 2019-03-12 15:38:47 MDT. --
Mar 12 18:15:25 localhost.localdomain systemd[1]: Started OpenSDS multi-cloud service.
Mar 12 18:15:26 localhost.localdomain docker-compose[1990]: Creating network "multicloud_default" with the default driver
Mar 12 18:15:26 localhost.localdomain docker-compose[1990]: Pulling datastore (mongo:latest)...
Mar 12 18:15:28 localhost.localdomain docker-compose[1990]: latest: Pulling from library/mongo
Mar 12 18:18:07 localhost.localdomain docker-compose[1990]: Digest: sha256:c4fe6705e1dffb91d3fdb4f2c00f58a5ce9b82dd010bce33e250d320518047b5
Mar 12 18:18:07 localhost.localdomain docker-compose[1990]: Status: Downloaded newer image for mongo:latest
Mar 12 18:18:07 localhost.localdomain docker-compose[1990]: Pulling zookeeper (wurstmeister/zookeeper:latest)...
Mar 12 18:18:09 localhost.localdomain docker-compose[1990]: latest: Pulling from wurstmeister/zookeeper
Mar 12 18:22:54 localhost.localdomain docker-compose[1990]: Digest: sha256:7a7fd44a72104bfbd24a77844bad5fabc86485b036f988ea927d1780782a6680
Mar 12 18:22:54 localhost.localdomain docker-compose[1990]: Status: Downloaded newer image for wurstmeister/zookeeper:latest
Mar 12 18:22:54 localhost.localdomain docker-compose[1990]: Pulling kafka (wurstmeister/kafka:2.11-2.0.1)...
Mar 12 18:22:57 localhost.localdomain docker-compose[1990]: 2.11-2.0.1: Pulling from wurstmeister/kafka
Mar 12 18:26:14 localhost.localdomain docker-compose[1990]: Digest: sha256:20d08a6849383b124bccbe58bc9c48ec202eefb373d05e0a11e186459b84f2a0
Mar 12 18:26:14 localhost.localdomain docker-compose[1990]: Status: Downloaded newer image for wurstmeister/kafka:2.11-2.0.1
Mar 12 18:26:14 localhost.localdomain docker-compose[1990]: Pulling dataflow (opensdsio/multi-cloud-dataflow:latest)...
Mar 12 18:26:17 localhost.localdomain docker-compose[1990]: latest: Pulling from opensdsio/multi-cloud-dataflow
Mar 12 18:26:27 localhost.localdomain docker-compose[1990]: Digest: sha256:19c2319990f879614c4116c6773bb813c55ee855356a37bb4bf27c683c5ef8a7
Mar 12 18:26:27 localhost.localdomain docker-compose[1990]: Status: Downloaded newer image for opensdsio/multi-cloud-dataflow:latest
Mar 12 18:26:27 localhost.localdomain docker-compose[1990]: Pulling datamover (opensdsio/multi-cloud-datamover:latest)...
Mar 12 18:26:30 localhost.localdomain docker-compose[1990]: latest: Pulling from opensdsio/multi-cloud-datamover
Mar 12 18:26:44 localhost.localdomain docker-compose[1990]: Digest: sha256:d2d4bafd402789dec66e1dcd33c3c6ba41a55825220d8f4d5672375ad8812269
Mar 12 18:26:44 localhost.localdomain docker-compose[1990]: Status: Downloaded newer image for opensdsio/multi-cloud-datamover:latest
Mar 12 18:26:44 localhost.localdomain docker-compose[1990]: Pulling s3 (opensdsio/multi-cloud-s3:latest)...
Mar 12 18:26:47 localhost.localdomain docker-compose[1990]: latest: Pulling from opensdsio/multi-cloud-s3
Mar 12 18:26:56 localhost.localdomain docker-compose[1990]: Digest: sha256:ab970c10d8073b70215e62f486595006479e6a8d717449b9977e2f1eabfdd43e
Mar 12 18:26:56 localhost.localdomain docker-compose[1990]: Status: Downloaded newer image for opensdsio/multi-cloud-s3:latest
Mar 12 18:26:56 localhost.localdomain docker-compose[1990]: Pulling backend (opensdsio/multi-cloud-backend:latest)...
Mar 12 18:26:58 localhost.localdomain docker-compose[1990]: latest: Pulling from opensdsio/multi-cloud-backend
Mar 12 18:27:08 localhost.localdomain docker-compose[1990]: Digest: sha256:1120397317d5706ff68ca2223c3a9f82152bee1ae8051ddc0b44b14e4f8cbfa3
Mar 12 18:27:08 localhost.localdomain docker-compose[1990]: Status: Downloaded newer image for opensdsio/multi-cloud-backend:latest
Mar 12 18:27:08 localhost.localdomain docker-compose[1990]: Creating multicloud_zookeeper_1 ...
Mar 12 18:27:08 localhost.localdomain docker-compose[1990]: Creating multicloud_backend_1   ...
Mar 12 18:27:08 localhost.localdomain docker-compose[1990]: Creating multicloud_s3_1        ...
Mar 12 18:27:08 localhost.localdomain docker-compose[1990]: Creating multicloud_datastore_1 ...
Mar 12 18:27:08 localhost.localdomain docker-compose[1990]: Creating multicloud_api_1       ...
Mar 12 18:27:08 localhost.localdomain docker-compose[1990]: Pulling api (opensdsio/multi-cloud-api:latest)...
Mar 12 18:27:08 localhost.localdomain docker-compose[1990]: [55B blob data]
Mar 12 18:27:08 localhost.localdomain docker-compose[1990]: ERROR: for multicloud_zookeeper_1  Cannot start service zookeeper: b'network multicloud_default not found'
Mar 12 18:27:08 localhost.localdomain docker-compose[1990]: [55B blob data]
Mar 12 18:27:08 localhost.localdomain docker-compose[1990]: ERROR: for multicloud_backend_1  Cannot start service backend: b'network multicloud_default not found'
Mar 12 18:27:08 localhost.localdomain docker-compose[1990]: [55B blob data]
Mar 12 18:27:08 localhost.localdomain docker-compose[1990]: ERROR: for multicloud_datastore_1  Cannot start service datastore: b'network multicloud_default not found'
Mar 12 18:27:11 localhost.localdomain docker-compose[1990]: [101B blob data]
Mar 12 18:27:24 localhost.localdomain docker-compose[1990]: Digest: sha256:76f98d7937f36623bd97a49a2c7640b11ad783ba16eed59115191ba03ec640ff
Mar 12 18:27:24 localhost.localdomain docker-compose[1990]: Status: Downloaded newer image for opensdsio/multi-cloud-api:latest
Mar 12 18:27:24 localhost.localdomain docker-compose[1990]: ERROR: for multicloud_s3_1  Cannot start service s3: b'network multicloud_default not found'
Mar 12 18:27:24 localhost.localdomain docker-compose[1990]: [55B blob data]
Mar 12 18:27:24 localhost.localdomain docker-compose[1990]: ERROR: for multicloud_api_1  Cannot start service api: b'network multicloud_default not found'
Mar 12 18:27:24 localhost.localdomain docker-compose[1990]: ERROR: for api  Cannot start service api: b'network multicloud_default not found'
Mar 12 18:27:24 localhost.localdomain docker-compose[1990]: ERROR: for zookeeper  Cannot start service zookeeper: b'network multicloud_default not found'
Mar 12 18:27:24 localhost.localdomain docker-compose[1990]: ERROR: for s3  Cannot start service s3: b'network multicloud_default not found'
Mar 12 18:27:24 localhost.localdomain docker-compose[1990]: ERROR: for datastore  Cannot start service datastore: b'network multicloud_default not found'
Mar 12 18:27:24 localhost.localdomain docker-compose[1990]: ERROR: for backend  Cannot start service backend: b'network multicloud_default not found'
Mar 12 18:27:24 localhost.localdomain docker-compose[1990]: Encountered errors while bringing up the project.
Mar 12 18:27:24 localhost.localdomain systemd[1]: opensds-multi-cloud.service: main process exited, code=exited, status=1/FAILURE
Mar 12 18:27:24 localhost.localdomain docker-compose[3568]: Removing multicloud_api_1       ...
Mar 12 18:27:24 localhost.localdomain docker-compose[3568]: Removing multicloud_s3_1        ...
Mar 12 18:27:24 localhost.localdomain docker-compose[3568]: Removing multicloud_datastore_1 ...
Mar 12 18:27:24 localhost.localdomain docker-compose[3568]: Removing multicloud_backend_1   ...
Mar 12 18:27:24 localhost.localdomain docker-compose[3568]: Removing multicloud_zookeeper_1 ...
Mar 12 18:27:24 localhost.localdomain docker-compose[3568]: [305B blob data]
Mar 12 18:27:24 localhost.localdomain docker-compose[3568]: Network multicloud_default not found.
Mar 12 18:27:24 localhost.localdomain systemd[1]: Unit opensds-multi-cloud.service entered failed state.
Mar 12 18:27:24 localhost.localdomain systemd[1]: opensds-multi-cloud.service failed.

leonwanghui commented 5 years ago

@wisererik Please take a look at this issue

noelmcloughlin commented 5 years ago

I think this is a known Docker issue. One suggested workaround is to use the --force-recreate flag with docker-compose.

In my earlier testing I never saw this issue on CentOS or SuSE. However, it started happening after I added a cleanup-disk-space task as the final stage, which executes docker system prune -a -f.

But systemd had not finished running docker compose up at that point. That may be the root cause. I will test and fix.
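A rough sketch of how the cleanup could be ordered after the compose startup, in case it helps anyone hitting the same thing. As far as I understand it, docker system prune removes networks that no container is using, and in the log above multicloud_default was created at 18:15:26 while the first containers were only created at 18:27:08, so the network sat unused during the image pulls. The helper names wait_for and containers_on_network are hypothetical (they are not part of the installer or the formula), and this is not necessarily what the upstream fix does.

```shell
#!/bin/sh
# Sketch only: the helper names are hypothetical, not part of the installer.

# wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-run COMMAND once per second until it succeeds or the timeout expires.
wait_for() {
    timeout=$1
    shift
    elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# containers_on_network NETWORK MIN
# Succeeds once at least MIN running containers are attached to NETWORK.
# Fails (rather than erroring out) if docker is missing or not reachable.
containers_on_network() {
    network=$1
    min=$2
    count=$(docker ps -q --filter "network=$network" 2>/dev/null | wc -l)
    [ "$count" -ge "$min" ]
}
```

The cleanup stage could then be gated with something like `wait_for 900 containers_on_network multicloud_default 8 && docker system prune -a -f`. The 900 second timeout and the count of 8 are assumptions: the pulls in the log took roughly 12 minutes, and the gelato compose project brings up eight multicloud_ containers in my setup.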

noelmcloughlin commented 5 years ago

PR raised upstream: https://github.com/saltstack-formulas/opensds-formula/pull/86/files

noelmcloughlin commented 5 years ago

Still investigating whether the --force-recreate flag is required. With the above PR things look better, but the Kafka container died. So I rebooted the OS, and on restart systemd ran docker-compose up --force-recreate for gelato, and the status is good.

vagrant-openSUSE-Leap:/home/vagrant # journalctl -u opensds-multi-cloud --follow
-- Logs begin at Wed 2019-03-13 18:20:52 MDT. --
Mar 13 18:21:26 vagrant-openSUSE-Leap docker-compose[1239]: Recreating multicloud_s3_1 ...
Mar 13 18:21:26 vagrant-openSUSE-Leap docker-compose[1239]: Recreating multicloud_datastore_1 ...
Mar 13 18:21:26 vagrant-openSUSE-Leap docker-compose[1239]: Recreating multicloud_s3_1
Mar 13 18:21:26 vagrant-openSUSE-Leap docker-compose[1239]: Recreating multicloud_datastore_1
Mar 13 18:21:26 vagrant-openSUSE-Leap docker-compose[1239]: Recreating multicloud_backend_1 ...
Mar 13 18:21:26 vagrant-openSUSE-Leap docker-compose[1239]: Recreating multicloud_zookeeper_1 ...
Mar 13 18:21:26 vagrant-openSUSE-Leap docker-compose[1239]: Recreating multicloud_zookeeper_1
Mar 13 18:21:26 vagrant-openSUSE-Leap docker-compose[1239]: Recreating multicloud_backend_1
Mar 13 18:21:26 vagrant-openSUSE-Leap docker-compose[1239]: Recreating multicloud_api_1 ...
Mar 13 18:21:26 vagrant-openSUSE-Leap docker-compose[1239]: Recreating multicloud_api_1
Mar 13 18:21:32 vagrant-openSUSE-Leap docker-compose[1239]: [89B blob data]
Mar 13 18:21:32 vagrant-openSUSE-Leap docker-compose[1239]: Recreating multicloud_kafka_1
Mar 13 18:21:35 vagrant-openSUSE-Leap docker-compose[1239]: [298B blob data]
Mar 13 18:21:35 vagrant-openSUSE-Leap docker-compose[1239]: Recreating multicloud_datamover_1
Mar 13 18:21:35 vagrant-openSUSE-Leap docker-compose[1239]: Recreating multicloud_dataflow_1 ...
Mar 13 18:21:35 vagrant-openSUSE-Leap docker-compose[1239]: Recreating multicloud_dataflow_1
Mar 13 18:21:40 vagrant-openSUSE-Leap docker-compose[1239]: [294B blob data]
Mar 13 18:21:40 vagrant-openSUSE-Leap docker-compose[1239]: zookeeper_1  | ZooKeeper JMX enabled by default
Mar 13 18:21:40 vagrant-openSUSE-Leap docker-compose[1239]: zookeeper_1  | Using config: /opt/zookeeper-3.4.13/bin/../conf/zoo.cfg
Mar 13 18:21:40 vagrant-openSUSE-Leap docker-compose[1239]: zookeeper_1  | 2019-03-14 00:21:31,850 [myid:] - INFO  [main:QuorumPeerConfig@136] - Reading configuration from: /opt/zookeeper-3.4.13/bin/../conf/zoo.cfg
Mar 13 18:21:40 vagrant-openSUSE-Leap docker-compose[1239]: zookeeper_1  | 2019-03-14 00:21:31,854 [myid:] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3

vagrant-openSUSE-Leap:/home/vagrant # docker ps
CONTAINER ID        IMAGE                             COMMAND                  CREATED             STATUS              PORTS                                                   NAMES
bf94443f5a27        opensdsio/multi-cloud-datamover   "/datamover"             3 minutes ago       Up 3 minutes                                                                multicloud_datamover_1
cb3a7e417f3f        opensdsio/multi-cloud-dataflow    "/dataflow"              3 minutes ago       Up 3 minutes                                                                multicloud_dataflow_1
1dba1887a853        wurstmeister/kafka:2.11-2.0.1     "start-kafka.sh"         4 minutes ago       Up 4 minutes        0.0.0.0:9092->9092/tcp                                  multicloud_kafka_1
122d63570e46        opensdsio/multi-cloud-backend     "/backend"               4 minutes ago       Up 4 minutes                                                                multicloud_backend_1
ae9de147e427        opensdsio/multi-cloud-api         "/api"                   4 minutes ago       Up 4 minutes        0.0.0.0:8089->8089/tcp                                  multicloud_api_1
4155f64dcff7        opensdsio/multi-cloud-s3          "/s3"                    4 minutes ago       Up 4 minutes                                                                multicloud_s3_1
db2982dae939        mongo                             "docker-entrypoint.s…"   4 minutes ago       Up 4 minutes        0.0.0.0:27017->27017/tcp                                multicloud_datastore_1
1e6ded0243cd        wurstmeister/zookeeper            "/bin/sh -c '/usr/sb…"   4 minutes ago       Up 4 minutes        22/tcp, 2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp      multicloud_zookeeper_1
a53cd4c86468        opensdsio/dashboard:latest        "/bin/sh -c /opt/das…"   30 minutes ago      Up 4 minutes                                                                dashboard
115c94ea891c        lvm-debian-cinder                 "bash -c /scripts/lv…"   32 minutes ago      Up 4 minutes                                                                blockbox_cinder-volume_1
c43597740ad0        debian-cinder                     "cinder-scheduler"       32 minutes ago      Up 4 minutes                                                                blockbox_cinder-scheduler_1
036eed406f87        debian-cinder                     "sh /scripts/cinder-…"   32 minutes ago      Up 4 minutes                                                                blockbox_cinder-api_1
2d9e3d070a71        rabbitmq                          "docker-entrypoint.s…"   32 minutes ago      Up 4 minutes        4369/tcp, 5671/tcp, 25672/tcp, 0.0.0.0:5672->5672/tcp   blockbox_rabbitmq_1
e4b44b92ef83        mariadb                           "docker-entrypoint.s…"   32 minutes ago      Up 4 minutes        0.0.0.0:3307->3306/tcp                                  blockbox_mariadb_1
3e456c7e4727        quay.io/coreos/etcd:latest        "etcd -name osdsdb -…"   About an hour ago   Up 4 minutes                                                                osdsdb

vagrant-openSUSE-Leap:/home/vagrant # osdsctl pool list
WARNING: Not found Env OPENSDS_AUTH_STRATEGY, use default(noauth)
+--------------------------------------+-----------------+-------------+--------+------------------+---------------+--------------+
| Id                                   | Name            | Description | Status | AvailabilityZone | TotalCapacity | FreeCapacity |
+--------------------------------------+-----------------+-------------+--------+------------------+---------------+--------------+
| 3d28df2a-78a6-500a-9c52-347301396de7 | opensds-volumes |             |        | default          | 1             | 1            |
+--------------------------------------+-----------------+-------------+--------+------------------+---------------+--------------+

noelmcloughlin commented 5 years ago

PR merged upstream. The --force-recreate flag is not needed for this issue.