nsacyber / WALKOFF

A flexible, easy to use, automation framework allowing users to integrate their capabilities and devices to cut through the repetitive, tedious tasks slowing them down. #nsacyber
https://nsacyber.github.io/WALKOFF/

Cannot connect to host resource_minio:9000 ssl:default [Name or service not known] #253

Closed: ch40s closed this issue 4 years ago

ch40s commented 4 years ago

Hi all,

The OS and all software packages are up to date, and the build process completes without any other errors. However, I get the following error from Minio. Any ideas what might be wrong and how to resolve it?

$ ./walkoff.sh up
Starting WALKOFF Bootloader...
UMPIRE - DEBUG:Connected to Docker Engine: v19.03.5
BOOTLOADER - INFO:Skipping secret walkoff_encryption_key creation, it already exists.
BOOTLOADER - INFO:Skipping secret walkoff_internal_key creation, it already exists.
BOOTLOADER - INFO:Skipping secret walkoff_minio_access_key creation, it already exists.
BOOTLOADER - INFO:Skipping secret walkoff_minio_secret_key creation, it already exists.
BOOTLOADER - INFO:Skipping secret walkoff_mongo_key creation, it already exists.
BOOTLOADER - INFO:Skipping secret walkoff_redis_key creation, it already exists.
BOOTLOADER - INFO:Creating volumes for persisting (registry, minio, mongo, portainer)...
BOOTLOADER - INFO:Pulling image registry:2
BOOTLOADER - INFO:Pulled image registry:2.
BOOTLOADER - INFO:Pulling image bitnami/redis:5.0
BOOTLOADER - INFO:Pulled image bitnami/redis:5.0.
BOOTLOADER - INFO:Pulling image bitnami/minio:2019-debian-9
BOOTLOADER - INFO:Pulled image bitnami/minio:2019-debian-9.
BOOTLOADER - INFO:Pulling image mongo:4
BOOTLOADER - INFO:Pulled image mongo:4.
BOOTLOADER - INFO:Pulling image mongo-express:latest
BOOTLOADER - INFO:Pulled image mongo-express:latest.
BOOTLOADER - INFO:Pulling image portainer/portainer:latest
BOOTLOADER - INFO:Pulled image portainer/portainer:latest.
BOOTLOADER - INFO:Deploying base services (registry, minio, mongo, portainer, redis)...
BOOTLOADER - INFO:Updating service walkoff_resource_registry (id: s2gwdh00tvgw8phst8xm8jooc)
BOOTLOADER - INFO:Updating service walkoff_debug_mongo_express (id: 2z88sd0zm1tmvr5dfnw0rwv0x)
BOOTLOADER - INFO:Updating service walkoff_resource_minio (id: rq9pariu7n3n27i98fp2pl1b4)
BOOTLOADER - INFO:Updating service walkoff_resource_mongo (id: 4cs66cxnaobj46pkhej64phl0)
BOOTLOADER - INFO:Updating service walkoff_resource_portainer (id: wajd9mdfk7kp7ke8ukdv86ijm)
BOOTLOADER - INFO:Updating service walkoff_resource_redis (id: boksqrbjasaoy60atzvxqrrs1)
BOOTLOADER - INFO:Generated compose for basics version: 1.0.0
BOOTLOADER - INFO:Minio not available yet, waiting to try again...
BOOTLOADER - INFO:Minio not available yet, waiting to try again...
BOOTLOADER - INFO:Minio not available yet, waiting to try again...
...
UMPIRE - INFO:Docker connection closed.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/aiohttp/connector.py", line 967, in _create_direct_connection
    traces=traces), loop=self._loop)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/connector.py", line 830, in _resolve_host
    self._resolver.resolve(host, port, family=self._family)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/resolver.py", line 30, in resolve
    host, port, type=socket.SOCK_STREAM, family=family)
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 784, in getaddrinfo
    None, getaddr_func, host, port, family, type, proto, flags)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/tenacity/_asyncio.py", line 57, in call
    result = yield from fn(*args, **kwargs)
  File "/WALKOFF/bootloader/bootloader.py", line 377, in wait_for_minio
    raise e
  File "/WALKOFF/bootloader/bootloader.py", line 370, in wait_for_minio
    async with self.session.get(f"http://{config.MINIO}/minio/health/ready") as resp:
  File "/usr/local/lib/python3.7/site-packages/aiohttp/client.py", line 1012, in __aenter__
    self._resp = await self._coro
  File "/usr/local/lib/python3.7/site-packages/aiohttp/client.py", line 483, in _request
    timeout=real_timeout
  File "/usr/local/lib/python3.7/site-packages/aiohttp/connector.py", line 523, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/connector.py", line 859, in _create_connection
    req, traces, timeout)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/connector.py", line 971, in _create_direct_connection
    raise ClientConnectorError(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host resource_minio:9000 ssl:default [Name or service not known]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/WALKOFF/bootloader/bootloader.py", line 572, in <module>
    asyncio.run(Bootloader.run())
  File "/usr/local/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 579, in run_until_complete
    return future.result()
  File "/WALKOFF/bootloader/bootloader.py", line 349, in run
    await getattr(bootloader, args.command)()
  File "/WALKOFF/bootloader/bootloader.py", line 486, in up
    await self.wait_for_minio()
  File "/usr/local/lib/python3.7/site-packages/tenacity/_asyncio.py", line 54, in call
    do = self.iter(retry_state=retry_state)
  File "/usr/local/lib/python3.7/site-packages/tenacity/__init__.py", line 351, in iter
    six.raise_from(retry_exc, fut.exception())
  File "<string>", line 3, in raise_from
tenacity.RetryError: RetryError[<Future at 0xb58523d0 state=finished raised ClientConnectorError>]

By the way, I also don't see any images or services with references to nginx:

$ docker service ls

NAME                          MODE                REPLICAS            IMAGE                         PORTS
walkoff_debug_mongo_express   replicated          0/1                 mongo-express:latest          *:27018->8081/tcp
walkoff_resource_minio        replicated          0/1                 bitnami/minio:2019-debian-9   *:9001->9000/tcp
walkoff_resource_mongo        replicated          0/1                 mongo:4                       *:27016->27016/tcp
walkoff_resource_portainer    replicated          1/1                 portainer/portainer:latest
walkoff_resource_redis        replicated          0/1                 bitnami/redis:5.0             *:6379->6379/tcp
walkoff_resource_registry     replicated          1/1                 registry:2                    *:5000->5000/tcp
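The REPLICAS column above shows that the stack never converged: every service except portainer and the registry reports 0/1. As a rough triage aid, the hypothetical helper below flags such services. It assumes the output was produced with "docker service ls --format '{{.Name}} {{.Replicas}}'" (one "name running/desired" pair per line); it is a sketch, not part of WALKOFF.

```python
import re
from typing import List

# Matches the "running/desired" prefix of the REPLICAS field, e.g. "0/1".
REPLICAS_RE = re.compile(r"(\d+)/(\d+)")


def unconverged_services(output: str) -> List[str]:
    """Return names of services whose running replica count is below the desired count."""
    pending = []
    for line in output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, replicas = parts
        match = REPLICAS_RE.match(replicas)
        if match is None:
            continue  # e.g. a header line
        running, desired = int(match.group(1)), int(match.group(2))
        if running < desired:
            pending.append(name)
    return pending


sample = """walkoff_debug_mongo_express 0/1
walkoff_resource_minio 0/1
walkoff_resource_mongo 0/1
walkoff_resource_portainer 1/1
walkoff_resource_redis 0/1
walkoff_resource_registry 1/1"""

print(unconverged_services(sample))
# ['walkoff_debug_mongo_express', 'walkoff_resource_minio', 'walkoff_resource_mongo', 'walkoff_resource_redis']
```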

Registry catalog at http://127.0.0.1:5000/v2/_catalog:

{"repositories":["walkoff_app_basics","walkoff_app_sdk","walkoff_core_api","walkoff_core_socketio","walkoff_core_umpire","walkoff_core_worker"]}
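The catalog response above comes from the Docker Registry HTTP API V2 "GET /v2/_catalog" endpoint, which returns a JSON object with a "repositories" list. A minimal stdlib sketch for checking that the core images were pushed to the local registry might look like this; the helper names and the expected list are illustrative assumptions, and fetch_catalog only works while the registry service is running.

```python
import json
import urllib.request
from typing import Iterable, List, Set


def parse_catalog(body: str) -> List[str]:
    """Extract repository names from a /v2/_catalog JSON response."""
    return json.loads(body).get("repositories", [])


def fetch_catalog(registry: str = "http://127.0.0.1:5000") -> List[str]:
    """Query the registry's catalog endpoint (requires the registry to be reachable)."""
    with urllib.request.urlopen(registry + "/v2/_catalog", timeout=5) as resp:
        return parse_catalog(resp.read().decode("utf-8"))


def missing_repositories(found: Iterable[str], expected: Iterable[str]) -> Set[str]:
    """Return the expected repositories that the registry does not list."""
    return set(expected) - set(found)


sample = ('{"repositories":["walkoff_app_basics","walkoff_app_sdk","walkoff_core_api",'
          '"walkoff_core_socketio","walkoff_core_umpire","walkoff_core_worker"]}')
expected = ["walkoff_core_api", "walkoff_core_socketio", "walkoff_core_umpire", "walkoff_core_worker"]

print(missing_repositories(parse_catalog(sample), expected))  # set()
```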

Images:

$ docker image ls
REPOSITORY                             TAG                   IMAGE ID            SIZE
127.0.0.1:5000/walkoff_core_api        latest                0e00af62979d        181MB
127.0.0.1:5000/walkoff_core_umpire     latest                0ccb48e6e771        162MB
127.0.0.1:5000/walkoff_core_worker     latest                e30c29b64e36        141MB
127.0.0.1:5000/walkoff_app_basics      1.0.0                 581febb21fd0        210MB
127.0.0.1:5000/walkoff_app_sdk         latest                7bb318e27f3c        143MB
walkoff_bootloader                     latest                41f8ef1104a2        352MB
bitnami/redis                          5.0                   3283b07d6ca3        96.2MB
127.0.0.1:5000/walkoff_core_socketio   latest                3492805118bf        202MB
bitnami/minio                          2019-debian-9         c424ca3cce2f        157MB
portainer/portainer                    latest                8971979f760c        64.6MB
node                                   12.13.0-buster-slim   b415d1d3b3f2        129MB
python                                 3.7.4-slim-buster     9b229fa57716        126MB
registry                               2                     c99846f41d25        22.1MB

hburke123 commented 4 years ago

You don't see any services referencing nginx because WALKOFF fails at line 486 of bootloader.py (the wait_for_minio() call) and never reaches full deployment. It seems Minio is taking an unusually long time to start, possibly because of your machine's specs. How many times did you see the message "BOOTLOADER - INFO:Minio not available yet, waiting to try again..." printed?
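The traceback shows the bootloader polling "http://{config.MINIO}/minio/health/ready" with retries (via aiohttp and tenacity) until it gives up with a RetryError. The stdlib-only sketch below illustrates that poll-and-retry pattern; it is not WALKOFF's code, and the function names, the 5-second default delay and the use of urllib are assumptions. Note that any network error, including a DNS "Name or service not known" failure, simply counts as "not ready".

```python
import time
import urllib.request
from typing import Callable


def minio_ready(host: str = "resource_minio:9000") -> bool:
    """Probe Minio's readiness endpoint; any network or HTTP error counts as 'not ready'."""
    try:
        with urllib.request.urlopen(f"http://{host}/minio/health/ready", timeout=3) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, socket.gaierror and timeouts are all OSError subclasses
        return False


def wait_until_ready(probe: Callable[[], bool], attempts: int = 10, delay: float = 5.0) -> int:
    """Call probe() up to `attempts` times, sleeping `delay` seconds between failures.

    Returns the 1-based attempt number that succeeded; raises TimeoutError otherwise.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        print("Minio not available yet, waiting to try again...")
        if attempt < attempts:
            time.sleep(delay)
    raise TimeoutError(f"service not ready after {attempts} attempts")
```

With attempts=10, a service that never comes up prints the message ten times before the TimeoutError, which matches the count reported in this thread.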

ch40s commented 4 years ago

@hburke123 The message "Minio not available yet, waiting to try again..." is shown 10 times. Would it help to increase the waiting time somehow?

hburke123 commented 4 years ago

Yes! We currently set the retry limit in the bootloader to 10. If you want to change it, set line 367 in /bootloader/bootloader.py to a higher number. That should fix the issue. Could you also tell me your machine's specs (e.g., memory, processor(s))?
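Editing the source each time is one way to raise the limit. As a hypothetical alternative, the limit could be read from an environment variable with a safe fallback. WALKOFF_MINIO_ATTEMPTS is an invented name for illustration only; the bootloader does not read it.

```python
import os
from typing import Mapping

DEFAULT_MINIO_ATTEMPTS = 10  # the limit mentioned above


def max_minio_attempts(env: Mapping[str, str] = os.environ) -> int:
    """Read the retry limit from the hypothetical WALKOFF_MINIO_ATTEMPTS variable.

    Falls back to the default when the variable is unset, non-numeric or not positive.
    """
    raw = env.get("WALKOFF_MINIO_ATTEMPTS", "")
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_MINIO_ATTEMPTS
    return value if value > 0 else DEFAULT_MINIO_ATTEMPTS


print(max_minio_attempts({"WALKOFF_MINIO_ATTEMPTS": "30"}))  # 30
```

A higher limit only helps when the service is slow to start; as the rest of this thread shows, raising it to 30 did not help here because the Minio task was never scheduled.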

ch40s commented 4 years ago

I changed it from 10 to 30 and it still failed with the same error. CPU and memory utilization are nowhere near their maximum, so I don't think it's related to my machine's specs. Any other ideas?

emrodas10 commented 4 years ago

Can you run "docker service ps walkoff_resource_minio --no-trunc" and send me the output? Unfortunately, I can't replicate your issue.

ch40s commented 4 years ago

@emrodas10: I'm wondering whether it has to do with the fact that I'm trying to install it on armv7l.

$ docker service ps walkoff_resource_minio --no-trunc
ID                          NAME                       IMAGE                                                                                                 NODE                DESIRED STATE       CURRENT STATE         ERROR                                                 PORTS
ivz7nw9zybp3mnuu627312pi2   walkoff_resource_minio.1   bitnami/minio:2019-debian-9@sha256:7fb42f1749ce9db87af5683b57f6adca32c853da3329ff3550c5b92b07fdfd83                       Running             Pending 4 hours ago   "no suitable node (unsupported platform on 1 node)"
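The ERROR column is the key detail here: the task is stuck in Pending with "no suitable node (unsupported platform on 1 node)". A small sketch for pulling that out of scripted output follows. It assumes the command was run as "docker service ps walkoff_resource_minio --no-trunc --format '{{.Name}}|{{.CurrentState}}|{{.Error}}'"; the helper names are hypothetical, and surrounding quotes on the error text are stripped because the table view prints them.

```python
from typing import List, Tuple


def task_errors(output: str) -> List[Tuple[str, str]]:
    """Return (task name, error) pairs for tasks that report a non-empty error."""
    errors = []
    for line in output.splitlines():
        fields = line.split("|")
        if len(fields) != 3:
            continue  # not in the expected name|state|error shape
        name, _state, error = (field.strip() for field in fields)
        error = error.strip('"')  # the error text may be wrapped in quotes
        if error:
            errors.append((name, error))
    return errors


def platform_mismatch(output: str) -> bool:
    """True if any task failed to schedule because no node supports the image's platform."""
    return any("unsupported platform" in err for _, err in task_errors(output))


sample = 'walkoff_resource_minio.1|Pending 4 hours ago|"no suitable node (unsupported platform on 1 node)"'
print(platform_mismatch(sample))  # True
```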

emrodas10 commented 4 years ago

As you suspected, the Minio image we use is built for x86. Using the "dimianstudio/minio-arm" image instead (or any other Minio ARM image) may fix your issue. Set it as the image in WALKOFF/bootloader/base-compose.yml at line 58. Hope this works!
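The underlying check is whether the host's CPU architecture appears in the image's list of supported platforms. A minimal sketch follows, mapping common "uname -m" values (what Python's platform.machine() reports) to Docker platform strings. The mapping is not exhaustive, the helper names are hypothetical, and obtaining an image's platform list (for example from its manifest list) is left out; the ["linux/amd64"] list in the example is an assumption standing in for an x86-only image.

```python
import platform
from typing import Iterable

# Common `uname -m` values mapped to Docker platform strings (not exhaustive).
MACHINE_TO_PLATFORM = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
    "aarch64": "linux/arm64",
    "arm64": "linux/arm64",
    "armv7l": "linux/arm/v7",
    "armv6l": "linux/arm/v6",
}


def host_platform(machine: str = "") -> str:
    """Return the Docker platform string for `machine` (defaults to this host)."""
    machine = (machine or platform.machine()).lower()
    return MACHINE_TO_PLATFORM.get(machine, f"linux/{machine}")


def image_supports_host(image_platforms: Iterable[str], machine: str = "") -> bool:
    """True if the image advertises the host's platform."""
    return host_platform(machine) in set(image_platforms)


# An x86-only image cannot be scheduled on an armv7l node.
print(image_supports_host(["linux/amd64"], "armv7l"))  # False
```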

ch40s commented 4 years ago

Thanks @emrodas10 and @hburke123 for your assistance!

I switched to the ARM image for Minio and, as you can see below, I no longer get the unsupported-platform error. However, the script is still waiting for Minio to become available. I've also included some errors and warnings that I found in the system logs.

# docker service ps walkoff_resource_minio --no-trunc
ID                          NAME                           IMAGE                                                                                                   NODE   DESIRED STATE       CURRENT STATE             ERROR               PORTS
eoaegsjcyb1q7hniwoknzie6x   walkoff_resource_minio.1       dimianstudio/minio-arm:latest@sha256:d9c479b98b053129aa3c40e24406a4424c366a9123bbaefdb39a37ae38c42254           Ready               Ready 4 seconds ago
xsd9h7thytn5cccso07eqinqs    \_ walkoff_resource_minio.1   dimianstudio/minio-arm:latest@sha256:d9c479b98b053129aa3c40e24406a4424c366a9123bbaefdb39a37ae38c42254           Shutdown            Complete 4 seconds ago
2wa0jbl54sjqn63zwjn65gnea    \_ walkoff_resource_minio.1   dimianstudio/minio-arm:latest@sha256:d9c479b98b053129aa3c40e24406a4424c366a9123bbaefdb39a37ae38c42254           Shutdown            Complete 14 seconds ago
pllegew49qoqo8f8wvzbjl0ds    \_ walkoff_resource_minio.1   dimianstudio/minio-arm:latest@sha256:d9c479b98b053129aa3c40e24406a4424c366a9123bbaefdb39a37ae38c42254           Shutdown            Complete 23 seconds ago
v1da7prw8zouxyie6k70bpcu9    \_ walkoff_resource_minio.1   dimianstudio/minio-arm:latest@sha256:d9c479b98b053129aa3c40e24406a4424c366a9123bbaefdb39a37ae38c42254           Shutdown            Complete 33 seconds ago
# docker ps -a
CONTAINER ID        IMAGE                           COMMAND                  CREATED             STATUS                      PORTS               NAMES
e1a51e39ad79        dimianstudio/minio-arm:latest   "minio"                  4 seconds ago       Created                                         walkoff_resource_minio.1.0jat04638gs32fxdho52ys5sg
7718726cf58c        dimianstudio/minio-arm:latest   "minio"                  13 seconds ago      Exited (0) 6 seconds ago                        walkoff_resource_minio.1.x3kv41gvngjxv1r9yraususwd
6b6c2bc9c1d3        dimianstudio/minio-arm:latest   "minio"                  23 seconds ago      Exited (0) 16 seconds ago                       walkoff_resource_minio.1.kq7osos0elsnafrn4xug32xoh
84049d7e1a19        dimianstudio/minio-arm:latest   "minio"                  32 seconds ago      Exited (0) 25 seconds ago                       walkoff_resource_minio.1.c6dgrm22o8otmy182c5lfzrtw
d37c35af6990        dimianstudio/minio-arm:latest   "minio"                  42 seconds ago      Exited (0) 35 seconds ago                       walkoff_resource_minio.1.nvcwfgh4x8w4srlho2fq6069v
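In this output every Minio task ends as "Complete" / "Exited (0)", so the process is exiting cleanly rather than crashing, and Swarm (whose default restart condition is "any") keeps starting a replacement. To tally exit codes across the restart loop, a sketch like the one below could be used. It assumes input from "docker ps -a --format '{{.Names}} {{.Status}}'", and the helper name is hypothetical.

```python
import re
from collections import Counter
from typing import Dict

EXIT_RE = re.compile(r"Exited \((\d+)\)")


def exit_code_counts(ps_output: str, name_prefix: str) -> Dict[int, int]:
    """Count exit codes of containers whose name contains `name_prefix`."""
    counts = Counter()
    for line in ps_output.splitlines():
        if name_prefix not in line:
            continue
        match = EXIT_RE.search(line)
        if match:  # containers still in "Created" or "Up" have no exit code yet
            counts[int(match.group(1))] += 1
    return dict(counts)


sample = """walkoff_resource_minio.1.0jat04638gs32fxdho52ys5sg Created
walkoff_resource_minio.1.x3kv41gvngjxv1r9yraususwd Exited (0) 6 seconds ago
walkoff_resource_minio.1.kq7osos0elsnafrn4xug32xoh Exited (0) 16 seconds ago
walkoff_resource_minio.1.c6dgrm22o8otmy182c5lfzrtw Exited (0) 25 seconds ago
walkoff_resource_minio.1.nvcwfgh4x8w4srlho2fq6069v Exited (0) 35 seconds ago"""

print(exit_code_counts(sample, "walkoff_resource_minio"))  # {0: 4}
```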

Recurring errors and warnings in syslog while Docker is running:

localhost dockerd[]: time="..." level=error msg="fatal task error" error="task: non-zero exit (2)" module=node/agent/taskmanager node.id=wxs2ba95cjbfj7wyj0623f015 service.id=vvudvgo5rnh2zxqmgqlzjpolz task.id=7a4rtrlr4wrngqiwskpz96j1v
...
localhost kernel: [ 3115.260792] eth0: renamed from veth7989c29
localhost kernel: [ 3115.296643] br0: port 6(veth156) entered blocking state
localhost kernel: [ 3115.296652] br0: port 6(veth156) entered forwarding state
localhost kernel: [ 3115.406018] eth1: renamed from veth68a163d
localhost kernel: [ 3115.432425] br0: port 3(veth158) entered blocking state
localhost kernel: [ 3115.432433] br0: port 3(veth158) entered forwarding state
localhost kernel: [ 3115.460621] eth2: renamed from veth6f18bd5
localhost kernel: [ 3115.476671] IPv6: ADDRCONF(NETDEV_CHANGE): veth0c3bc3a: link becomes ready
localhost kernel: [ 3115.476756] docker_gwbridge: port 3(veth0c3bc3a) entered blocking state
localhost kernel: [ 3115.476762] docker_gwbridge: port 3(veth0c3bc3a) entered forwarding state
...
localhost dockerd... level=warning msg="32805b81f0c71c96... cleanup: failed to unmount IPC: umount /var/lib/docker/containers/32805b81f0c71c96e4785ca5c962.../mounts/shm, flags: 0x2: no such file or directory"
localhost dockerd... level=warning msg="69979b837ed10bb8... cleanup: failed to unmount IPC: umount /var/lib/docker/containers/69979b837ed10bb8c174c318756a.../mounts/shm, flags: 0x2: no such file or directory"
...
localhost dockerd... level=error msg="fatal task error" error="task: non-zero exit (2)" module=node/agent/taskmanager node.id=wxs2ba95cjbfj7wyj0623f015 service.id=vvudvgo5rnh2zxqmgqlzjpolz task.id=c92skwp38e26exo6c9v307ft4
localhost dockerd... level=error msg="fatal task error" error="task: non-zero exit (2)" module=node/agent/taskmanager node.id=wxs2ba95cjbfj7wyj0623f015 service.id=vvudvgo5rnh2zxqmgqlzjpolz task.id=k85uy01wahul74uyh1qd4wk0o
...
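The "fatal task error" lines identify the failing service only by service.id, together with the exit code ("non-zero exit (2)"). A hypothetical helper for grouping those lines by service and exit code might look like this; the regex assumes the key=value layout shown in these log lines.

```python
import re
from collections import Counter

# Captures the exit code and the service.id from dockerd "fatal task error" lines.
TASK_ERR_RE = re.compile(r'error="task: non-zero exit \((\d+)\)".*?service\.id=(\S+)')


def fatal_task_errors(log_text: str) -> Counter:
    """Count 'fatal task error' lines per (service.id, exit code)."""
    counts = Counter()
    for line in log_text.splitlines():
        if "fatal task error" not in line:
            continue
        match = TASK_ERR_RE.search(line)
        if match:
            counts[(match.group(2), int(match.group(1)))] += 1
    return counts


sample = '''localhost dockerd[]: level=error msg="fatal task error" error="task: non-zero exit (2)" module=node/agent/taskmanager node.id=wxs2ba95cjbfj7wyj0623f015 service.id=vvudvgo5rnh2zxqmgqlzjpolz task.id=7a4rtrlr4wrngqiwskpz96j1v
localhost dockerd... level=error msg="fatal task error" error="task: non-zero exit (2)" module=node/agent/taskmanager node.id=wxs2ba95cjbfj7wyj0623f015 service.id=vvudvgo5rnh2zxqmgqlzjpolz task.id=c92skwp38e26exo6c9v307ft4
localhost dockerd... level=error msg="fatal task error" error="task: non-zero exit (2)" module=node/agent/taskmanager node.id=wxs2ba95cjbfj7wyj0623f015 service.id=vvudvgo5rnh2zxqmgqlzjpolz task.id=k85uy01wahul74uyh1qd4wk0o'''

print(fatal_task_errors(sample))
```

Each service ID found this way can then be mapped to a service name with "docker service inspect --format '{{.Spec.Name}}' <service id>".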

ch40s commented 4 years ago

Apparently some of these issues, including the one with Minio, go away once you pick the right ARM image. However, the following are still failing for some reason (they exit and restart every few seconds):

Current status:

BOOTLOADER - INFO:Deploying Walkoff stack...
BOOTLOADER - INFO:Creating service walkoff_app_ssh
BOOTLOADER - INFO:Creating service walkoff_core_api
BOOTLOADER - INFO:Creating service walkoff_core_socketio
BOOTLOADER - INFO:Creating service walkoff_core_umpire
BOOTLOADER - INFO:Creating service walkoff_core_worker
BOOTLOADER - INFO:Creating service walkoff_resource_nginx
BOOTLOADER - INFO:Creating service walkoff_app_basics
BOOTLOADER - INFO:Creating service walkoff_app_sdk
BOOTLOADER - INFO:Walkoff stack deployed, it may take a little time to converge.
Use 'docker stack services walkoff' to check on Walkoff services.
Web interface should be available at 'https://127.0.0.1:8080' once walkoff_resource_nginx is up.
UMPIRE - INFO:Docker connection closed.
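Besides "docker stack services walkoff", one way to tell when the web interface has come up is to probe https://127.0.0.1:8080 directly. The sketch below assumes the local nginx serves a self-signed certificate, so it disables certificate verification; that assumption is not confirmed in this thread, and the helper names are hypothetical. Any HTTP response, even an error status, is treated as "up" because it means nginx answered.

```python
import ssl
import urllib.error
import urllib.request


def insecure_context() -> ssl.SSLContext:
    """TLS context that skips certificate checks (only for a trusted local endpoint)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def web_ui_up(url: str = "https://127.0.0.1:8080", timeout: float = 5.0) -> bool:
    """Return True if anything answers HTTP(S) at `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=insecure_context()):
            return True
    except urllib.error.HTTPError:
        return True  # the server responded, just not with 2xx/3xx
    except OSError:
        return False  # refused, unresolvable, TLS failure or timeout
```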