openwallet-foundation / owl-agent-test-harness

Aries agent test framework, with agent backchannel support
https://aries-interop.info
Apache License 2.0

Agent(s) fail to start within 30 seconds #840

Closed JamesKEbert closed 5 months ago

JamesKEbert commented 5 months ago

Unfortunately, when trying to execute the runsets for aries-vcx or acapy, the agents fail to start (while testing #833). The following is what I am attempting on a Linux machine: `./manage runset ariesvcx-ariesvcx -b`

This fails with:
```bash
Running runset: ariesvcx-ariesvcx, using agents: -d aries-vcx
Scope of tests: -t @RFC0036,@RFC0037,@RFC0160,@RFC0023,@revocation -t ~@T002-HIPE0011 -t ~@T014-HIPE001 -t ~@wip -t ~@RFC0183 -t ~@DIDExchangeConnection -t ~@RFC0211 -t ~@RFC0453 -t ~@RFC0025 -t ~@RFC0434 -t ~@QualifiedDIDs
Agents to build: aries-vcx
Backchannel Folder: aries-backchannels/aries-vcx
Building aries-vcx-agent-backchannel ...
[+] Building 0.8s (5/5) FINISHED                                              docker:desktop-linux
 => [internal] load build definition from Dockerfile.aries-vcx                                0.0s
 => => transferring dockerfile: 129B                                                          0.0s
 => [internal] load metadata for ghcr.io/hyperledger/aries-vcx/aath-backchannel:0.64.0-gmaath-064-update-7597  0.4s
 => [internal] load .dockerignore                                                             0.1s
 => => transferring context: 269B                                                             0.0s
 => CACHED [1/1] FROM ghcr.io/hyperledger/aries-vcx/aath-backchannel:0.64.0-gmaath-064-update-7597@sha256:2556044b4c09aae2d6a4ecd0034e23e2e970  0.0s
 => exporting to image                                                                        0.1s
 => => exporting layers                                                                       0.0s
 => => writing image sha256:6ea7b74bb8d582b090cf2de5f44ff71ffabc625eb83e62bb9632a7e1ca9de1ba  0.0s
 => => naming to docker.io/library/aries-vcx-agent-backchannel                                0.0s

Build multi-platform images faster with Docker Build Cloud: https://docs.docker.com/go/docker-build-cloud
Building aries-test-harness ...
[+] Building 5.7s (11/11) FINISHED                                            docker:desktop-linux
 => [internal] load build definition from Dockerfile.harness                                  0.0s
 => => transferring dockerfile: 349B                                                          0.0s
 => [internal] load metadata for docker.io/library/python:3.7-buster                          5.5s
 => [internal] load .dockerignore                                                             0.0s
 => => transferring context: 148B                                                             0.0s
 => [1/6] FROM docker.io/library/python:3.7-buster@sha256:2539f956bcccbac5e4a48ebdafbbbfbd26b4ab56e65b96076ae9cd1188b119b3  0.0s
 => [internal] load build context                                                             0.0s
 => => transferring context: 6.42kB                                                           0.0s
 => CACHED [2/6] COPY ./requirements.txt /aries-test-harness/                                 0.0s
 => CACHED [3/6] WORKDIR /aries-test-harness                                                  0.0s
 => CACHED [4/6] RUN pip install -r requirements.txt                                          0.0s
 => CACHED [5/6] RUN echo "==> Install stuff not in the requirements..." && pip install --no-cache-dir aiohttp  0.0s
 => CACHED [6/6] COPY . /aries-test-harness                                                   0.0s
 => exporting to image                                                                        0.0s
 => => exporting layers                                                                       0.0s
 => => writing image sha256:0870116332cb9c53943f71b017ceff1c350bd875da96d97b93421a8287a2fdb1  0.0s
 => => naming to docker.io/library/aries-test-harness                                         0.0s

Build multi-platform images faster with Docker Build Cloud: https://docs.docker.com/go/docker-build-cloud

Agents to be used:
    Acme    - aries-vcx
    Bob     - aries-vcx
    Faber   - aries-vcx
    Mallory - aries-vcx

starting local von-network...
[+] Running 14/14
 ✔ Network von_von                Created  0.1s
 ✔ Volume "von_node2-data"        Created  0.0s
 ✔ Volume "von_node3-data"        Created  0.0s
 ✔ Volume "von_node4-data"        Created  0.0s
 ✔ Volume "von_client-data"       Created  0.0s
 ✔ Volume "von_webserver-cli"     Created  0.0s
 ✔ Volume "von_webserver-ledger"  Created  0.0s
 ✔ Volume "von_nodes-data"        Created  0.0s
 ✔ Volume "von_node1-data"        Created  0.0s
 ✔ Container von-node3-1          Started  0.4s
 ✔ Container von-node4-1          Started  0.4s
 ✔ Container von-webserver-1      Started  0.4s
 ✔ Container von-node1-1          Started  0.4s
 ✔ Container von-node2-1          Started  0.4s
Want to see the scrolling container logs? Run "./manage logs"
waiting for ledger to start/respond............
starting local uniresolver...
Using: docker --log-level error compose
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3091  100  3091    0     0   967k      0 --:--:-- --:--:-- --:--:-- 1006k
[+] Running 2/2
 ✔ Container uni-resolver-web.local  Started  0.1s
 ✔ Container driver-did-sov.local    Started  0.2s
starting local indy-tails-server...
Using: docker --log-level error compose
[+] Running 3/3
 ✔ Network docker_tails-server            Created  0.1s
 ✔ Container docker-ngrok-tails-server-1  Started  0.3s
 ✔ Container docker-tails-server-1        Started  0.3s
waiting for tails server to start.
waiting for uniresolver to start.....
Starting Acme Agent using aries-vcx-agent-backchannel ...
Starting Bob Agent using aries-vcx-agent-backchannel ...
Starting Faber Agent using aries-vcx-agent-backchannel ...
Starting Mallory Agent using aries-vcx-agent-backchannel ...
waiting for Acme agent to start..............................
The agent failed to start within 30 seconds.
waiting for Bob agent to start...............................
The agent failed to start within 30 seconds.
waiting for Faber agent to start..............................
The agent failed to start within 30 seconds.
waiting for Mallory agent to start..............................
The agent failed to start within 30 seconds.
```

The logs when inspecting `acme_agent` for Aries VCX:

```bash
thread 'main' panicked at aries/agents/aath-backchannel/src/setup.rs:38:14:
Failed to send message: reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Ipv4(192.168.65.9)), port: Some(9000), path: "/register", query: None, fragment: None }, source: hyper_util::client::legacy::Error(Connect, ConnectError("tcp connect error", Os { code: 110, kind: TimedOut, message: "Operation timed out" })) }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
The logs when inspecting `acme_agent` for ACA-Py:
```bash
Starting aca-py agent ...
Error loading genesis transactions:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 1025, in _wrap_create_connection
    return await self._loop.create_connection(*args, **kwargs)
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 1065, in create_connection
    raise exceptions[0]
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 1050, in create_connection
    sock = await self._connect_sock(
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 961, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 500, in sock_connect
    return await fut
  File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 535, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
TimeoutError: [Errno 110] Connect call failed ('192.168.65.9', 9000)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/aries-backchannels/python/agent_backchannel.py", line 73, in default_genesis_txns
    async with session.get(f"{ledger_url}/genesis") as resp:
  File "/usr/local/lib/python3.9/site-packages/aiohttp/client.py", line 1197, in __aenter__
    self._resp = await self._coro
  File "/usr/local/lib/python3.9/site-packages/aiohttp/client.py", line 581, in _request
    conn = await self._connector.connect(
  File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 544, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 944, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 1257, in _create_direct_connection
    raise last_exc
  File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 1226, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
  File "/usr/local/lib/python3.9/site-packages/aiohttp/connector.py", line 1033, in _wrap_create_connection
    raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host 192.168.65.9:9000 ssl:default [Connect call failed ('192.168.65.9', 9000)]
Error retrieving ledger genesis transactions
```

This smells like a Docker networking issue to me: both backchannels time out while reaching the local von-network ledger at 192.168.65.9:9000 (the `/register` and `/genesis` endpoints) during startup. Any ideas?
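As a quick way to confirm the diagnosis, one can test TCP reachability of the ledger endpoint from a container's point of view. This is a hypothetical helper (not part of the harness) using bash's `/dev/tcp` redirection; the host and port come from the errors above:

```shell
# check_tcp HOST PORT — returns 0 if a TCP connection succeeds within 3 seconds.
check_tcp() {
  timeout 3 bash -c "cat < /dev/null > /dev/tcp/$1/$2" 2>/dev/null
}

# The ledger endpoint the backchannels are failing to reach:
check_tcp 192.168.65.9 9000 && echo "ledger reachable" || echo "ledger unreachable"
```

Running this via `docker exec` inside one of the agent containers (e.g. `acme_agent`) shows whether the ledger is reachable from the container network rather than from the host.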

JamesKEbert commented 5 months ago

It was a Docker networking issue. Adding `DOCKERHOST=host.docker.internal` before the command made everything work as expected. Thanks for the suggestion @gmulhearn!
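For anyone hitting the same thing, the working invocation (assuming the same runset as in the original report) looks like this:

```shell
# DOCKERHOST overrides the address the harness containers use to reach
# services (such as the von-network ledger on port 9000) on the Docker host.
DOCKERHOST=host.docker.internal ./manage runset ariesvcx-ariesvcx -b
```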