hyperledger / aries-agent-test-harness

Aries agent test framework, with agent backchannel support
https://aries-interop.info
Apache License 2.0

`./manage run` failing to start von network #777

Closed Patrik-Stas closed 6 months ago

Patrik-Stas commented 6 months ago

Hi, I am trying to run AATH in Docker but I am running into an issue in the early stages of setup.

Running on the latest main (5aecaaee3924fa0f6d0707c30195f9f663078aed), I run

./manage run -d aries-vcx

Which prints

Agents to be used:
  Acme - aries-vcx
  Bob - aries-vcx
  Faber - aries-vcx
  Mallory - aries-vcx

starting local von-network...
Creating von_node3_1     ... done
Creating von_webserver_1 ... done
Creating von_node1_1     ... done
Creating von_node2_1     ... done
Creating von_node4_1     ... done
Want to see the scrolling container logs? Run "./manage logs"
waiting for ledger to start/respond............................................................
The Indy Ledger failed to start within 60 seconds.

The Indy Ledger is not running.
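
As far as I can tell, the wait loop in ./manage is just polling the local von-network webserver, so the same check can be done by hand (a sketch, assuming the default setup that maps the webserver to host port 9000):

# ask the ledger webserver for its status; a healthy von-network answers with JSON
curl -sf http://localhost:9000/status || echo "webserver not responding"

In my case this does not respond, which matches the exited webserver container shown below.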

Looking at containers I see

> docker ps -a

CONTAINER ID   IMAGE              COMMAND                  CREATED              STATUS                          PORTS                                                           NAMES
c7b42bd3410a   von-network-base   "./scripts/start_nod…"   About a minute ago   Up About a minute               0.0.0.0:9707-9708->9707-9708/tcp, :::9707-9708->9707-9708/tcp   von_node4_1
cf91c3ba178c   von-network-base   "./scripts/start_nod…"   About a minute ago   Up About a minute               0.0.0.0:9703-9704->9703-9704/tcp, :::9703-9704->9703-9704/tcp   von_node2_1
c9be5e371dad   von-network-base   "./scripts/start_nod…"   About a minute ago   Up About a minute               0.0.0.0:9701-9702->9701-9702/tcp, :::9701-9702->9701-9702/tcp   von_node1_1
6751c005a363   von-network-base   "./scripts/start_nod…"   About a minute ago   Up About a minute               0.0.0.0:9705-9706->9705-9706/tcp, :::9705-9706->9705-9706/tcp   von_node3_1
463f05141e77   von-network-base   "bash -c 'sleep 10 &…"   About a minute ago   Exited (1) About a minute ago

And inspecting the webserver logs:

> docker logs -f von_webserver_1

Traceback (most recent call last):
  File "/home/indy/.pyenv/versions/3.6.13/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/indy/.pyenv/versions/3.6.13/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/indy/server/server.py", line 14, in <module>
    from .anchor import (
  File "/home/indy/server/anchor.py", line 15, in <module>
    import nacl.signing
ModuleNotFoundError: No module named 'nacl'
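
To confirm it is the image itself that is missing PyNaCl (rather than something going wrong at container start), a one-off check against the image can help; this is just a sketch and assumes python is on the PATH for the image's default user:

# try the same import the webserver fails on, directly in the von-network-base image
docker run --rm --entrypoint python von-network-base -c "import nacl.signing; print('pynacl is available')"

If that raises the same ModuleNotFoundError, the image is stale or was built without the dependency.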

The last logs from von_node1_1 look quite normal:

> docker logs -f von_node1_1

2024-02-22 03:59:45,544|INFO|node.py|Node1 caught up to 0 txns in the last catchup
2024-02-22 03:59:45,545|INFO|node_reg_handler.py|Loaded current node registry from the ledger: ['Node1', 'Node2', 'Node3', 'Node4']
2024-02-22 03:59:45,546|INFO|node_reg_handler.py|Current committed node registry for previous views: [(0, ['Node1', 'Node2', 'Node3', 'Node4'])]
2024-02-22 03:59:45,546|INFO|node_reg_handler.py|Current uncommitted node registry for previous views: [(0, ['Node1', 'Node2', 'Node3', 'Node4'])]
2024-02-22 03:59:45,546|INFO|node_reg_handler.py|Current active node registry: ['Node1', 'Node2', 'Node3', 'Node4']
2024-02-22 03:59:45,549|INFO|checkpoint_service.py|Node1:0 - checkpoint_service adding a stable checkpoint CHECKPOINT{'instId': 0, 'viewNo': 0, 'seqNoStart': 0, 'seqNoEnd': 1500, 'digest': '41aRPooiqdjLuuEtThtr26au3zNDJXaFsu4CWt5JTvtK'}
2024-02-22 03:59:45,550|INFO|checkpoint_service.py|Node1:0 - checkpoint_service set watermarks as 1500 1800
2024-02-22 03:59:45,550|INFO|ordering_service.py|Node1:0 cleaning up till (0, 1500)
2024-02-22 03:59:45,550|INFO|checkpoint_service.py|Node1:0 - checkpoint_service marked stable checkpoint 1500
2024-02-22 03:59:45,551|INFO|ordering_service.py|Node1:0 set last ordered as (0, 1536)
2024-02-22 03:59:45,552|INFO|checkpoint_service.py|Node1:1 - checkpoint_service removing all received checkpoints
2024-02-22 03:59:45,553|INFO|checkpoint_service.py|Node1:1 - checkpoint_service set watermarks as 0 9223372036854775807
2024-02-22 03:59:45,553|INFO|node.py|CATCH-UP: Node1 caught up till (0, 1536)
2024-02-22 03:59:45,553|INFO|node.py|CATCH-UP: Node1 does not need any more catchups
2024-02-22 03:59:45,554|INFO|replica.py|Node1:0 setting primaryName for view no 0 to: None
2024-02-22 03:59:45,554|INFO|replica.py|Node1:1 setting primaryName for view no 0 to: None
2024-02-22 03:59:45,556|INFO|node.py|Node1 updated its pool parameters: f 1, totalNodes 4, allNodeNames {'Node3', 'Node2', 'Node4', 'Node1'}, requiredNumberOfInstances 2, minimumNodes 3, quorums {'n': 4, 'f': 1, 'weak': Quorum(2), 'strong': Quorum(3), 'propagate': Quorum(2), 'prepare': Quorum(2), 'commit': Quorum(3), 'reply': Quorum(2), 'view_change': Quorum(3), 'election': Quorum(3), 'view_change_ack': Quorum(2), 'view_change_done': Quorum(3), 'same_consistency_proof': Quorum(2), 'consistency_proof': Quorum(2), 'ledger_status': Quorum(2), 'ledger_status_last_3PC': Quorum(2), 'checkpoint': Quorum(2), 'timestamp': Quorum(2), 'bls_signatures': Quorum(3), 'observer_data': Quorum(2), 'backup_instance_faulty': Quorum(2)}
2024-02-22 03:59:45,557|INFO|consensus_shared_data.py|Node1:0 updated validators list to {'Node3', 'Node2', 'Node4', 'Node1'}
2024-02-22 03:59:45,557|INFO|consensus_shared_data.py|Node1:1 updated validators list to {'Node3', 'Node2', 'Node4', 'Node1'}
2024-02-22 03:59:45,558|INFO|replica.py|Node1:0 setting primaryName for view no 0 to: Node1:0
2024-02-22 03:59:45,558|NOTIFICATION|primary_connection_monitor_service.py|Node1:0 restored connection to primary of master
2024-02-22 03:59:45,558|NOTIFICATION|node.py|PRIMARY SELECTION:  selected primary Node1:0 for instance 0 (view 0)
2024-02-22 03:59:45,559|INFO|replica.py|Node1:1 setting primaryName for view no 0 to: Node2:1
2024-02-22 03:59:45,559|NOTIFICATION|node.py|PRIMARY SELECTION:  selected primary Node2:1 for instance 1 (view 0)
2024-02-22 03:59:45,559|INFO|node.py|Node1 started participating
2024-02-22 03:59:45,560|INFO|checkpoint_service.py|update_watermark_from_3pc to (0, 1536)
2024-02-22 03:59:45,560|INFO|checkpoint_service.py|Node1:0 - checkpoint_service set watermarks as 1536 1836
2024-02-22 03:59:45,560|INFO|checkpoint_service.py|Node1:1 - checkpoint_service set watermarks as 0 9223372036854775807
2024-02-22 03:59:45,561|INFO|last_sent_pp_store_helper.py|Node1 trying to restore lastPrePrepareSeqNo
2024-02-22 03:59:45,561|INFO|last_sent_pp_store_helper.py|Node1 did not find stored lastSentPrePrepare
2024-02-22 03:59:45,561|INFO|last_sent_pp_store_helper.py|Node1 erasing stored lastSentPrePrepare
2024-02-22 03:59:45,562|INFO|zstack.py|Processing messages from previously unknown remotes.
2024-02-22 03:59:45,563|INFO|seeder_service.py|Node1 received ledger status: LEDGER_STATUS{'ledgerId': 3, 'txnSeqNo': 1536, 'viewNo': None, 'ppSeqNo': None, 'merkleRoot': 'CJJMYw3q7KZnBU3UhSzkxs4ay4tiCQH38cawNa6AmFfq', 'protocolVersion': 2} from Node4
2024-02-22 03:59:45,563|INFO|cons_proof_service.py|Node1:ConsProofService:3 ignoring LEDGER_STATUS{'ledgerId': 3, 'txnSeqNo': 1536, 'viewNo': None, 'ppSeqNo': None, 'merkleRoot': 'CJJMYw3q7KZnBU3UhSzkxs4ay4tiCQH38cawNa6AmFfq', 'protocolVersion': 2} since it is not gathering ledger statuses
2024-02-22 04:00:45,512|INFO|primary_connection_monitor_service.py|Node1:1 The primary is already connected so view change will not be proposed

I am running this on Linux. Any ideas what might be going on? It looks like an internal issue of von-network, but on the other hand, this stuff is used in CI and works there, doesn't it 🤔

Patrik-Stas commented 6 months ago

Well, it seems that the von-network image that happened to be on that machine was outdated. Since von-network by default reuses the locally available image, it simply found the existing image and used it, despite it being out of date.
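
If anyone else hits this, the quickest way to check for a stale image is to look at its creation date:

# when was the local von-network-base image built?
docker image inspect von-network-base --format '{{.Created}}'

In my case the date confirmed the image was stale.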

Rebuilding the von-network image solved the issue.
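
For anyone hitting the same thing, the rebuild can be done along these lines (a sketch, assuming a local von-network checkout; build, down and start are von-network's own manage subcommands):

# in the von-network directory: stop and remove the old containers
./manage down
# drop the stale image and rebuild it
docker rmi von-network-base
./manage build
# then, back in aries-agent-test-harness:
./manage run -d aries-vcx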