logos-co / nomos-node

Nomos blockchain node

Executor behaviour connects to other executor behaviour for dispersal #900

Open romanzac opened 3 weeks ago

romanzac commented 3 weeks ago

Problem

When the address book included both executor and validator addresses, some executors sent dispersal requests to other executors. Roughly 50% of dispersal attempts failed: about half of the executors never received a dispersal success message, and a similar share of dispersal messages never reached the validators.

Impact

Occurrence medium, impact high.

Expected behavior

Executors should not connect to other executors, or at least should not send them dispersal requests.
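
A minimal sketch of the second option, assuming the address book can tag each peer with a role. The `Role` enum, the `AddressBook` alias, and `dispersal_targets` are hypothetical names for illustration and are not taken from nomos-node:

```rust
use std::collections::HashMap;

/// Hypothetical role tag; the real address book may not record roles.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Executor,
    Validator,
}

/// Hypothetical address book: peer id (a plain string here) -> (address, role).
pub type AddressBook = HashMap<String, (String, Role)>;

/// Returns the peer ids an executor may send dispersal requests to:
/// validators only, sorted so the result is deterministic.
pub fn dispersal_targets(book: &AddressBook) -> Vec<&str> {
    let mut targets: Vec<&str> = book
        .iter()
        .filter(|(_, (_, role))| *role == Role::Validator)
        .map(|(peer, _)| peer.as_str())
        .collect();
    targets.sort_unstable();
    targets
}
```

With a book holding one executor and two validators, `dispersal_targets` would return only the two validator ids, so an executor would never pick another executor as a dispersal target.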

To reproduce

  1. Check out https://github.com/logos-co/nomos-node/commit/1121ffabdafe1058c62acfb91529ad18601b742c
  2. cargo clean; cargo build
  3. RUST_LOG=debug CONSENSUS_SLOT_TIME=5 RISC0_DEV_MODE=true cargo test -p nomos-da-network-core test_validation_behaviour -- --nocapture

Screenshots/logs

2024-10-31T02:37:32.180833Z  WARN nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator timeout reached
2024-10-31T02:37:32.181661Z  WARN nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator timeout reached
2024-10-31T02:37:32.181748Z  WARN nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator timeout reached
2024-10-31T02:37:32.181781Z  WARN nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator timeout reached
2024-10-31T02:37:32.181829Z  WARN nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator timeout reached
2024-10-31T02:37:32.181859Z  WARN nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator timeout reached
2024-10-31T02:37:32.271488Z  WARN nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator timeout reached
2024-10-31T02:37:32.271625Z  WARN nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator timeout reached
2024-10-31T02:37:32.283855Z  WARN nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator timeout reached
2024-10-31T02:37:32.283928Z  WARN nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator timeout reached
2024-10-31T02:37:32.284586Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator received 20 messages from subnet 0
2024-10-31T02:37:32.284597Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator received 0 messages from subnet 1

2024-10-31T02:37:32.284608Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator received 0 messages from subnet 0
2024-10-31T02:37:32.284614Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator received 10 messages from subnet 1

2024-10-31T02:37:32.284621Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator received 0 messages from subnet 0
2024-10-31T02:37:32.284627Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator received 0 messages from subnet 1

2024-10-31T02:37:32.284633Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator received 0 messages from subnet 0
2024-10-31T02:37:32.284638Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator received 0 messages from subnet 1

2024-10-31T02:37:32.284645Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator received 10 messages from subnet 0
2024-10-31T02:37:32.284651Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator received 0 messages from subnet 1

2024-10-31T02:37:32.284657Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator received 0 messages from subnet 0
2024-10-31T02:37:32.284663Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator received 0 messages from subnet 1

2024-10-31T02:37:32.284669Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator received 0 messages from subnet 0
2024-10-31T02:37:32.284675Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator received 0 messages from subnet 1

2024-10-31T02:37:32.284681Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator received 0 messages from subnet 0
2024-10-31T02:37:32.284687Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator received 0 messages from subnet 1

2024-10-31T02:37:32.284693Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator received 0 messages from subnet 0
2024-10-31T02:37:32.284699Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator received 0 messages from subnet 1

2024-10-31T02:37:32.284706Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator received 0 messages from subnet 0
2024-10-31T02:37:32.284711Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Validator received 0 messages from subnet 1

2024-10-31T02:37:32.284727Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Executor task received: 10 messages dispersal success
2024-10-31T02:37:32.284738Z  WARN nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Executor terminated
2024-10-31T02:37:32.284794Z  WARN nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Executor terminated
2024-10-31T02:37:32.284839Z  WARN nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Executor terminated
2024-10-31T02:37:32.284891Z  WARN nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Executor terminated
2024-10-31T02:37:32.284931Z  WARN nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Executor terminated
2024-10-31T02:37:32.284970Z  WARN nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Executor terminated
2024-10-31T02:37:32.285421Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Executor task received: 0 messages dispersal success
2024-10-31T02:37:32.285431Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Executor task received: 0 messages dispersal success
2024-10-31T02:37:32.285438Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Executor task received: 10 messages dispersal success
2024-10-31T02:37:32.285445Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Executor task received: 0 messages dispersal success
2024-10-31T02:37:32.285452Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Executor task received: 0 messages dispersal success
2024-10-31T02:37:32.285459Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Executor task received: 10 messages dispersal success
2024-10-31T02:37:32.285466Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Executor task received: 0 messages dispersal success
2024-10-31T02:37:32.285472Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Executor task received: 10 messages dispersal success
2024-10-31T02:37:32.285478Z DEBUG nomos_da_network_core::protocols::dispersal::validator::behaviour::tests: Executor task received: 0 messages dispersal success
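
The failure rate can be read off the log mechanically. The sketch below is a hypothetical helper, not part of the test suite, and assumes the message format shown in the executor lines above. It counts how many executors reported at least one dispersal success:

```rust
/// Extracts N from a line containing
/// "Executor task received: N messages dispersal success".
pub fn parse_success_count(line: &str) -> Option<u32> {
    let (_, rest) = line.split_once("Executor task received: ")?;
    let (count, _) = rest.split_once(" messages dispersal success")?;
    count.trim().parse().ok()
}

/// Returns (executors with at least one success, executors seen).
/// Lines that do not match the executor format are ignored.
pub fn tally_executors(log: &str) -> (usize, usize) {
    let counts: Vec<u32> = log.lines().filter_map(parse_success_count).collect();
    let succeeded = counts.iter().filter(|&&n| n > 0).count();
    (succeeded, counts.len())
}
```

Applied to the ten "Executor task received" lines above, this would report 4 of 10 executors with a non-zero success count.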

danielSanchezQ commented 3 weeks ago

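What is the swarm's protocol stack for this? Just to clarify:

  • Executors may connect to one another, but they will simply not negotiate certain protocols (dispersal specifically).
  • Depending on which protocols are mounted, behaviour may change.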

romanzac commented 3 weeks ago

> What is the swarm's protocol stack for this? Just to clarify:
>
>   • Executors may connect to one another, but they will simply not negotiate certain protocols (dispersal specifically).
>   • Depending on which protocols are mounted, behaviour may change.

My swarms use the default new() functions to initialize their behaviours: https://github.com/logos-co/nomos-node/blob/c5aebd31d145d922c092eec3a5aea459dc1c937c/nomos-da/network/core/src/protocols/dispersal/validator/behaviour.rs#L213

https://github.com/logos-co/nomos-node/blob/c5aebd31d145d922c092eec3a5aea459dc1c937c/nomos-da/network/core/src/protocols/dispersal/validator/behaviour.rs#L235

I have noticed that DispersalValidatorBehaviour selects a protocol in new(): https://github.com/logos-co/nomos-node/blob/c5aebd31d145d922c092eec3a5aea459dc1c937c/nomos-da/network/core/src/protocols/dispersal/validator/behaviour.rs#L39

while DispersalExecutorBehaviour has no visible limit or selection: https://github.com/logos-co/nomos-node/blob/c5aebd31d145d922c092eec3a5aea459dc1c937c/nomos-da/network/core/src/protocols/dispersal/executor/behaviour.rs#L201

Perhaps the executor acts more like a client and should actually reject connections. It looks like it forwards things to libp2p::swarm::dummy::ConnectionHandler, but I am not sure what that handler does.

https://github.com/logos-co/nomos-node/blob/c5aebd31d145d922c092eec3a5aea459dc1c937c/nomos-da/network/core/src/protocols/dispersal/executor/behaviour.rs#L505
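
To make the "connect but do not negotiate dispersal" idea concrete, here is a toy model in plain Rust. It does not use the libp2p API: the `Node` type, the `open_stream` method, and the protocol id are all hypothetical. It only illustrates the assumption that a stream opens when the remote side lists the protocol among those it accepts inbound, which an executor acting purely as a client would not do:

```rust
use std::collections::HashSet;

/// Hypothetical protocol id, for illustration only.
pub const DISPERSAL: &str = "/example/da/dispersal/1.0.0";

pub struct Node {
    /// Protocols this node accepts on inbound streams.
    inbound: HashSet<&'static str>,
}

impl Node {
    /// Validator-like node: accepts inbound dispersal streams.
    pub fn validator() -> Self {
        Node { inbound: HashSet::from([DISPERSAL]) }
    }

    /// Executor-like node: client only, accepts nothing inbound.
    pub fn executor() -> Self {
        Node { inbound: HashSet::new() }
    }

    /// Models stream negotiation: succeeds only if the remote
    /// lists the requested protocol among its inbound protocols.
    pub fn open_stream(&self, remote: &Node, protocol: &str) -> Result<(), String> {
        if remote.inbound.contains(protocol) {
            Ok(())
        } else {
            Err(format!("remote does not support {protocol}"))
        }
    }
}
```

In this model, `Node::executor().open_stream(&Node::validator(), DISPERSAL)` succeeds, while the same call against another `Node::executor()` returns an error. Under that assumption an executor-to-executor connection could still exist, but no dispersal stream would be opened over it.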