Closed fbarbu15 closed 7 months ago
Going through the logs i have following observations:
I find it odd that discovery takes close to 20 seconds in a test setup. Because i see peers immediately getting identified and getting added. But i also noticed that the discovered peers doesn't have relay shards as indicated in log below.
210 2023-12-22T12:21:32.825Z ^[[35mDEBUG^[[0m gowaku.node2.peer-manager peermanager/peer_manager.go:323 ENR doesn't have relay shards {"node": "16Uiu2HAmAsB1NRYA6wVeBQCbjbTncsaHnFJEkw46c5GL19X7VeVz", "peer": "16Uiu2HAmGNtM2rQ8abySFNhqPDFY4cmfAEpfo9Z9fD3NekoFR2ip"}
Few ways this can be mitigated in test env is :
Would be good if you can share what is your test setup for this test.
I do use the filternode flag when starting the filter node. Regarding the REST API add peer endpoint, are there similar specs for gowaku so I can see how it works?
I've dug a little deeper and it would seem that this issue happens for the 1st requests after starting the nodes (notice in the test logs bellow that the http://127.0.0.1:10775/filter/v2/subscriptions
request times out after 60 seconds)
docker_logs.zip
Also you can see the node starting flags and tests setup in the test logs:
10:29:35 src.node.docker_mananger DEBUG Starting container with image harbor.status.im/wakuorg/go-waku:latest
10:29:35 src.node.docker_mananger DEBUG Using args ['--listen-address=0.0.0.0', '--rpc=true', '--rpc-admin=true', '--rest=true', '--rest-admin=true', '--websocket-support=true', '--log-level=DEBUG', '--rest-relay-cache-capacity=100', '--websocket-port=37805', '--rpc-port=37803', '--rest-port=37802', '--tcp-port=37804', '--discv5-udp-port=37806', '--rpc-address=0.0.0.0', '--rest-address=0.0.0.0', '--nat=extip:172.18.1.197', '--peer-exchange=true', '--discv5-discovery=true', '--min-relay-peers-to-publish=0', '--legacy-filter=false', '--relay=true', '--filter=true', '--nodekey=30348dd51465150e04a5d9d932c72864c8967f806cce60b5d26afeca1e77eb68', '--cluster-id=0']
10:29:36 src.node.docker_mananger DEBUG Container started with ID a7ce04d67001. Setting up logs at ./log/docker/node1_2023-12-27_10-29-35__12885372-86c3-45c4-b0c2-f85e97889e3a__harbor.status.im_wakuorg_go-waku:latest.log
10:29:36 src.node.waku_node DEBUG Started container from image harbor.status.im/wakuorg/go-waku:latest. RPC: 37803 REST: 37802 WebSocket: 37805 TCP: 37804
10:29:36 src.libs.common DEBUG Sleeping for 1 seconds
10:29:37 src.node.api_clients.base_client INFO GET call: http://127.0.0.1:37802/debug/v1/info with payload: None
10:29:37 src.node.api_clients.base_client INFO Response status code: 200. Response content: b'{"enrUri":"enr:-Ly4QOcl6Q09dTJ4jdWMK5pUngqtVBYkYk7jyGHBgnXTMtkGINZdK8p_6npqiPJ5VuHXUoUCdWN81OHRVR_dJ3JqDCuGAYyqZLbxgmlkgnY0gmlwhKwRAAKKbXVsdGlhZGRyc5gACgSsEQACBpOt3QMACgR_AAABBpOt3QOJc2VjcDI1NmsxoQM3Tqpf5eFn4Jztm4gB0Y0JVSJyxyZsW8QR-QU5DZb-PYN0Y3CCk6yDdWRwgpOuhXdha3UyAQ","listenAddresses":["/ip4/127.0.0.1/tcp/37804/p2p/16Uiu2HAmGNtM2rQ8abySFNhqPDFY4cmfAEpfo9Z9fD3NekoFR2ip","/ip4/127.0.0.1/tcp/37805/ws/p2p/16Uiu2HAmGNtM2rQ8abySFNhqPDFY4cmfAEpfo9Z9fD3NekoFR2ip","/ip4/172.17.0.2/tcp/37804/p2p/16Uiu2HAmGNtM2rQ8abySFNhqPDFY4cmfAEpfo9Z9fD3NekoFR2ip","/ip4/172.17.0.2/tcp/37805/ws/p2p/16Uiu2HAmGNtM2rQ8abySFNhqPDFY4cmfAEpfo9Z9fD3NekoFR2ip"]}'
10:29:37 src.node.waku_node INFO REST service is ready !!
10:29:37 src.steps.filter DEBUG Running fixture setup: setup_main_filter_node
10:29:37 src.node.docker_mananger DEBUG Docker client initialized with image harbor.status.im/wakuorg/go-waku:latest
10:29:37 src.node.waku_node DEBUG WakuNode instance initialized with log path ./log/docker/node2_2023-12-27_10-29-35__12885372-86c3-45c4-b0c2-f85e97889e3a__harbor.status.im_wakuorg_go-waku:latest.log
10:29:37 src.node.waku_node DEBUG Starting Node...
10:29:37 src.node.docker_mananger DEBUG Attempting to create or retrieve network waku
10:29:37 src.node.docker_mananger DEBUG Network waku already exists
10:29:37 src.node.docker_mananger DEBUG Generated random external IP 172.18.218.251
10:29:37 src.node.docker_mananger DEBUG Generated ports [10775, 10776, 10777, 10778, 10779, 10780]
10:29:37 src.node.docker_mananger DEBUG Starting container with image harbor.status.im/wakuorg/go-waku:latest
10:29:37 src.node.docker_mananger DEBUG Using args ['--listen-address=0.0.0.0', '--rpc=true', '--rpc-admin=true', '--rest=true', '--rest-admin=true', '--websocket-support=true', '--log-level=DEBUG', '--rest-relay-cache-capacity=100', '--websocket-port=10778', '--rpc-port=10776', '--rest-port=10775', '--tcp-port=10777', '--discv5-udp-port=10779', '--rpc-address=0.0.0.0', '--rest-address=0.0.0.0', '--nat=extip:172.18.218.251', '--peer-exchange=true', '--discv5-discovery=true', '--min-relay-peers-to-publish=1', '--legacy-filter=false', '--relay=false', '--filter=true', '--discv5-bootstrap-node=enr:-Ly4QOcl6Q09dTJ4jdWMK5pUngqtVBYkYk7jyGHBgnXTMtkGINZdK8p_6npqiPJ5VuHXUoUCdWN81OHRVR_dJ3JqDCuGAYyqZLbxgmlkgnY0gmlwhKwRAAKKbXVsdGlhZGRyc5gACgSsEQACBpOt3QMACgR_AAABBpOt3QOJc2VjcDI1NmsxoQM3Tqpf5eFn4Jztm4gB0Y0JVSJyxyZsW8QR-QU5DZb-PYN0Y3CCk6yDdWRwgpOuhXdha3UyAQ', '--filternode=/ip4/172.18.1.197/tcp/37804/p2p/16Uiu2HAmGNtM2rQ8abySFNhqPDFY4cmfAEpfo9Z9fD3NekoFR2ip', '--cluster-id=0']
10:29:38 src.node.docker_mananger DEBUG Container started with ID 2ad1d3e78997. Setting up logs at ./log/docker/node2_2023-12-27_10-29-35__12885372-86c3-45c4-b0c2-f85e97889e3a__harbor.status.im_wakuorg_go-waku:latest.log
10:29:38 src.node.waku_node DEBUG Started container from image harbor.status.im/wakuorg/go-waku:latest. RPC: 10776 REST: 10775 WebSocket: 10778 TCP: 10777
10:29:38 src.libs.common DEBUG Sleeping for 1 seconds
10:29:39 src.node.api_clients.base_client INFO GET call: http://127.0.0.1:10775/debug/v1/info with payload: None
10:29:39 src.node.api_clients.base_client INFO Response status code: 200. Response content: b'{"enrUri":"enr:-Ly4QKM-6FyLNx_1Ajg7DqY-JTzE0dJnAldAUhd0DNDFnSPlU7FXcwK9UGusENw96-KPCw9rVVyOabBX6fVf9Tj_6yWGAYyqZL1-gmlkgnY0gmlwhKwRAAOKbXVsdGlhZGRyc5gACgSsEQADBioa3QMACgR_AAABBioa3QOJc2VjcDI1NmsxoQIdHACSBpmPFFiDDJp488NdYBuXloED178SFSJBRYdn14N0Y3CCKhmDdWRwgiobhXdha3UygA","listenAddresses":["/ip4/127.0.0.1/tcp/10777/p2p/16Uiu2HAkwPJZvrMJPyvMi1hp1soevFLouJ15qyCuXc2kCQuCb7Yz","/ip4/127.0.0.1/tcp/10778/ws/p2p/16Uiu2HAkwPJZvrMJPyvMi1hp1soevFLouJ15qyCuXc2kCQuCb7Yz","/ip4/172.17.0.3/tcp/10777/p2p/16Uiu2HAkwPJZvrMJPyvMi1hp1soevFLouJ15qyCuXc2kCQuCb7Yz","/ip4/172.17.0.3/tcp/10778/ws/p2p/16Uiu2HAkwPJZvrMJPyvMi1hp1soevFLouJ15qyCuXc2kCQuCb7Yz"]}'
10:29:39 src.node.waku_node INFO REST service is ready !!
------------------------------------------------------------------------------------------------ live log call ------------------------------------------------------------------------------------------------
10:29:39 tests.filter.test_subscribe_create DEBUG Running test with pubsub topic: /waku/2/rs/0/1
10:29:39 src.node.api_clients.base_client INFO POST call: http://127.0.0.1:37802/relay/v1/subscriptions with payload: ["/waku/2/rs/0/1"]
10:29:39 src.node.api_clients.base_client INFO Response status code: 200. Response content: b'true'
10:29:39 src.libs.common DEBUG Sleeping for 10 seconds
10:29:49 src.node.api_clients.base_client INFO POST call: http://127.0.0.1:10775/filter/v2/subscriptions with payload: {"requestId": "9a2c0aeb-b4c5-4cf5-b1b5-d826c4ccf5fb", "contentFilters": ["/waku/2/rs/0/1"], "pubsubTopic": "/waku/2/rs/0/1"}
10:30:49 tests.filter.test_subscribe_create ERROR PubsubTopic /waku/2/rs/0/1 failed: HTTPConnectionPool(host='127.0.0.1', port=10775): Read timed out. (read timeout=60)
Regarding the REST API add peer endpoint, are there similar specs for gowaku so I can see how it works?
We don't have a similar page, but the yaml file for this can be found at https://github.com/waku-org/go-waku/blob/master/cmd/waku/server/rest/admin_api.yaml. You can refer to the details expected in request and response. You can import this file into postman to get sample req/resp structures
I do use the filternode flag when starting the filter node.
I am not talking about --filter
flag rather --filternode
flag.
--filter flag enables the node to act as as filter server/service node.
--filternode flag is to tell a filterClient which filterNode to connect to. This node need not act as a filter server again.
From the logs i don't see --filternode flag being specified, which is why i am guessing discovery could be causing the delay. Will go through the logs in detail to analyze what might be happening.
In the meantime it would be helpful to know your test setup for this case.
the 2nd node is using '--filternode=/ip4/172.18.1.197/tcp/37804/p2p/16Uiu2HAmGNtM2rQ8abySFNhqPDFY4cmfAEpfo9Z9fD3NekoFR2ip'
in the logs above
Regarding the test setup it should be all in those logs. I'm starting 2 nodes (first relay + filter), 2nd just filter and connect them via filter protocol. After that the relay node is subscribing to this topic:
10:29:39 src.node.api_clients.base_client INFO POST call: http://127.0.0.1:37802/relay/v1/subscriptions with payload: ["/waku/2/rs/0/1"]
10:29:39 src.node.api_clients.base_client INFO Response status code: 200. Response content: b'true'
Then a 10 sec propagation sleep:
10:29:39 src.libs.common DEBUG Sleeping for 10 seconds
and the filter node tries to subscribe but the request times out after 60 seconds (the timeout I've set to the requests client)
10:29:49 src.node.api_clients.base_client INFO POST call: http://127.0.0.1:10775/filter/v2/subscriptions with payload: {"requestId": "9a2c0aeb-b4c5-4cf5-b1b5-d826c4ccf5fb", "contentFilters": ["/waku/2/rs/0/1"], "pubsubTopic": "/waku/2/rs/0/1"}
10:30:49 tests.filter.test_subscribe_create ERROR PubsubTopic /waku/2/rs/0/1 failed: HTTPConnectionPool(host='127.0.0.1', port=10775): Read timed out. (read timeout=60)
the 2nd node is using
'--filternode=/ip4/172.18.1.197/tcp/37804/p2p/16Uiu2HAmGNtM2rQ8abySFNhqPDFY4cmfAEpfo9Z9fD3NekoFR2ip'
in the logs above
Ah..somehow i missed it. Thanks
Can you rerun this by subscribing the nodes to pubsubTopic /ws/rs/2/0/1
via command-line? This will ensure the node won't subscribe to default named pubsubTopic.
I am wondering if this is falling into similar category as i have explained here
Just wanted to eliminate 1 more variable to narrow down scope of the potential issue.
I tried to test this but there's a regression that blocks go-waku tests: https://github.com/waku-org/go-waku/issues/988
Weekly Update
I tried starting the Relay node with pubsubTopic /waku/2/rs/0/1
and this fixes the problem, the filter/v2/subscriptions
to /waku/2/rs/0/1
is done in milliseconds.
So this flag is indeed responsible for this issue. For ex If instead of /waku/2/rs/0/1
I start the node with /waku/2/rs/0/2
the issue happens again, the filter/v2/subscriptions with /waku/2/rs/0/1
takes a lot
So this flag is indeed responsible for this issue. For ex If instead of
/waku/2/rs/0/1
I start the node with/waku/2/rs/0/2
the issue happens again, the filter/v2/subscriptions with/waku/2/rs/0/1
takes a lot
Does it timeout? If so that means the other node is not supporting that shard. Otherwise discovery should happen and it should still work. Can you confirm?
Yes, the first filter/v2/subscriptions
request will timeout. But after the timeout, if I retry, it will work immediately.
On the other hand, if the shard is not supported (for ex the subscription request is done on /waku/2/rs/1/0
) all requests will timeout, not just the 1st.
So the problem here is that the 1st request after node start will timeout
Yes, the first
filter/v2/subscriptions
request will timeout. But after the timeout, if I retry, it will work immediately. On the other hand, if the shard is not supported (for ex the subscription request is done on/waku/2/rs/1/0
) all requests will timeout, not just the 1st. So the problem here is that the 1st request after node start will timeout
That is interesting, do share logs where first request times-out for both the nodes.
To Reproduce
filter/v2/subscriptions
with a payload like{"requestId": "c712e7f5-a536-4228-8ecc-de395d8b429f", "contentFilters": ["/test/1/waku-filter/proto"], "pubsubTopic": "/waku/2/rs/0/1"}
Actual behavior Usually this takes ~10 seconds but many times it happens to go beyond 20 and eventually time out (because of the test timeout) and fail the test
go-waku version/commit hash
harbor.status.im/wakuorg/go-waku:lates
tAdditional context Added some logs when this request took a long time
timeout_after_20_seconds.log took_22_seconds.log