waku-org / go-waku

Go implementation of Waku v2 protocol
https://waku.org/
Other
116 stars 43 forks source link

bug: filter/v2/subscriptions take a lot of time and even timeout sometimes #972

Closed fbarbu15 closed 7 months ago

fbarbu15 commented 9 months ago

To Reproduce

  1. Send a filter/v2/subscriptions with a payload like {"requestId": "c712e7f5-a536-4228-8ecc-de395d8b429f", "contentFilters": ["/test/1/waku-filter/proto"], "pubsubTopic": "/waku/2/rs/0/1"}

Actual behavior Usually this takes ~10 seconds but many times it happens to go beyond 20 and eventually time out (because of the test timeout) and fail the test

go-waku version/commit hash

harbor.status.im/wakuorg/go-waku:latest

Additional context Added some logs when this request took a long time

timeout_after_20_seconds.log took_22_seconds.log

chaitanyaprem commented 9 months ago

Going through the logs i have following observations:

I find it odd that discovery takes close to 20 seconds in a test setup. Because i see peers immediately getting identified and getting added. But i also noticed that the discovered peers doesn't have relay shards as indicated in log below.

210 2023-12-22T12:21:32.825Z ^[[35mDEBUG^[[0m gowaku.node2.peer-manager peermanager/peer_manager.go:323 ENR doesn't have relay shards {"node": "16Uiu2HAmAsB1NRYA6wVeBQCbjbTncsaHnFJEkw46c5GL19X7VeVz", "peer": "16Uiu2HAmGNtM2rQ8abySFNhqPDFY4cmfAEpfo9Z9fD3NekoFR2ip"}

Few ways this can be mitigated in test env is :

Would be good if you can share what is your test setup for this test.

fbarbu15 commented 9 months ago

I do use the filternode flag when starting the filter node. Regarding the REST API add peer endpoint, are there similar specs for gowaku so I can see how it works?

I've dug a little deeper and it would seem that this issue happens for the 1st requests after starting the nodes (notice in the test logs bellow that the http://127.0.0.1:10775/filter/v2/subscriptions request times out after 60 seconds) docker_logs.zip Also you can see the node starting flags and tests setup in the test logs:

10:29:35 src.node.docker_mananger DEBUG Starting container with image harbor.status.im/wakuorg/go-waku:latest
10:29:35 src.node.docker_mananger DEBUG Using args ['--listen-address=0.0.0.0', '--rpc=true', '--rpc-admin=true', '--rest=true', '--rest-admin=true', '--websocket-support=true', '--log-level=DEBUG', '--rest-relay-cache-capacity=100', '--websocket-port=37805', '--rpc-port=37803', '--rest-port=37802', '--tcp-port=37804', '--discv5-udp-port=37806', '--rpc-address=0.0.0.0', '--rest-address=0.0.0.0', '--nat=extip:172.18.1.197', '--peer-exchange=true', '--discv5-discovery=true', '--min-relay-peers-to-publish=0', '--legacy-filter=false', '--relay=true', '--filter=true', '--nodekey=30348dd51465150e04a5d9d932c72864c8967f806cce60b5d26afeca1e77eb68', '--cluster-id=0']
10:29:36 src.node.docker_mananger DEBUG Container started with ID a7ce04d67001. Setting up logs at ./log/docker/node1_2023-12-27_10-29-35__12885372-86c3-45c4-b0c2-f85e97889e3a__harbor.status.im_wakuorg_go-waku:latest.log
10:29:36 src.node.waku_node DEBUG Started container from image harbor.status.im/wakuorg/go-waku:latest. RPC: 37803 REST: 37802 WebSocket: 37805 TCP: 37804
10:29:36 src.libs.common DEBUG Sleeping for 1 seconds
10:29:37 src.node.api_clients.base_client INFO GET call: http://127.0.0.1:37802/debug/v1/info with payload: None
10:29:37 src.node.api_clients.base_client INFO Response status code: 200. Response content: b'{"enrUri":"enr:-Ly4QOcl6Q09dTJ4jdWMK5pUngqtVBYkYk7jyGHBgnXTMtkGINZdK8p_6npqiPJ5VuHXUoUCdWN81OHRVR_dJ3JqDCuGAYyqZLbxgmlkgnY0gmlwhKwRAAKKbXVsdGlhZGRyc5gACgSsEQACBpOt3QMACgR_AAABBpOt3QOJc2VjcDI1NmsxoQM3Tqpf5eFn4Jztm4gB0Y0JVSJyxyZsW8QR-QU5DZb-PYN0Y3CCk6yDdWRwgpOuhXdha3UyAQ","listenAddresses":["/ip4/127.0.0.1/tcp/37804/p2p/16Uiu2HAmGNtM2rQ8abySFNhqPDFY4cmfAEpfo9Z9fD3NekoFR2ip","/ip4/127.0.0.1/tcp/37805/ws/p2p/16Uiu2HAmGNtM2rQ8abySFNhqPDFY4cmfAEpfo9Z9fD3NekoFR2ip","/ip4/172.17.0.2/tcp/37804/p2p/16Uiu2HAmGNtM2rQ8abySFNhqPDFY4cmfAEpfo9Z9fD3NekoFR2ip","/ip4/172.17.0.2/tcp/37805/ws/p2p/16Uiu2HAmGNtM2rQ8abySFNhqPDFY4cmfAEpfo9Z9fD3NekoFR2ip"]}'
10:29:37 src.node.waku_node INFO REST service is ready !!
10:29:37 src.steps.filter DEBUG Running fixture setup: setup_main_filter_node
10:29:37 src.node.docker_mananger DEBUG Docker client initialized with image harbor.status.im/wakuorg/go-waku:latest
10:29:37 src.node.waku_node DEBUG WakuNode instance initialized with log path ./log/docker/node2_2023-12-27_10-29-35__12885372-86c3-45c4-b0c2-f85e97889e3a__harbor.status.im_wakuorg_go-waku:latest.log
10:29:37 src.node.waku_node DEBUG Starting Node...
10:29:37 src.node.docker_mananger DEBUG Attempting to create or retrieve network waku
10:29:37 src.node.docker_mananger DEBUG Network waku already exists
10:29:37 src.node.docker_mananger DEBUG Generated random external IP 172.18.218.251
10:29:37 src.node.docker_mananger DEBUG Generated ports [10775, 10776, 10777, 10778, 10779, 10780]
10:29:37 src.node.docker_mananger DEBUG Starting container with image harbor.status.im/wakuorg/go-waku:latest
10:29:37 src.node.docker_mananger DEBUG Using args ['--listen-address=0.0.0.0', '--rpc=true', '--rpc-admin=true', '--rest=true', '--rest-admin=true', '--websocket-support=true', '--log-level=DEBUG', '--rest-relay-cache-capacity=100', '--websocket-port=10778', '--rpc-port=10776', '--rest-port=10775', '--tcp-port=10777', '--discv5-udp-port=10779', '--rpc-address=0.0.0.0', '--rest-address=0.0.0.0', '--nat=extip:172.18.218.251', '--peer-exchange=true', '--discv5-discovery=true', '--min-relay-peers-to-publish=1', '--legacy-filter=false', '--relay=false', '--filter=true', '--discv5-bootstrap-node=enr:-Ly4QOcl6Q09dTJ4jdWMK5pUngqtVBYkYk7jyGHBgnXTMtkGINZdK8p_6npqiPJ5VuHXUoUCdWN81OHRVR_dJ3JqDCuGAYyqZLbxgmlkgnY0gmlwhKwRAAKKbXVsdGlhZGRyc5gACgSsEQACBpOt3QMACgR_AAABBpOt3QOJc2VjcDI1NmsxoQM3Tqpf5eFn4Jztm4gB0Y0JVSJyxyZsW8QR-QU5DZb-PYN0Y3CCk6yDdWRwgpOuhXdha3UyAQ', '--filternode=/ip4/172.18.1.197/tcp/37804/p2p/16Uiu2HAmGNtM2rQ8abySFNhqPDFY4cmfAEpfo9Z9fD3NekoFR2ip', '--cluster-id=0']
10:29:38 src.node.docker_mananger DEBUG Container started with ID 2ad1d3e78997. Setting up logs at ./log/docker/node2_2023-12-27_10-29-35__12885372-86c3-45c4-b0c2-f85e97889e3a__harbor.status.im_wakuorg_go-waku:latest.log
10:29:38 src.node.waku_node DEBUG Started container from image harbor.status.im/wakuorg/go-waku:latest. RPC: 10776 REST: 10775 WebSocket: 10778 TCP: 10777
10:29:38 src.libs.common DEBUG Sleeping for 1 seconds
10:29:39 src.node.api_clients.base_client INFO GET call: http://127.0.0.1:10775/debug/v1/info with payload: None
10:29:39 src.node.api_clients.base_client INFO Response status code: 200. Response content: b'{"enrUri":"enr:-Ly4QKM-6FyLNx_1Ajg7DqY-JTzE0dJnAldAUhd0DNDFnSPlU7FXcwK9UGusENw96-KPCw9rVVyOabBX6fVf9Tj_6yWGAYyqZL1-gmlkgnY0gmlwhKwRAAOKbXVsdGlhZGRyc5gACgSsEQADBioa3QMACgR_AAABBioa3QOJc2VjcDI1NmsxoQIdHACSBpmPFFiDDJp488NdYBuXloED178SFSJBRYdn14N0Y3CCKhmDdWRwgiobhXdha3UygA","listenAddresses":["/ip4/127.0.0.1/tcp/10777/p2p/16Uiu2HAkwPJZvrMJPyvMi1hp1soevFLouJ15qyCuXc2kCQuCb7Yz","/ip4/127.0.0.1/tcp/10778/ws/p2p/16Uiu2HAkwPJZvrMJPyvMi1hp1soevFLouJ15qyCuXc2kCQuCb7Yz","/ip4/172.17.0.3/tcp/10777/p2p/16Uiu2HAkwPJZvrMJPyvMi1hp1soevFLouJ15qyCuXc2kCQuCb7Yz","/ip4/172.17.0.3/tcp/10778/ws/p2p/16Uiu2HAkwPJZvrMJPyvMi1hp1soevFLouJ15qyCuXc2kCQuCb7Yz"]}'
10:29:39 src.node.waku_node INFO REST service is ready !!
------------------------------------------------------------------------------------------------ live log call ------------------------------------------------------------------------------------------------
10:29:39 tests.filter.test_subscribe_create DEBUG Running test with pubsub topic: /waku/2/rs/0/1
10:29:39 src.node.api_clients.base_client INFO POST call: http://127.0.0.1:37802/relay/v1/subscriptions with payload: ["/waku/2/rs/0/1"]
10:29:39 src.node.api_clients.base_client INFO Response status code: 200. Response content: b'true'
10:29:39 src.libs.common DEBUG Sleeping for 10 seconds
10:29:49 src.node.api_clients.base_client INFO POST call: http://127.0.0.1:10775/filter/v2/subscriptions with payload: {"requestId": "9a2c0aeb-b4c5-4cf5-b1b5-d826c4ccf5fb", "contentFilters": ["/waku/2/rs/0/1"], "pubsubTopic": "/waku/2/rs/0/1"}
10:30:49 tests.filter.test_subscribe_create ERROR PubsubTopic /waku/2/rs/0/1 failed: HTTPConnectionPool(host='127.0.0.1', port=10775): Read timed out. (read timeout=60)
chaitanyaprem commented 9 months ago

Regarding the REST API add peer endpoint, are there similar specs for gowaku so I can see how it works?

We don't have a similar page, but the yaml file for this can be found at https://github.com/waku-org/go-waku/blob/master/cmd/waku/server/rest/admin_api.yaml. You can refer to the details expected in request and response. You can import this file into postman to get sample req/resp structures

chaitanyaprem commented 9 months ago

I do use the filternode flag when starting the filter node.

I am not talking about --filterflag rather --filternode flag. --filter flag enables the node to act as as filter server/service node. --filternode flag is to tell a filterClient which filterNode to connect to. This node need not act as a filter server again.

From the logs i don't see --filternode flag being specified, which is why i am guessing discovery could be causing the delay. Will go through the logs in detail to analyze what might be happening.

In the meantime it would be helpful to know your test setup for this case.

fbarbu15 commented 9 months ago

the 2nd node is using '--filternode=/ip4/172.18.1.197/tcp/37804/p2p/16Uiu2HAmGNtM2rQ8abySFNhqPDFY4cmfAEpfo9Z9fD3NekoFR2ip' in the logs above

fbarbu15 commented 9 months ago

Regarding the test setup it should be all in those logs. I'm starting 2 nodes (first relay + filter), 2nd just filter and connect them via filter protocol. After that the relay node is subscribing to this topic:

10:29:39 src.node.api_clients.base_client INFO POST call: http://127.0.0.1:37802/relay/v1/subscriptions with payload: ["/waku/2/rs/0/1"]
10:29:39 src.node.api_clients.base_client INFO Response status code: 200. Response content: b'true'

Then a 10 sec propagation sleep: 10:29:39 src.libs.common DEBUG Sleeping for 10 seconds and the filter node tries to subscribe but the request times out after 60 seconds (the timeout I've set to the requests client)

10:29:49 src.node.api_clients.base_client INFO POST call: http://127.0.0.1:10775/filter/v2/subscriptions with payload: {"requestId": "9a2c0aeb-b4c5-4cf5-b1b5-d826c4ccf5fb", "contentFilters": ["/waku/2/rs/0/1"], "pubsubTopic": "/waku/2/rs/0/1"}
10:30:49 tests.filter.test_subscribe_create ERROR PubsubTopic /waku/2/rs/0/1 failed: HTTPConnectionPool(host='127.0.0.1', port=10775): Read timed out. (read timeout=60)
chaitanyaprem commented 9 months ago

the 2nd node is using '--filternode=/ip4/172.18.1.197/tcp/37804/p2p/16Uiu2HAmGNtM2rQ8abySFNhqPDFY4cmfAEpfo9Z9fD3NekoFR2ip' in the logs above

Ah..somehow i missed it. Thanks

chaitanyaprem commented 9 months ago

Can you rerun this by subscribing the nodes to pubsubTopic /ws/rs/2/0/1 via command-line? This will ensure the node won't subscribe to default named pubsubTopic. I am wondering if this is falling into similar category as i have explained here

Just wanted to eliminate 1 more variable to narrow down scope of the potential issue.

fbarbu15 commented 8 months ago

I tried to test this but there's a regression that blocks go-waku tests: https://github.com/waku-org/go-waku/issues/988

chaitanyaprem commented 8 months ago

Weekly Update

fbarbu15 commented 8 months ago

I tried starting the Relay node with pubsubTopic /waku/2/rs/0/1 and this fixes the problem, the filter/v2/subscriptions to /waku/2/rs/0/1 is done in milliseconds. So this flag is indeed responsible for this issue. For ex If instead of /waku/2/rs/0/1 I start the node with /waku/2/rs/0/2 the issue happens again, the filter/v2/subscriptions with /waku/2/rs/0/1 takes a lot

chaitanyaprem commented 8 months ago

So this flag is indeed responsible for this issue. For ex If instead of /waku/2/rs/0/1 I start the node with /waku/2/rs/0/2 the issue happens again, the filter/v2/subscriptions with /waku/2/rs/0/1 takes a lot

Does it timeout? If so that means the other node is not supporting that shard. Otherwise discovery should happen and it should still work. Can you confirm?

fbarbu15 commented 8 months ago

Yes, the first filter/v2/subscriptions request will timeout. But after the timeout, if I retry, it will work immediately. On the other hand, if the shard is not supported (for ex the subscription request is done on /waku/2/rs/1/0 ) all requests will timeout, not just the 1st. So the problem here is that the 1st request after node start will timeout

chaitanyaprem commented 8 months ago

Yes, the first filter/v2/subscriptions request will timeout. But after the timeout, if I retry, it will work immediately. On the other hand, if the shard is not supported (for ex the subscription request is done on /waku/2/rs/1/0 ) all requests will timeout, not just the 1st. So the problem here is that the 1st request after node start will timeout

That is interesting, do share logs where first request times-out for both the nodes.

fbarbu15 commented 8 months ago

nodes.zip First request is done at 11:12:35 and it times out after 50 seconds (the timeout setup in the test client). The 2nd one is done at 11:13:26 and it's succesfull