waku-org / nwaku

Waku node and protocol.

bug: can't discover peers when using DNS Discovery URL + shards #2162

Closed: richard-ramos closed this issue 8 months ago

richard-ramos commented 10 months ago

The nodes returned by DNS Discovery for the shards.test fleet don't include any information about their shards:

enrtree://AMOJVZX4V6EXP7NTJPMAYJYST2QP6AJXYW76IU6VGJS7UVSNDYZG4@boot.test.shards.nodes.status.im

Nodes returned by this URL:
1 - enr:-Ne4QJKpiQqwYpo0p1yDW6opKFYzh801nhSzX65S_x892UXABVYzFBrdFwCPiWwXlKqVz5sXkTzYtUuX1wg2sW5DZnwBgmlkgnY0gmlwhCIfDu-KbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDIuZ2MtdXMtY2VudHJhbDEtYS5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQJm8YcPIYhI5rvlLJJRlpebApk6w4uOLdFgAeHN2wO9N4N0Y3CCdl-DdWRwgiMohXdha3UyDQ
2 - enr:-Ne4QIvHiMe1Gf7h22jygL1kPFVAcQ0RkDYNk1PNA52KUKElBSPuPy-HSD1pRX-rCx2A2Qqh0GtkzFUyL8NQEiL15P0BgmlkgnY0gmlwhAjaF0yKbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDEuYWMtY24taG9uZ2tvbmctYy5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQM_sJtGT5gonA4UUzhn2d7LQY9ztY8loLAaSk1HKVruYIN0Y3CCdl-DdWRwgiMohXdha3UyDQ
3 - enr:-Ne4QHOpWLyVVZMzJwXcc00CNp16vB5x2WFy6WQAEKyaOf_UMWKvz2a0HN9QCoSyBYmudBKspqYa_U6tJ64B0TqLzy0BgmlkgnY0gmlwhAjarmyKbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDIuYWMtY24taG9uZ2tvbmctYy5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQNeQXcyqdYwEjflVdLKYAusuZJ93fpGiFwqK1jU9ISQC4N0Y3CCdl-DdWRwgiMohXdha3UyDQ
4 - enr:-M24QJDZfhB_wN_PHOAQuzgnta20xKUsZl5kdhBeQJM16gdldCJNAKQp6dgbwo-MTRJxYVNCr85cHRAJxtNLR4vTbP0BgmlkgnY0gmlwhKdjEy-KbXVsdGlhZGRyc68ALTYoYm9vdC0wMS5kby1hbXMzLnNoYXJkcy50ZXN0LnN0YXR1c2ltLm5ldAZ2X4lzZWNwMjU2azGhAt60bRUEoHNuLlnsM12sU2PIQwBwfLIJ8a_ZPEY2-Rnkg3RjcIJ2X4N1ZHCCIyiFd2FrdTIN
5 - enr:-M24QAsRRxoLDnnXFGnbHGUKjtqgXOVxb2Cian1vegc1rtY0Yk5wXDF7NeBzPl7frvyxo3Vt-xSL0vUa2jazchNIS_oBgmlkgnY0gmlwhLKAj_GKbXVsdGlhZGRyc68ALTYoYm9vdC0wMi5kby1hbXMzLnNoYXJkcy50ZXN0LnN0YXR1c2ltLm5ldAZ2X4lzZWNwMjU2azGhAtsXOrELG9R5LlIbF6bqeLC0tg7bmNzQ0JkSmEO3zxqzg3RjcIJ2X4N1ZHCCIyiFd2FrdTIN
6 - enr:-Ne4QINS7SZiUk9oN3mcLpOrdQrFWS-AUDjyq5F9__8iTUT_H8ExnAj5qDWmG4qbLaz4NKvDtmIU3Ycu9sP_Ixk6hn4BgmlkgnY0gmlwhCKHDVeKbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDEuZ2MtdXMtY2VudHJhbDEtYS5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQLGOqANDRbJFI6KVhTfYMDmT9c2UOKzebVV1eQr3EzqQ4N0Y3CCdl-DdWRwgiMohXdha3UyDQ

These nodes are currently subscribed to shards:

However, if you inspect these ENRs in https://enr-viewer.com/, you'll see that they lack the rs and rsv attributes. Also, if you compare the nodes' current ENRs (obtained through each node's RPC server) with those returned by the DNS Discovery URL, you'll see that the latter are outdated. This makes sense: the DNS Discovery URL was created manually when the fleet was set up, while shard subscriptions change dynamically.
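ENRs (EIP-778) carry a sequence number that a node increments whenever it updates its record, which is how the live ENR and the one pinned in the DNS tree can drift apart. A minimal Go sketch of picking the newer of two records for the same node; the `enrInfo` type and the sequence values are hypothetical simplifications, not go-waku's or nwaku's API:

```go
package main

import "fmt"

// enrInfo is a hypothetical, simplified view of an ENR with only the
// fields needed here. Real ENRs are signed RLP records (EIP-778).
type enrInfo struct {
	NodeID string
	Seq    uint64   // EIP-778 sequence number: higher means newer
	Shards []uint16 // shards advertised via rs/rsv (empty if absent)
}

// freshest returns whichever of two records for the same node is newer,
// judged solely by the sequence number.
func freshest(a, b enrInfo) enrInfo {
	if b.Seq > a.Seq {
		return b
	}
	return a
}

func main() {
	// Hypothetical record as published in the DNS tree at fleet setup (no shards).
	fromDNS := enrInfo{NodeID: "boot-01", Seq: 1}
	// Hypothetical record the node serves after subscribing to shards.
	live := enrInfo{NodeID: "boot-01", Seq: 5, Shards: []uint16{32, 64, 128, 256}}

	r := freshest(fromDNS, live)
	fmt.Println(r.Seq, r.Shards) // 5 [32 64 128 256]
}
```

In discv5, PING and PONG messages carry the sender's enr-seq, so a node can notice that a peer has a newer record and fetch it; per this issue, that refresh does not happen for the DNS-provided bootnodes before they are filtered.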

This is a problem: if I have a node subscribed to shards 32, 64, 128 and/or 256 and I use the DNS discovery URL, I won't be able to find new peers, because the peers obtained from the URL get filtered out. DiscV5 populates the routing table with the shardless ENRs retrieved from the URL, and since discv5 (at least in the go-waku implementation, and judging by the code, in nwaku as well) does not ask those nodes for their current ENR, they are then rejected by the filtering logic defined in this predicate: https://github.com/waku-org/nwaku/blob/3be6163639d4141a6ee51ae8f8e83635f541f783/waku/waku_discv5.nim#L53-L75 . As a result, nodes cannot discover new peers using the DNS discovery URL.
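To make the failure mode concrete, here is a minimal Go sketch of a strict shard-matching predicate in the spirit of the one linked above. The types and function names are hypothetical, and this is not a port of the nwaku code. A peer whose ENR has no rs/rsv field is rejected outright, so the stale records from the DNS tree can never pass:

```go
package main

import "fmt"

// shardInfo is a hypothetical, simplified view of the relay-sharding
// data a node advertises in its ENR (rs or rsv field).
type shardInfo struct {
	ClusterID uint16
	Shards    []uint16
}

// sharesShard reports whether a discovered peer advertises at least one
// shard in common with the local node. A nil remote models an ENR with
// no rs/rsv field, like the records served by the DNS tree.
func sharesShard(local shardInfo, remote *shardInfo) bool {
	if remote == nil || remote.ClusterID != local.ClusterID {
		return false
	}
	for _, s := range remote.Shards {
		for _, l := range local.Shards {
			if s == l {
				return true
			}
		}
	}
	return false
}

func main() {
	local := shardInfo{ClusterID: 16, Shards: []uint16{128}}

	// ENR from the DNS discovery URL: no shard information, filtered out.
	fmt.Println(sharesShard(local, nil)) // false
	// The same node's current ENR, after it subscribed to shards.
	live := &shardInfo{ClusterID: 16, Shards: []uint16{32, 64, 128, 256}}
	fmt.Println(sharesShard(local, live)) // true
}
```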

This is particularly problematic in the case of Status, since the DNS discovery URL is hardcoded in the node configuration.

richard-ramos commented 10 months ago

I tried the following in nwaku:

Node1:
./build/wakunode2 --discv5-discovery --dns-discovery --dns-discovery-url=enrtree://AMOJVZX4V6EXP7NTJPMAYJYST2QP6AJXYW76IU6VGJS7UVSNDYZG4@boot.test.shards.nodes.status.im --pubsub-topic=/waku/2/rs/16/128 --tcp-port=55511

Node2:
./build/wakunode2 --discv5-discovery --dns-discovery --dns-discovery-url=enrtree://AMOJVZX4V6EXP7NTJPMAYJYST2QP6AJXYW76IU6VGJS7UVSNDYZG4@boot.test.shards.nodes.status.im --pubsub-topic=/waku/2/rs/16/128 --tcp-port=55522

Notice that the following log line is printed with this configuration:

WRN 2023-10-26 11:01:43.551-04:00 No discv5 bootstrap nodes share this node configured shards topics="wakunode app" tid=2758528 file=waku_discv5.nim:96

I can confirm that no peers are discovered. However, if I remove the --pubsub-topic flag, peers do get discovered.
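The --pubsub-topic=/waku/2/rs/16/128 flag used above follows the static-sharding topic format /waku/2/rs/<cluster>/<shard>, so both nodes are on cluster 16, shard 128, and that is the shard the discv5 filtering compares against. A small Go sketch of extracting those indices; `parseStaticShardTopic` is a hypothetical helper, not a go-waku or nwaku function:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseStaticShardTopic splits a static-sharding pubsub topic of the form
// /waku/2/rs/<cluster>/<shard> into its cluster and shard indices.
// Hypothetical helper for illustration only.
func parseStaticShardTopic(topic string) (cluster, shard uint16, err error) {
	const prefix = "/waku/2/rs/"
	if !strings.HasPrefix(topic, prefix) {
		return 0, 0, fmt.Errorf("not a static shard topic: %q", topic)
	}
	parts := strings.Split(strings.TrimPrefix(topic, prefix), "/")
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("expected <cluster>/<shard> in %q", topic)
	}
	c, err := strconv.ParseUint(parts[0], 10, 16)
	if err != nil {
		return 0, 0, fmt.Errorf("bad cluster index: %w", err)
	}
	s, err := strconv.ParseUint(parts[1], 10, 16)
	if err != nil {
		return 0, 0, fmt.Errorf("bad shard index: %w", err)
	}
	return uint16(c), uint16(s), nil
}

func main() {
	c, s, err := parseStaticShardTopic("/waku/2/rs/16/128")
	fmt.Println(c, s, err) // 16 128 <nil>
}
```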

richard-ramos commented 10 months ago

I created the following tool with go-waku to discover peers via discv5: https://github.com/waku-org/test-discv5/tree/master

You can use any of these commands to execute it:

go run main.go --dns-disc-url=enrtree://AMOJVZX4V6EXP7NTJPMAYJYST2QP6AJXYW76IU6VGJS7UVSNDYZG4@boot.test.shards.nodes.status.im

or

go run main.go --bootnodes=comma_separated_list_of_enrs

This continuously tries to discover peers and prints the enr, ip, port, multiaddresses, rs and rsv of each one, if available; fields that are not available are not printed. In the results you can see that none of the nodes has an rs or rsv field:

Bootnodes:
1 - enr:-Ne4QJKpiQqwYpo0p1yDW6opKFYzh801nhSzX65S_x892UXABVYzFBrdFwCPiWwXlKqVz5sXkTzYtUuX1wg2sW5DZnwBgmlkgnY0gmlwhCIfDu-KbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDIuZ2MtdXMtY2VudHJhbDEtYS5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQJm8YcPIYhI5rvlLJJRlpebApk6w4uOLdFgAeHN2wO9N4N0Y3CCdl-DdWRwgiMohXdha3UyDQ
2 - enr:-Ne4QIvHiMe1Gf7h22jygL1kPFVAcQ0RkDYNk1PNA52KUKElBSPuPy-HSD1pRX-rCx2A2Qqh0GtkzFUyL8NQEiL15P0BgmlkgnY0gmlwhAjaF0yKbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDEuYWMtY24taG9uZ2tvbmctYy5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQM_sJtGT5gonA4UUzhn2d7LQY9ztY8loLAaSk1HKVruYIN0Y3CCdl-DdWRwgiMohXdha3UyDQ
3 - enr:-Ne4QHOpWLyVVZMzJwXcc00CNp16vB5x2WFy6WQAEKyaOf_UMWKvz2a0HN9QCoSyBYmudBKspqYa_U6tJ64B0TqLzy0BgmlkgnY0gmlwhAjarmyKbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDIuYWMtY24taG9uZ2tvbmctYy5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQNeQXcyqdYwEjflVdLKYAusuZJ93fpGiFwqK1jU9ISQC4N0Y3CCdl-DdWRwgiMohXdha3UyDQ
4 - enr:-M24QJDZfhB_wN_PHOAQuzgnta20xKUsZl5kdhBeQJM16gdldCJNAKQp6dgbwo-MTRJxYVNCr85cHRAJxtNLR4vTbP0BgmlkgnY0gmlwhKdjEy-KbXVsdGlhZGRyc68ALTYoYm9vdC0wMS5kby1hbXMzLnNoYXJkcy50ZXN0LnN0YXR1c2ltLm5ldAZ2X4lzZWNwMjU2azGhAt60bRUEoHNuLlnsM12sU2PIQwBwfLIJ8a_ZPEY2-Rnkg3RjcIJ2X4N1ZHCCIyiFd2FrdTIN
5 - enr:-M24QAsRRxoLDnnXFGnbHGUKjtqgXOVxb2Cian1vegc1rtY0Yk5wXDF7NeBzPl7frvyxo3Vt-xSL0vUa2jazchNIS_oBgmlkgnY0gmlwhLKAj_GKbXVsdGlhZGRyc68ALTYoYm9vdC0wMi5kby1hbXMzLnNoYXJkcy50ZXN0LnN0YXR1c2ltLm5ldAZ2X4lzZWNwMjU2azGhAtsXOrELG9R5LlIbF6bqeLC0tg7bmNzQ0JkSmEO3zxqzg3RjcIJ2X4N1ZHCCIyiFd2FrdTIN
6 - enr:-Ne4QINS7SZiUk9oN3mcLpOrdQrFWS-AUDjyq5F9__8iTUT_H8ExnAj5qDWmG4qbLaz4NKvDtmIU3Ycu9sP_Ixk6hn4BgmlkgnY0gmlwhCKHDVeKbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDEuZ2MtdXMtY2VudHJhbDEtYS5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQLGOqANDRbJFI6KVhTfYMDmT9c2UOKzebVV1eQr3EzqQ4N0Y3CCdl-DdWRwgiMohXdha3UyDQ

Your node:
enr:-.......................

Discovered peers:
===============================================================================
1 - NEW - enr:-Ne4QINS7SZiUk9oN3mcLpOrdQrFWS-AUDjyq5F9__8iTUT_H8ExnAj5qDWmG4qbLaz4NKvDtmIU3Ycu9sP_Ixk6hn4BgmlkgnY0gmlwhCKHDVeKbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDEuZ2MtdXMtY2VudHJhbDEtYS5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQLGOqANDRbJFI6KVhTfYMDmT9c2UOKzebVV1eQr3EzqQ4N0Y3CCdl-DdWRwgiMohXdha3UyDQ
peerID 16Uiu2HAm8mUZ18tBWPXDQsaF7PbCKYA35z7WB2xNZH2EVq1qS8LJ
multiaddr [/ip4/34.135.13.87/tcp/30303/p2p/16Uiu2HAm8mUZ18tBWPXDQsaF7PbCKYA35z7WB2xNZH2EVq1qS8LJ /dns4/boot-01.gc-us-central1-a.shards.test.statusim.net/tcp/30303/p2p/16Uiu2HAm8mUZ18tBWPXDQsaF7PbCKYA35z7WB2xNZH2EVq1qS8LJ]
ip 34.135.13.87:30303

2 - NEW - enr:-M24QJDZfhB_wN_PHOAQuzgnta20xKUsZl5kdhBeQJM16gdldCJNAKQp6dgbwo-MTRJxYVNCr85cHRAJxtNLR4vTbP0BgmlkgnY0gmlwhKdjEy-KbXVsdGlhZGRyc68ALTYoYm9vdC0wMS5kby1hbXMzLnNoYXJkcy50ZXN0LnN0YXR1c2ltLm5ldAZ2X4lzZWNwMjU2azGhAt60bRUEoHNuLlnsM12sU2PIQwBwfLIJ8a_ZPEY2-Rnkg3RjcIJ2X4N1ZHCCIyiFd2FrdTIN
peerID 16Uiu2HAmAR24Mbb6VuzoyUiGx42UenDkshENVDj4qnmmbabLvo31
multiaddr [/ip4/167.99.19.47/tcp/30303/p2p/16Uiu2HAmAR24Mbb6VuzoyUiGx42UenDkshENVDj4qnmmbabLvo31 /dns4/boot-01.do-ams3.shards.test.statusim.net/tcp/30303/p2p/16Uiu2HAmAR24Mbb6VuzoyUiGx42UenDkshENVDj4qnmmbabLvo31]
ip 167.99.19.47:30303

3 - NEW - enr:-Ne4QHOpWLyVVZMzJwXcc00CNp16vB5x2WFy6WQAEKyaOf_UMWKvz2a0HN9QCoSyBYmudBKspqYa_U6tJ64B0TqLzy0BgmlkgnY0gmlwhAjarmyKbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDIuYWMtY24taG9uZ2tvbmctYy5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQNeQXcyqdYwEjflVdLKYAusuZJ93fpGiFwqK1jU9ISQC4N0Y3CCdl-DdWRwgiMohXdha3UyDQ
peerID 16Uiu2HAmJzva9cFZdiLEeaXC4rLTZGH8DmrTetPfpmngrcaaNhUN
multiaddr [/ip4/8.218.174.108/tcp/30303/p2p/16Uiu2HAmJzva9cFZdiLEeaXC4rLTZGH8DmrTetPfpmngrcaaNhUN /dns4/boot-02.ac-cn-hongkong-c.shards.test.statusim.net/tcp/30303/p2p/16Uiu2HAmJzva9cFZdiLEeaXC4rLTZGH8DmrTetPfpmngrcaaNhUN]
ip 8.218.174.108:30303

4 - NEW - enr:-Ne4QJKpiQqwYpo0p1yDW6opKFYzh801nhSzX65S_x892UXABVYzFBrdFwCPiWwXlKqVz5sXkTzYtUuX1wg2sW5DZnwBgmlkgnY0gmlwhCIfDu-KbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDIuZ2MtdXMtY2VudHJhbDEtYS5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQJm8YcPIYhI5rvlLJJRlpebApk6w4uOLdFgAeHN2wO9N4N0Y3CCdl-DdWRwgiMohXdha3UyDQ
peerID 16Uiu2HAm2MXB1WzsGKnYrcX8GRSvunQ1riJmPzVZuvUphM1YE4pn
multiaddr [/ip4/34.31.14.239/tcp/30303/p2p/16Uiu2HAm2MXB1WzsGKnYrcX8GRSvunQ1riJmPzVZuvUphM1YE4pn /dns4/boot-02.gc-us-central1-a.shards.test.statusim.net/tcp/30303/p2p/16Uiu2HAm2MXB1WzsGKnYrcX8GRSvunQ1riJmPzVZuvUphM1YE4pn]
ip 34.31.14.239:30303

5 - NEW - enr:-M24QAsRRxoLDnnXFGnbHGUKjtqgXOVxb2Cian1vegc1rtY0Yk5wXDF7NeBzPl7frvyxo3Vt-xSL0vUa2jazchNIS_oBgmlkgnY0gmlwhLKAj_GKbXVsdGlhZGRyc68ALTYoYm9vdC0wMi5kby1hbXMzLnNoYXJkcy50ZXN0LnN0YXR1c2ltLm5ldAZ2X4lzZWNwMjU2azGhAtsXOrELG9R5LlIbF6bqeLC0tg7bmNzQ0JkSmEO3zxqzg3RjcIJ2X4N1ZHCCIyiFd2FrdTIN
peerID 16Uiu2HAmAAuoviraBqSBcR5eC346RK46SruiPKdFQBvWrFjXEkLr
multiaddr [/ip4/178.128.143.241/tcp/30303/p2p/16Uiu2HAmAAuoviraBqSBcR5eC346RK46SruiPKdFQBvWrFjXEkLr /dns4/boot-02.do-ams3.shards.test.statusim.net/tcp/30303/p2p/16Uiu2HAmAAuoviraBqSBcR5eC346RK46SruiPKdFQBvWrFjXEkLr]
ip 178.128.143.241:30303

6 - NEW - enr:-Ne4QIvHiMe1Gf7h22jygL1kPFVAcQ0RkDYNk1PNA52KUKElBSPuPy-HSD1pRX-rCx2A2Qqh0GtkzFUyL8NQEiL15P0BgmlkgnY0gmlwhAjaF0yKbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDEuYWMtY24taG9uZ2tvbmctYy5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQM_sJtGT5gonA4UUzhn2d7LQY9ztY8loLAaSk1HKVruYIN0Y3CCdl-DdWRwgiMohXdha3UyDQ
peerID 16Uiu2HAmGwcE8v7gmJNEWFtZtojYpPMTHy2jBLL6xRk33qgDxFWX
multiaddr [/ip4/8.218.23.76/tcp/30303/p2p/16Uiu2HAmGwcE8v7gmJNEWFtZtojYpPMTHy2jBLL6xRk33qgDxFWX /dns4/boot-01.ac-cn-hongkong-c.shards.test.statusim.net/tcp/30303/p2p/16Uiu2HAmGwcE8v7gmJNEWFtZtojYpPMTHy2jBLL6xRk33qgDxFWX]
ip 8.218.23.76:30303
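
For reference, the rs field that the tool looks for (and that is missing from every record above) is a compact list of shards. A minimal Go decoder, assuming the layout described in the Waku relay-sharding RFC: a 2-byte cluster index, a 1-byte shard count, then one 2-byte index per shard, all big-endian. `decodeRS` is a hypothetical helper, not the test-discv5 code, and the rsv bit-vector variant is not covered:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeRS decodes the value of an rs ENR entry, assuming the layout
// <2-byte cluster> | <1-byte count> | count x <2-byte shard>, big-endian.
// Hypothetical helper for illustration only.
func decodeRS(b []byte) (cluster uint16, shards []uint16, err error) {
	if len(b) < 3 {
		return 0, nil, errors.New("rs value too short")
	}
	cluster = binary.BigEndian.Uint16(b[0:2])
	n := int(b[2])
	if len(b) != 3+2*n {
		return 0, nil, fmt.Errorf("rs value has %d bytes, expected %d", len(b), 3+2*n)
	}
	for i := 0; i < n; i++ {
		off := 3 + 2*i
		shards = append(shards, binary.BigEndian.Uint16(b[off:off+2]))
	}
	return cluster, shards, nil
}

func main() {
	// Hand-built example value: cluster 16, shards 32 and 128.
	raw := []byte{0x00, 0x10, 0x02, 0x00, 0x20, 0x00, 0x80}
	c, s, err := decodeRS(raw)
	fmt.Println(c, s, err) // 16 [32 128] <nil>
}
```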

richard-ramos commented 10 months ago

Doing more experiments with nwaku: if I remove the following lines of code, https://github.com/waku-org/nwaku/blob/3be6163639d4141a6ee51ae8f8e83635f541f783/waku/waku_discv5.nim#L90-L91 , the nodes do get added as bootnodes, but this part of the code still filters them out because their ENR info is outdated with respect to shards: https://github.com/waku-org/nwaku/blob/master/waku/waku_discv5.nim#L248-L251

SionoiS commented 10 months ago

This looks like a Status problem rather than a Waku one. Bootnodes without common shards should be filtered out, and the same goes for nodes found through discv5.

jm-clius commented 10 months ago

I don't see this as a priority for nwaku, as it is mostly relevant to Community clients relying on bootstrap nodes with ever-changing shard subscriptions; the go-waku fix will suffice for Status Communities. @chair28980, should we remove the epic label?

richard-ramos commented 10 months ago

While @chaitanyaprem and I were dogfooding the shards.test fleet, we noticed that only a single store node was being discovered. This was strange, because according to https://fleets.status.im/ this fleet has 6 store nodes.

After investigating why this happens, I found that the store nodes are actually not discovered at all. In status-go we have something called the mailserver cycle, which automatically chooses a store node based on ping reply time and connects to it to retrieve message history.

The store nodes are not being discovered via discv5 because of the DNS discovery URL they are configured with: https://github.com/status-im/infra-shards/blob/710444384b18f78e94eef62d8ac91b1322f6d333/ansible/group_vars/store.yml#L45C107-L46C25

Viewing the nodes returned by this DNS discovery URL with https://github.com/waku-org/test-discv5, I saw that their ENRs do not contain the shards we're interested in (for the reasons explained in this issue). We fixed this in go-waku (https://github.com/waku-org/go-waku/blob/d7249fc123d3a27e3eb60b85d58d4d7a51df64a1/waku/v2/discv5/discover.go#L403-L405), so we can discover other go-waku peers, but the fix is not present in nwaku (https://github.com/waku-org/nwaku/blob/b31c182325b380c1cbf2f0f6356736e9f7310996/waku/waku_discv5.nim#L67-L69).

I think this means that the store nodes and the boot nodes are not connected to each other. I think we should port the go-waku fix (https://github.com/waku-org/go-waku/blob/d7249fc123d3a27e3eb60b85d58d4d7a51df64a1/waku/v2/discv5/discover.go#L403-L405) to nwaku to avoid this situation.
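A minimal Go sketch of the relaxed filtering idea, assuming the go-waku change amounts to not discarding peers whose ENR has no rs/rsv field. The linked lines are not reproduced here, so the names and exact behavior below are hypothetical. Peers without shard info are kept so they are not lost merely because their published record is outdated, while peers that advertise only other shards are still dropped:

```go
package main

import "fmt"

// shardInfo is a hypothetical, simplified view of a peer's rs/rsv data.
type shardInfo struct {
	ClusterID uint16
	Shards    []uint16
}

// acceptPeer keeps peers that share a shard with the local node and also
// keeps peers whose ENR carries no shard information at all (remote == nil),
// since that record may simply be stale, e.g. served by a DNS tree built
// before the node subscribed to shards.
func acceptPeer(local shardInfo, remote *shardInfo) bool {
	if remote == nil {
		return true // no rs/rsv: do not filter out
	}
	if remote.ClusterID != local.ClusterID {
		return false
	}
	for _, s := range remote.Shards {
		for _, l := range local.Shards {
			if s == l {
				return true
			}
		}
	}
	return false
}

func main() {
	local := shardInfo{ClusterID: 16, Shards: []uint16{128}}
	fmt.Println(acceptPeer(local, nil))                                            // true
	fmt.Println(acceptPeer(local, &shardInfo{ClusterID: 16, Shards: []uint16{64}})) // false
}
```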