waku-org / go-waku

Go implementation of Waku v2 protocol
https://waku.org/

bug: nim->gowaku interop lightpush and filter tests fail with `invalid shard count` error #1255

Open fbarbu15 opened 1 week ago

fbarbu15 commented 1 week ago

Describe the bug We have a regression: most of the light protocol tests fail, see https://waku-org.github.io/waku-interop-tests/go/364/#. It is a regression because the failures did not reproduce yesterday.

To Reproduce Steps to reproduce the behavior:

  1. Have nwaku as a relay node
  2. Have go-waku as client node, trying to mount filter or lightpush
  3. Connect the nodes via admin/v1/peers API or just wait for autoconnection via discv5
  4. Send lightpush requests or try filter subscribe requests (depending on the protocol mounted)

Expected behavior Should work

Actual behavior We get either filter or lightpush error. See node log.txt

go-waku version/commit hash

wakuorg/go-waku:latest

Additional context

Script to reproduce the issue locally

#!/bin/bash
printf "\nAssuming you already have a docker network called waku\n"
# if not something like this should create it: docker network create --driver bridge --subnet 172.18.0.0/16 --gateway 172.18.0.1 waku

cluster_id=2
pubsub_topic="/waku/2/rs/$cluster_id/0"
node_1=wakuorg/nwaku:latest
node_2=wakuorg/go-waku:latest
ext_ip="172.18.204.9"
tcp_port="37344"

printf "\nStarting containers\n"

container_id1=$(docker run -d -i -t -p 37343:37343 -p $tcp_port:$tcp_port -p 37345:37345 -p 37346:37346 -p 37347:37347 $node_1 --listen-address=0.0.0.0 --rest=true --rest-admin=true --websocket-support=true --log-level=TRACE --rest-relay-cache-capacity=100 --websocket-port=37345 --rest-port=37343 --tcp-port=$tcp_port --discv5-udp-port=37346 --rest-address=0.0.0.0 --nat=extip:$ext_ip --peer-exchange=true --discv5-discovery=true --cluster-id=$cluster_id --metrics-server=true --metrics-server-address=0.0.0.0 --metrics-server-port=37347 --metrics-logging=true --pubsub-topic=/waku/2/rs/2/0 --lightpush=true --relay=true)
docker network connect --ip $ext_ip waku $container_id1

printf "\nSleeping 2 seconds\n"
sleep 2

response=$(curl -X GET "http://127.0.0.1:37343/debug/v1/info" -H "accept: application/json")
enrUri=$(echo $response | jq -r '.enrUri')

# Extract the first non-WebSocket address (first(...) keeps only one match, so the awk split below sees a single multiaddr)
ws_address=$(echo "$response" | jq -r 'first(.listenAddresses[] | select(contains("/ws") | not))')

# Check if we got an address, and construct the new address with it
if [[ $ws_address != "" ]]; then
    identifier=$(echo $ws_address | awk -F'/p2p/' '{print $2}')
    if [[ $identifier != "" ]]; then
        multiaddr_with_id="/ip4/${ext_ip}/tcp/${tcp_port}/p2p/${identifier}"
    else
        echo "No identifier found in the address."
        exit 1
    fi
else
    echo "No non-WebSocket address found."
    exit 1
fi

container_id2=$(docker run -d -i -t -p 25908:25908 -p 25909:25909 -p 25910:25910 -p 25911:25911 -p 25912:25912 $node_2 --listen-address=0.0.0.0 --rest=true --rest-admin=true --websocket-support=true --log-level=DEBUG --rest-relay-cache-capacity=100 --websocket-port=25910 --rest-port=25908 --tcp-port=25909 --discv5-udp-port=25911 --rest-address=0.0.0.0 --nat=extip:172.18.141.214 --peer-exchange=true --discv5-discovery=true --cluster-id=$cluster_id --min-relay-peers-to-publish=1 --rest-filter-cache-capacity=50 --pubsub-topic=/waku/2/rs/2/0 --lightpush=true --relay=false --discv5-bootstrap-node=$enrUri --lightpushnode=$multiaddr_with_id)

docker network connect --ip 172.18.141.214 waku $container_id2

printf "\nSleeping 2 seconds\n"
sleep 2

curl -v -X POST "http://127.0.0.1:25908/admin/v1/peers" -H "Content-Type: application/json" -d '{"multiaddr": "'"$multiaddr_with_id"'", "protocols": ["/vac/waku/relay/2.0.0"], "shards": [0, 1, 2, 3, 4, 5, 6, 7, 8]}'

printf "\nSubscribe\n"
curl -v -X POST "http://127.0.0.1:37343/relay/v1/subscriptions" -H "Content-Type: application/json" -d '["/waku/2/rs/2/0"]'

printf "\nSleeping 2 seconds\n"
sleep 2

printf "\nLightpush message on subscribed pubsub topic\n"
curl -v -X POST "http://127.0.0.1:25908/lightpush/v1/message" -H "Content-Type: application/json" -d '{"pubsubTopic": "/waku/2/rs/2/0", "message": {"payload": "TGlnaHQgcHVzaCB3b3JrcyEh", "contentTopic": "/myapp/1/latest/proto", "timestamp": 1712149720320589312}}'
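The script relies on fixed `sleep 2` pauses, which may be too short on a slow machine. A minimal sketch of a polling alternative follows; `wait_until` is a hypothetical helper, not part of the original script, and the 30-second timeout in the example is an arbitrary assumption.

```shell
#!/bin/bash
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND about once per second until it
# succeeds or TIMEOUT_SECONDS have elapsed.
# Returns 0 as soon as COMMAND succeeds, 1 on timeout.
wait_until() {
    local timeout=$1
    shift
    local deadline=$((SECONDS + timeout))
    until "$@" >/dev/null 2>&1; do
        if (( SECONDS >= deadline )); then
            return 1
        fi
        sleep 1
    done
    return 0
}

# Example (commented out so that sourcing this block has no side effects):
# wait_until 30 curl -sf "http://127.0.0.1:37343/debug/v1/info" \
#     || { echo "nwaku REST API not ready"; exit 1; }
```

Polling the same `/debug/v1/info` endpoint the script already queries avoids reading `enrUri` before the node's REST API is up.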

fbarbu15 commented 6 hours ago

I still have issues if I run the above script. See log go_waku_log.txt

Also if I run the node like this:

docker run -i -t -p 35638:35638 -p 35639:35639 -p 35640:35640 -p 35641:35641 -p 35642:35642 wakuorg/go-waku:latest --listen-address=0.0.0.0 --rest=true --rest-admin=true --websocket-support=true --log-level=DEBUG --rest-relay-cache-capacity=100 --websocket-port=35640 --rest-port=35638 --tcp-port=35639 --discv5-udp-port=35641 --rest-address=0.0.0.0 --nat=extip:172.18.84.98 --peer-exchange=true --discv5-discovery=true --cluster-id=3 --nodekey=e7a49628cc9bc72d99d76d6686d0da7accbbad49684b5a4adf4eca8a90fd8e6d --min-relay-peers-to-publish=1 --rest-filter-cache-capacity=50 --peer-store-capacity=10 --relay=false --discv5-bootstrap-node=enr:-L24QBKCRDeuDZPsynH3t4uLtpOVfb3qJZObnz7RXJsdj3nSXlIW0Q3nEXoW6aoe28qBQ-8swC7yl6TaW-jawitecwcCgmlkgnY0gmlwhKwS1dKKbXVsdGlhZGRyc5YACASsEtXSBp4RAAoErBLV0gaeEt0DgnJzhQADAQAAiXNlY3AyNTZrMaEDGGLo_x9gA79KL8fu4UTs5refA9uFt1QpoGV-0j69_H6DdGNwgp4Rg3VkcIKeE4V3YWt1MgU --filternode=/ip4/172.18.213.210/tcp/40465/p2p/16Uiu2HAmEJBcT3PS1AWkMaJecqQ1dJp9yxRvJQRgj6F1miHuWCT7

I see this warning:

2024-11-26 11:47:18 2024-11-26T09:47:18.315Z WARN gowaku waku/node.go:328 could not set ENR shard info {"node": "16Uiu2HAm5TGos5SWyy7Qo7jkK39pFhKkondcE9Ud8YaJhGPQCZgR", "error": "invalid number of clusters found", "numClusters": 0}

Here's the full log go_waku_log2.txt
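To check quickly whether a saved go-waku log contains this warning, something like the following could be used. The function name and the log file path in the example are illustrative assumptions; the match strings are copied from the warning above.

```shell
#!/bin/bash
# has_shard_warning LOGFILE
# Hypothetical helper: succeeds (exit 0) if LOGFILE contains the
# "could not set ENR shard info" warning with the
# "invalid number of clusters found" error, and fails otherwise.
has_shard_warning() {
    local logfile=$1
    grep -F 'could not set ENR shard info' "$logfile" \
        | grep -q -F 'invalid number of clusters found'
}

# Example (commented out; the file name is an assumption):
# has_shard_warning go_waku_log2.txt && echo "shard warning present"
```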

richard-ramos commented 1 hour ago

With the above script I'm getting the following output, which seems fine: lightpush error not_published_to_any_peer. This is a valid response from nwaku when it has no relay peers. 🤔 Is this using the latest docker image from go-waku?

fbarbu15 commented 1 hour ago

> With the above script I'm getting the following output, which seems fine: lightpush error not_published_to_any_peer. This is a valid response from nwaku when it has no relay peers. 🤔 Is this using the latest docker image from go-waku?

Yes, my bad. I'm using the correct latest docker image. You are correct that the shell script fails because nwaku has no relay peers. However, the interop tests use multiple nwaku nodes, so this was not the root cause of the invalid number of clusters found error.

The root cause of the issue is that go-waku now expects the --pubsub-topic flag to be passed when starting a node that doesn't mount relay. Is this expected? The tests worked without this flag before I created this issue.

If it's a mandatory flag now, I can update the tests. Thanks
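If the flag does turn out to be mandatory, the test harness could derive it from the cluster ID. This is a rough sketch only: both function names are hypothetical, the topic format `/waku/2/rs/<cluster-id>/<shard>` is taken from the script earlier in this issue, and shard 0 in the example is an assumption.

```shell
#!/bin/bash
# pubsub_topic_for CLUSTER_ID SHARD
# Hypothetical helper: prints a pubsub topic in the form used by the
# reproduction script: /waku/2/rs/<cluster-id>/<shard>
pubsub_topic_for() {
    local cluster_id=$1 shard=$2
    printf '/waku/2/rs/%s/%s' "$cluster_id" "$shard"
}

# light_node_flags CLUSTER_ID SHARD
# Hypothetical helper: prints the go-waku flags for a node that does not
# mount relay, including the --pubsub-topic flag discussed above.
light_node_flags() {
    local cluster_id=$1 shard=$2
    printf -- '--relay=false --cluster-id=%s --pubsub-topic=%s' \
        "$cluster_id" "$(pubsub_topic_for "$cluster_id" "$shard")"
}

# Example (commented out; shard 0 is an assumption):
# light_node_flags 3 0
```

The printed flags would replace the separate `--relay=false` and `--cluster-id=3` arguments in the `docker run` command from the earlier comment.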