waku-org / nwaku

Waku node and protocol.
Other
203 stars 54 forks source link

bug: js-waku peer exchange interop bug #1428

Closed danisharora099 closed 1 year ago

danisharora099 commented 2 years ago

Problem

As a partial followup to https://github.com/waku-org/nwaku/issues/1408, the changes in js-waku to handle an inbound connection was handled upon which the connection was established and response was received by js-waku. Upon further confirmations, it is seen that the the received response when decoded results in:

  1. empty array
  2. different query than sent

Impact

Blocks Peer Exchange in js-waku

To reproduce

If you can reproduce the behavior, steps to reproduce:

  1. Go to js-waku
  2. Checkout to danisharora/peer-exchnage-refactor-protocol
  3. cd packages/tests && yarn test

Expected behavior

The ENRs of the two nwaku nodes connected should be returned as part of the response in an array.

Screenshots/logs

  1. Logs for nwaku: (Nwaku node 1 is enabled with Peer Exchange while Node 2 and Node 3 are connecting to Node 1 using discv5 discovery. nwaku_node3.log nwaku_node1.log -> running Peer Exchange nwaku_node2.log

  2. Inconsistency in request as seen before sending, and after the response:

    encoded req { query: { numPeers: 1n }, response: undefined }
    decoded-res { query: { numPeers: 0n }, response: { peerInfos: [] } }

    It can be seen that the numPeers requested were 1n, but the request in the received response has numPeers as 0n

nwaku version/commit hash

nwaku is running on the latest master

Additional context

kaiserd commented 2 years ago

Inconsistency in request as seen before sending, and after the response:

Do you send both requests to Node 1 within a short time window? In this case, you (currently) can't map them to the respective query because there is no request ID. This will be addressed in the peer-exchange overhaul, as we most likely move to a Waku req/resp style. (Note: This is not expected behaviour, because you could just ask for more peers in a single query.)

I can still spot ENR cache is empty in the nwaku node's log running PX, while it can also be seen that 2 peers are connected to it.

Do all peers have enough time before you send the request? There is a separate discv5 loop for peer exchange which runs less frequent than the main discv5 loop.

danisharora099 commented 2 years ago

Inconsistency in request as seen before sending, and after the response:

Do you send both requests to Node 1 within a short time window? In this case, you (currently) can't map them to the respective query because there is no request ID. This will be addressed in the peer-exchange overhaul, as we most likely move to a Waku req/resp style. (Note: This is not expected behaviour, because you could just ask for more peers in a single query.)

I start my node 1 with PX enabled, have the other two nwaku nodes connect to it immediately after with discv5Discovery. I then wait 40 seconds before creating a js-waku node and sending the peer exchange query. I have tried sending a request to fetch 1n peers & 2n peers, both of which result in an empty response array.

I can still spot ENR cache is empty in the nwaku node's log running PX, while it can also be seen that 2 peers are connected to it.

Do all peers have enough time before you send the request? There is a separate discv5 loop for peer exchange which runs less frequent than the main discv5 loop.

Yes. I wait for 40 seconds before requesting for the ENRs.

@kaiserd

kaiserd commented 2 years ago

In the logs you provided, node1 did not discover any nodes via discv5. So it seems, the problem is not related to peer-exchange (or at least, there is another problem shadowing the peer-exchange problem).

danisharora099 commented 2 years ago

In the logs you provided, node1 did not discover any nodes via discv5. So it seems, the problem is not related to peer-exchange (or at least, there is another problem shadowing the peer-exchange problem).

Not sure what it implies but tt does say peers=2, which I assumed are the 2 nodes it discovered.

kaiserd commented 2 years ago

Also

It can be seen that the numPeers requested were 1n, but the request in the received response has numPeers as 0n

This is expected, because the response does not have a numPeers field.

kaiserd commented 2 years ago

Not sure what it implies but tt does say peers=2, which I assumed are the 2 nodes it discovered.

These are gossipsub peers. Did you provide bootstrapnodes outside of discv5?

Edit: I checked the logs, and the js-waku test that creates the logs: It seems the js-waku full node accepts a relay dial; most likely, the listed relay peer is js-waku.

danisharora099 commented 2 years ago

I added a delay of 10 secs, as discussed, after each nwaku node runs to allow for discv5 to occur -- no luck on my end -- still receiving the empty ENR cache in the log.

fryorcraken commented 2 years ago

@kaiserd we are using --discv5-nootstrap-node when starting the nwaku node.

If:

  1. Should a peer exchange request on node 1 return node 2?
  2. Should a peer exchange request on node 2 return node1?
  3. If (1) or (2) is yes, then how long should it take for the peer to be included in the peer exchange request?
fryorcraken commented 2 years ago

@kaiserd Also, are the latest fixes deployed on wakuv2.test fleet? If so, @danisharora099 you can trying to target the test fleet for testing to confirm it works first, and then try to sort out the CI.

@richard-ramos if you do a peer exchange request with go-waku to the wakuv2. test fleet, does it return peers?

richard-ramos commented 2 years ago

I get a "protocol not supported" error. Is peer exchange enabled in the test fleet?

kaiserd commented 2 years ago

@fryorcraken

Should a peer exchange request on node 1 return node 2?

Yes. If node1 has peer-exchage=true

Should a peer exchange request on node 2 return node1?

Yes. If node2 has peer-exchage=true

If (1) or (2) is yes, then how long should it take for the peer to be included in the peer exchange request?

Waiting 40 seconds should make it highly lightly for the peer to be included. The peer exchange discovery loop runs every 30 seconds.

Note: This might change in future more stable versions of the RFC. We might exclude bootstrap nodes from being returned via peer-exchange.

kaiserd commented 2 years ago

@danisharora099

I added a delay of 10 secs, as discussed, after each nwaku node runs to allow for discv5 to occur -- no luck on my end -- still receiving the empty ENR cache in the log.

1051 shows a testcase that works on my machine.

Discovered px peers via discv5 .... count=2 ...

(There are strange characters in the js-waku logs for nwaku, so I copied and edited the relevant info.)

kaiserd commented 2 years ago

@fryorcraken @richard-ramos

are the latest fixes deployed on wakuv2.test fleet?

I get a "protocol not supported" error. Is peer exchange enabled in the test fleet?

Currently not. I'll ask.

kaiserd commented 2 years ago

Peer exchange will be activated on the wakuv2 test fleet soon (PRs under review):

danisharora099 commented 2 years ago

@danisharora099

I added a delay of 10 secs, as discussed, after each nwaku node runs to allow for discv5 to occur -- no luck on my end -- still receiving the empty ENR cache in the log.

1051 shows a testcase that works on my machine.

* Node1 finds both node2 and node3 via discv5

* Node1 finds both node2 and node3 in the peer exchange discv5 loop

Discovered px peers via discv5 .... count=2 ...

(There are strange characters in the js-waku logs for nwaku, so I copied and edited the relevant info.)

I think you meant https://github.com/waku-org/js-waku/pull/1051 😅

Also, can you explain what you mean by "works on my machine"? Do you receive the expected peer info on the JS logs? I still can't seem to reproduce the expected behavior on your branch

@kaiserd

kaiserd commented 1 year ago

@danisharora099 sorry, I missed your comment.

Yes, I meant waku-org/js-waku#1051 :sweat_smile:

what you mean by "works on my machine"

What I wrote under "works on my machine" :

  • Node1 finds both node2 and node3 via discv5
  • Node1 finds both node2 and node3 in the peer exchange discv5 loop

Discovered px peers via discv5 .... count=2 ... (There are strange characters in the js-waku logs for nwaku, so I copied and edited the relevant info.)

This is in the nwaku node logs generated by the nwaku instances started by the js-waku test.

So, what worked is regarding your comment my comment was an answer to :smile: :

still receiving the empty ENR cache in the log.

In my tests, this has been solved / works. The ENR cache contains two peers now (as expected).

Also, I received bytes in the onRequest handler: (not parsed or tested if these are valid).

private onRequest(streamData: IncomingStreamData): void {
    log("Receiving message push");
    try {
      pipe(streamData.stream, lp.decode(), async (source) => {
        for await (const bytes of source) {
          console.log(bytes); // -> this actually prints bytes
        }
      }).then(
        () => {
          log("Receiving pipe closed.");
        },
        (e) => {
          log("Error with receiving pipe", e);
        }
      );
    } catch (e) {
      log("Error decoding message", e);
    }
  }

I did not check the JS logs. The main issue in the OP and the following comments was that nwaku's peer exchange discv5 (and discv5 in general) did not work in the js-waku test. With my draft PR, this worked. Also, the dial back from nwaku to js-waku worked and data was received by the js-waku peer-exchange onRequest handler I added. I did not parse the received data, nor did I check it for validity. See code fragment above.

danisharora099 commented 1 year ago

This has now been resolved. Thanks all :)