FIND_SUCCESSOR requests often time out

tsujio commented 10 years ago

Currently, a FIND_SUCCESSOR request is forwarded from N1 to N_n, and the response is sent back from N_n to N1 along the same path, as follows.

N1 [request] -> N2 -> ... -> N_n
N1 <- N2 <- ... <- [response] N_n

If N_i leaves the network at any point after forwarding the request to N_i+1 and before forwarding the response to N_i-1, the request eventually times out.

So a protocol like the following, in which N1 contacts each node directly and is redirected to the next one, seems better.

N1 [request] -> N2
N1 <- [redirect] N2
N1 [request] -> N3
N1 <- [redirect] N3
...
N1 [request] -> N_n
N1 <- [response] N_n
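
A minimal sketch of the redirect exchange above, using an in-memory stand-in for the network. Everything here (`makeNetwork`, `ask`, `findSuccessorIteratively`, the `SUCCESS` status, the `redirectTo` field) is a hypothetical name for illustration, not the webrtc-chord API. The point it shows is that the originator talks to every hop itself, so a departed peer is identified exactly instead of surfacing as an anonymous timeout.

```javascript
// In-memory stand-in for the network (hypothetical, for illustration only).
// routes: { peerId: { next: peerId | null, alive: boolean } }
function makeNetwork(routes) {
  return {
    // Ask one peer directly. It answers REDIRECT (try `redirectTo`) or SUCCESS.
    ask(peerId, key) {
      const peer = routes[peerId];
      if (!peer || !peer.alive) {
        throw new Error('FIND_SUCCESSOR request to ' + peerId + ' timed out.');
      }
      return peer.next === null
        ? { status: 'SUCCESS', successor: peerId }
        : { status: 'REDIRECT', redirectTo: peer.next };
    }
  };
}

// Iterative lookup: the originator (N1) contacts each hop itself.
// Note: there is no hop limit here, so the redirect chain must be acyclic.
function findSuccessorIteratively(network, firstPeer, key) {
  const visited = [];
  let current = firstPeer;
  for (;;) {
    visited.push(current);
    let reply;
    try {
      reply = network.ask(current, key);
    } catch (error) {
      // The failing hop is known, so the caller can drop it and try another route.
      return { error: error, failedPeer: current, visited: visited };
    }
    if (reply.status === 'SUCCESS') {
      return { successor: reply.successor, visited: visited };
    }
    current = reply.redirectTo;
  }
}

// Usage: N3 has left the network between N2's redirect and N1's next request.
const network = makeNetwork({
  N2: { next: 'N3', alive: true },
  N3: { next: 'N4', alive: false },
  N4: { next: null, alive: true }
});
const result = findSuccessorIteratively(network, 'N2', 'some-key');
console.log(result.failedPeer, result.error.message);
```

In the forwarding scheme, N1 would only learn that the request to N2 timed out. Here the failure is attributed to N3, which is the information needed for any error correction.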

jure commented 10 years ago

I'm seeing this, yes.

Peer em45e5gjowc7syvi background.js:197
Peer vghcfv0odh88h0k9 background.js:197
Peer 853dbetyxbhuxr00 background.js:197
Peer 07t53m9g7i3blnmi background.js:197

Joining vghcfv0odh88h0k9 background.js:205
Failed: Error: FIND_SUCCESSOR request to vghcfv0odh88h0k9 timed out. background.js:179

Peer em45e5gjowc7syvi background.js:197
Peer 853dbetyxbhuxr00 background.js:197
Peer 07t53m9g7i3blnmi background.js:197

Joining 853dbetyxbhuxr00 background.js:205
Failed: Error: FIND_SUCCESSOR request to 853dbetyxbhuxr00 timed out. background.js:179

Peer em45e5gjowc7syvi background.js:197
Peer 07t53m9g7i3blnmi background.js:197
Joining 07t53m9g7i3blnmi 
Failed: Error: FIND_SUCCESSOR request to 07t53m9g7i3blnmi timed out.

This is on a fresh network with only 4 peers though, as the PeerJS server just restarted. So I don't know what's going on.

I agree that a redirect protocol seems more manageable. At the very least, it makes it easier to determine where an error occurred, which helps with any error correction.

jure commented 10 years ago

With the new iterative findSuccessor implementation, I'm getting an endless loop when a connection to a peer can't be established on the first attempt:

PeerJS:  ERROR Error: Could not connect to peer 1mh0mjoa5bd6xbt9 peer.js:1117
Error {type: "peer-unavailable", stack: (...), message: "Could not connect to peer 1mh0mjoa5bd6xbt9"}
 webrtc-chord.js:2998

Looks like it gets stuck on this part:

else if (status === 'REDIRECT') {
  successor.findSuccessor(key, function(_successor, error) {
    if (error) {
      console.log(error);
      self._references.removeReference(successor);
      self.findSuccessor(key, callback);
      return;
    }

    callback(_successor);
  });

successor here is a different node on each of two consecutive passes, e.g.:

successor
Node {_peerId: "ey2bh4roqykqpvi0", nodeId: ID, _nodeFactory: NodeFactory, _connectionFactory: ConnectionFactory, _requestHandler: RequestHandler…}
Error {type: "peer-unavailable", stack: (...), message: "Could not connect to peer 1mh0mjoa5bd6xbt9"}
 webrtc-chord.js:1427
successor
Node {_peerId: "dwtvxid9s76tj4i0", nodeId: ID, _nodeFactory: NodeFactory, _connectionFactory: ConnectionFactory, _requestHandler: RequestHandler…}

But I haven't actually figured out what the issue is. There should probably be some kind of repetition/retry limit.
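
A rough sketch of what such a retry limit could look like, assuming the same `(successor, error)` callback convention as the snippet above. `findSuccessorWithRetryLimit`, `lookupOnce`, and `maxRetries` are hypothetical names, not part of webrtc-chord, and the failing lookup is simulated rather than a real PeerJS connection.

```javascript
// Hypothetical wrapper: `lookupOnce(key, cb)` makes a single attempt and
// calls cb(successor, error), mirroring the callback shape in the snippet above.
function findSuccessorWithRetryLimit(lookupOnce, key, maxRetries, callback) {
  let attempts = 0;

  function attempt() {
    attempts += 1;
    lookupOnce(key, function (successor, error) {
      if (!error) {
        callback(successor);
        return;
      }
      if (attempts > maxRetries) {
        // Give up and report the last error instead of recursing forever.
        callback(null, new Error(
          'FIND_SUCCESSOR for ' + key + ' failed after ' + attempts +
          ' attempts: ' + error.message));
        return;
      }
      attempt();
    });
  }

  attempt();
}

// Usage with a simulated lookup that can never reach its peer.
let calls = 0;
function alwaysUnavailable(key, cb) {
  calls += 1;
  cb(null, new Error('Could not connect to peer'));
}

let finalError = null;
findSuccessorWithRetryLimit(alwaysUnavailable, 'some-key', 3, function (successor, error) {
  finalError = error;
});
console.log(calls); // 4 (one initial attempt plus three retries)
console.log(finalError.message);
```

The caller gets an error after a bounded number of attempts, so a join can fail visibly rather than spin.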

One more piece of information is that at this point it doesn't actually seem to be trying to connect to the peer anymore, i.e. I see no requests on the PeerJS server.

tsujio commented 10 years ago

A possible cause is that a peer that has already left the network is returned on each findSuccessor attempt. I'll look into it further.

tsujio commented 10 years ago

It can occur in the following network topology.

peer1 --- peer2 --- peer3
            |        |
         peer5 ----- peer4

In this case, a FIND_SUCCESSOR request is forwarded among peer2, peer3, peer4, and peer5 indefinitely. peer5 should notice that peer1 is the successor of it.

As you said, some kind of retry limit may be required.
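
As one possible shape for that limit, here is a small sketch that follows redirects while tracking the peers already visited and capping the number of hops. The redirect table is a made-up model of the loop among peer2 to peer5 in the diagram, and `followRedirects` and `maxHops` are hypothetical names, not existing webrtc-chord code.

```javascript
// Hypothetical redirect table modelling the loop in the diagram:
// each peer in the ring redirects the request to the next one.
const redirects = { peer2: 'peer3', peer3: 'peer4', peer4: 'peer5', peer5: 'peer2' };

// Follow redirects from `start`, stopping on a repeated peer or after maxHops.
// A peer with no entry in the table is treated as the one that answers.
function followRedirects(redirectTable, start, maxHops) {
  const seen = new Set();
  let current = start;
  for (let hops = 0; hops < maxHops; hops += 1) {
    if (seen.has(current)) {
      return { ok: false, reason: 'cycle', at: current, hops: hops };
    }
    seen.add(current);
    const next = redirectTable[current];
    if (next === undefined) {
      return { ok: true, successor: current, hops: hops };
    }
    current = next;
  }
  return { ok: false, reason: 'hop-limit', at: current, hops: maxHops };
}

const outcome = followRedirects(redirects, 'peer2', 16);
console.log(outcome.reason, outcome.at, outcome.hops); // cycle peer2 4
```

The visited set catches an exact loop as soon as it closes, while the hop cap is a backstop for chains that keep growing without repeating.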

tsujio / webrtc-chord

FIND_SUCCESSOR requests often time out #9