libp2p / js-libp2p

The JavaScript Implementation of libp2p networking stack.
https://libp2p.github.io/js-libp2p/

node connected to itself + contentrouting timeout on peer:connect + lost pubsub messages after start/stop node #731

Closed olivier-nerot closed 1 year ago

olivier-nerot commented 4 years ago

Hello,

With the code below, using the latest libp2p version (0.28.10), I see a few surprising behaviors when running it on two nodes on the same local network:

  1. the peer:connect event is received by the emitting node several times, i.e. the node seems to connect to itself

    peer connected QmctEy46xNDL4JBAaPnnhCQtgtF4hMMjLhPqTvADrwfnet +118ms
    peer connected QmctEy46xNDL4JBAaPnnhCQtgtF4hMMjLhPqTvADrwfnet +3ms
    peer connected QmctEy46xNDL4JBAaPnnhCQtgtF4hMMjLhPqTvADrwfnet +3ms
    peer connected QmNN4tm2Z2T2LVzVLs3i2tKvj5nJRGCAP3Jkqfz9J2STCh +63ms
  2. the state.node.contentRouting.get(peerId._id) promise in the peer:discovery event throws an error saying there are no peers available. It sounds surprising: if a peer is discovered, it should mean there are peers available.

    Uncaught (in promise) Error: Failed to lookup key! No peers from routing table!
    at Object.getMany (index.js:218)
  3. the state.node.contentRouting.get(connection.remotePeer._id) promise in peer:connect often fails with a timeout error:

    Uncaught (in promise) TimeoutError: Promise timed out after 60000 milliseconds
    at http://localhost:8080/js/chunk-vendors.js:124565:63
  4. if I do a node.stop() on node 1, node 2 correctly receives the peer:disconnect event, but also a peer:connect right after, with the same timeout as in 3:

    peer disconnected ClassIsWrapper +6s
    peer connected QmNN4tm2Z2T2LVzVLs3i2tKvj5nJRGCAP3Jkqfz9J2STCh +4s
    Uncaught (in promise) TimeoutError: Promise timed out after 60000 milliseconds
    at http://localhost:8080/js/chunk-vendors.js:124565:63
  5. if I do a disconnect() + connect() on node 1, the only event received afterwards by node 1 is peer:discovery, without peer:connect, and even though pubsub.subscribe() is called in connect(), messages sent by node 2 are no longer received by node 1. Messages sent from node 1 are nevertheless received by node 2.

Globally it works, as pubsub messages can be sent and received, but there seem to be dysfunctions (or things I have misunderstood) among the peer events, node stop/start, and contentRouting get/put.

The code comes from several libp2p examples.

Thanks for any clue to understand it better.

import Libp2p from 'libp2p';
import TCP from 'libp2p-tcp';
import Mplex from 'libp2p-mplex';
import SECIO from 'libp2p-secio';
import MulticastDNS from 'libp2p-mdns';
import KademliaDHT from 'libp2p-kad-dht';
import Gossipsub from 'libp2p-gossipsub';
import PeerId from 'peer-id';
import WebRTCStar from 'libp2p-webrtc-star';
import Debug from 'debug';

const debug = Debug('neo:p2p-store');

const p2p = {
  namespaced: true,
  state: {
    peerid: null,
    peers: [],
    node: null,
    connected: false,
  },
  getters: {
    isConnected: (state) => state.connected,
    peers: (state) => state.peers,
  },
  actions: {
    async initStore({ rootState, state, dispatch }) {
      const { peerid } = rootState.settings.settings;
      state.peerid = await PeerId.createFromJSON(peerid);

      state.node = await Libp2p.create({
        peerId: state.peerid,
        addresses: {
          listen: [
            '/ip4/0.0.0.0/tcp/0',
            `/dns4/xx.xx.xx.xx/tcp/9090/ws/p2p-webrtc-star/p2p/${state.peerid.toB58String()}`,
          ],
        },
        modules: {
          transport: [
            TCP,
            WebRTCStar,
          ],
          streamMuxer: [Mplex],
          connEncryption: [SECIO],
          peerDiscovery: [MulticastDNS],
          dht: KademliaDHT,
          pubsub: Gossipsub,
        },
        config: {
          peerDiscovery: {
            autoDial: true,
            mdns: {
              interval: 1000,
              enabled: true,
            },
            webRTCStar: {
              enabled: true,
            },
          },
          dht: {
            enabled: true,
            randomWalk: {
              enabled: false,
            },
          },
          pubsub: {
            enabled: true,
            emitSelf: false,
            signMessages: true,
          },
        },
      });

      // PEER:DISCOVERY event
      state.node.on('peer:discovery', (peerId) => {
        debug('Discovered: %O', peerId.toB58String());
        state.node.contentRouting.get(peerId._id).then((buffer) => {
          const user = JSON.parse(buffer.toString());
          debug('new user discovered %O', user);
        });
      });

      // PEER:CONNECT event
      state.node.connectionManager.on('peer:connect', (connection) => {
        // it's me...
        if (connection.remotePeer.toB58String() === state.node.peerId.toB58String()) return;

        state.node.contentRouting.get(connection.remotePeer._id).then((buffer) => {
          const user = JSON.parse(buffer.toString());
          // add this peer to known peers
          state.peers = state.peers.filter((p) => p.peerid !== connection.remotePeer.toB58String());
          state.peers.push(user);
        });

      });

      // PEER:DISCONNECT event
      state.node.connectionManager.on('peer:disconnect', (connection) => {
        debug('peer disconnected %O', connection);
        state.peers = state.peers.filter((p) => p.peerid !== connection.remotePeer.toB58String());
      });
    },

    // START NODE
    async connect({ rootGetters, state }) {
      await state.node.start();

      // subscribe to main pubsub channel 'myroom'
      await state.node.pubsub.subscribe('myroom', (msg) => {
        debug(`pubsub messages received: ${msg.data.toString()}`);
      });

      // store my peerId/multiaddrs into peerStore
      state.node.peerStore.addressBook.add(state.node.peerId, state.node.multiaddrs);

      // store who I am
      const me = Buffer.from(JSON.stringify(rootGetters['user/me']));
      await state.node.contentRouting.put(
        state.peerid._id,
        me,
      );
    },

    // STOP NODE
    async disconnect({ state }) {
      debug('disconnect p2p');
      await state.node.stop();
    },

  },
};

export default p2p;
jacobheun commented 4 years ago
  1. the peer:connect event is received by the emitting node several times, i.e. the node seems to connect to itself

Incoming dials may log multiple connections. The reason is that a node will try to dial multiple addresses of a peer. Once a connection is established it will abort the "slower" dials, so you should only end up with one. This is more pronounced for nodes on the same local machine because the latency between the various addresses is negligible.
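The duplicate notifications can also be collapsed on the application side. A minimal sketch (plain JavaScript, independent of the libp2p API) that only reacts to the first connection per peer id and re-arms on disconnect:

```javascript
// Track peers we have already handled so that duplicate 'peer:connect'
// notifications (one per dialed address) are ignored.
function makeConnectTracker(onNewPeer) {
  const seen = new Set();
  return {
    onConnect(peerIdStr) {
      if (seen.has(peerIdStr)) return false; // duplicate, ignore
      seen.add(peerIdStr);
      onNewPeer(peerIdStr);
      return true;
    },
    onDisconnect(peerIdStr) {
      seen.delete(peerIdStr); // allow a future reconnect to fire again
    },
  };
}
```

Inside the `peer:connect` handler you would call `tracker.onConnect(connection.remotePeer.toB58String())`, and `tracker.onDisconnect(...)` from the `peer:disconnect` handler.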

  2. the state.node.contentRouting.get(peerId._id) promise in the peer:discovery event throws an error saying there are no peers available. It sounds surprising: if a peer is discovered, it should mean there are peers available.

You're using content routing to attempt to find a peer you've just discovered; what's the reason for this? 1. Use peerRouting for peers, and 2. you might just want to check the peerStore, but I'm not sure what you're actually trying to do here.
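The lookup order suggested here can be sketched as follows. This is a sketch against the 0.28-era js-libp2p API: `node.peerStore.get` and `node.peerRouting.findPeer` are assumed to behave as described, so verify the exact signatures against the version you run (stubbed objects are used below for illustration).

```javascript
// Sketch: resolve a peer via the local peerStore first, and fall back to
// peer routing (the DHT) only when the peer is not known locally.
async function resolvePeer(node, peerId) {
  const known = node.peerStore.get(peerId); // assumed: undefined if unknown
  if (known) return known;
  return node.peerRouting.findPeer(peerId); // may reject on DHT timeout
}
```

The point is that contentRouting answers "who has this content?", while peerRouting answers "how do I reach this peer?", which is what a `peer:discovery` handler actually needs.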

3. the state.node.contentRouting.get(connection.remotePeer._id) promise in peer:connect often fails with a timeout error:

The DHT in JS is currently quite slow, so timeouts can be pretty common. We're working on porting the recent improvements made to the DHT in Go, but it's not there yet. Again, you shouldn't be using contentRouting to find peers.

4. if I do a node.stop() on node 1, node 2 correctly receives the peer:disconnect event, but also a peer:connect right after, with the same timeout as in 3:

You're connected to the webrtc-star server, so this could be another node connecting to you. You're not logging the disconnected peer id properly, so you can't easily tell the peers apart.
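The `ClassIsWrapper` in the log above comes from passing the whole Connection object to the logger. Printing the remote peer id instead makes connect and disconnect lines comparable; a small sketch (a stub connection object stands in for libp2p's):

```javascript
// Format a connection event using the remote peer id rather than the
// Connection object itself, so connect/disconnect lines can be matched up.
function describeConnectionEvent(event, connection) {
  return `peer ${event} ${connection.remotePeer.toB58String()}`;
}
```

In the handlers this would be used as `debug(describeConnectionEvent('disconnected', connection))`.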

5. if I do a disconnect() + connect() on node 1,

I'm not following what you're trying to do here. Why are you disconnecting and reconnecting?

olivier-nerot commented 4 years ago

Thanks for your answer. I will try to sum up my goal to make it clearer. The purpose of this code is to share and co-edit documents, so I need to know who is on the network and their profiles. To do so, I store the connected user's profile into contentRouting (the DHT) when they connect. Then, when a peer connects, I get its user profile with contentRouting.get(connection.remotePeer._id). The goal is to have the connected user's profile as soon as it is online, i.e. connected, as there is an autoDial: true.

If I understand correctly, you mean a peerStore would be better than the DHT/contentRouting? I was planning to use the DHT/contentRouting to store users + documents data, to share them between peers, and to use pubsub to send co-edition events and chat messages. So, should I remove the DHT and use peerStore.metadataBook instead? Or a datastore, as said here, to have data persistence? It looks like datastore is linked to ipfs... so maybe I should 'upgrade' to a whole ipfs layer?

Regarding 4. and 5.: I was testing what happens if a peer loses the network, to eventually node.start() it again. I can confirm (with a better log) that a node.stop() on peer 1 sends the disconnect event to peer 2, but also a connect event right afterwards:

neo:p2p-store peer disconnected QmNN4tm2Z2T2LVzVLs3i2tKvj5nJRGCAP3Jkqfz9J2STCh +12ms
neo:p2p-store peer connected QmNN4tm2Z2T2LVzVLs3i2tKvj5nJRGCAP3Jkqfz9J2STCh +870ms
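Since this setup already relies on pubsub, one alternative to storing user profiles in the DHT is to broadcast each profile on a dedicated presence topic and cache whatever arrives. This is a hypothetical sketch: the topic name, message shape, and `publish` wiring are made up for illustration and are not part of any libp2p API.

```javascript
// Hypothetical presence layer over pubsub: each node announces its own
// profile and caches the profiles announced by others.
function makePresence(publish) {
  const profiles = new Map(); // peerIdStr -> profile
  return {
    announce(peerIdStr, profile) {
      // illustrative topic name and JSON shape
      publish('presence', JSON.stringify({ peerIdStr, profile }));
    },
    onMessage(raw) {
      const { peerIdStr, profile } = JSON.parse(raw);
      profiles.set(peerIdStr, profile);
    },
    get(peerIdStr) {
      return profiles.get(peerIdStr);
    },
  };
}
```

Wired into the code above, `publish` would delegate to `node.pubsub.publish` and `onMessage` would run inside the subscribe handler, sidestepping the slow DHT lookups entirely.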
on-meetsys commented 3 years ago

I have finally removed the DHT, which was still too slow: two computers on the same local network could have to wait several seconds to access shared values. I am still surprised by the connect/disconnect events: if I Libp2p.stop() on a computer (A), another one (B) will receive a Libp2p.connectionManager.on('peer:disconnect') event, fine. But a few seconds later, (B) also receives a Libp2p.connectionManager.on('peer:connect') event, whereas the node is stopped on (A). This seems to happen only when WebRTCStar is enabled, as in the code above.

maschad commented 1 year ago

Closing due to staleness