ipfs-inactive / dynamic-data-and-capabilities

[ARCHIVED] Dynamic Data and Capabilities in IPFS Working Group

Explain IPFS PubSub Behaviour over WebSockets-Star Topology #23

Closed harrshasri closed 6 years ago

harrshasri commented 6 years ago

Hi @diasdavid David,

I successfully integrated IPFS into my app at https://dukaanbabu.com, but I haven't published the IPFS update to the store yet, for a couple of reasons.

I know I have to wait for the DHT implementation in js-ipfs, but for now I am wondering about the scalability of the websocket-star topology.

As of now, I am using websocket-star for pubsub, but I don't understand how it handles pubsub.

My current config is:

{
    Addresses: {
        Swarm: [
            // Will switch to the self-hosted rendezvous server in production
            '/dns4/ws-star.discovery.libp2p.io/tcp/443/wss/p2p-websocket-star/',
            // '/dns4/star-signal.cloud.ipfs.team/tcp/443/wss/p2p-webrtc-star',
        ]
    },
    Discovery: {
        MDNS: {
            Enabled: true,
            Interval: 10
        }
        // webRTCStar: {
        //     Enabled: true
        // }
    },
    Bootstrap: [
        "/dns4/ams-1.bootstrap.libp2p.io/tcp/443/wss/ipfs/QmSoLer265NRgSp2LA3dPaeykiS1J6DifTC88f5uVQKNAd",
        "/dns4/lon-1.bootstrap.libp2p.io/tcp/443/wss/ipfs/QmSoLMeWqB7YGVLJN3pNLQpmmEk35v6wYtsMGLzSr5QBU3",
        "/dns4/sfo-3.bootstrap.libp2p.io/tcp/443/wss/ipfs/QmSoLPppuBtQSGwKDZT2M73ULpjvfd3aZ6ha4oFGL1KrGM",
        "/dns4/sgp-1.bootstrap.libp2p.io/tcp/443/wss/ipfs/QmSoLSafTMBsPKadTEgaXctDQVcqN88CNLHXMkTNwMKPnu",
        "/dns4/wss0.bootstrap.libp2p.io/tcp/443/wss/ipfs/QmZMxNdpMkewiVZLMRxaNxUeZpDUb34pWjZ1kZvsd16Zic",
        "/dns4/wss1.bootstrap.libp2p.io/tcp/443/wss/ipfs/Qmbut9Ywz9YEDrz8ySBSgWyJk41Uvm2QJPhwDJzJyGFsD6"
    ]
}

Here is a sneak peek of "GitHub for Shopping":

[image: pricegraph]

daviddias commented 6 years ago

@pgte mind getting this one?

pgte commented 6 years ago

Hi @harrshasri,

This is based on my recent sparse knowledge of the ipfs and libp2p stack, so please @diasdavid correct me if I'm wrong or imprecise.

There are two layers in your question. First is the floodsub algorithm (which is the current naive implementation of pubsub in ipfs) and then there is the websocket-star protocol.

Floodsub is a very naive and simple protocol: when a peer connects, we dial the floodsub protocol (multiplexed over the peer connection). Each node keeps a list of all the nodes it's connected to. Each node updates the remote nodes on the topics it's interested in. When receiving a message, a node checks to see if that message was already processed. If not, 1) the node caches the message id, 2) emits that (topic, message) to the user and 3) forwards that message to every node that's interested in that topic. As you can see, there is no overlay network found here. It simply constructs a star overlay on every known peer that the transport connects to.
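The steps above can be sketched as a small in-memory model. This is only an illustration of the described behaviour, not the js-libp2p-floodsub code: `FloodsubNode`, `connect`, and the message shape are hypothetical names, and connections are simulated with direct method calls.

```javascript
// Minimal in-memory model of the floodsub steps described above.
// FloodsubNode, connect() and the message shape are illustrative names,
// not the js-libp2p-floodsub API.
class FloodsubNode {
  constructor(id) {
    this.id = id;
    this.peers = new Map();         // peerId -> { node, topics: Set }
    this.subscriptions = new Set(); // topics this node is interested in
    this.seen = new Set();          // cache of processed message ids
    this.delivered = [];            // [topic, data] pairs emitted to the user
    this.nextSeq = 0;
  }

  // Record a connected peer together with the topics it currently follows.
  addPeer(node) {
    this.peers.set(node.id, { node, topics: new Set(node.subscriptions) });
  }

  subscribe(topic) {
    this.subscriptions.add(topic);
    // Update every connected peer on the topics we are interested in.
    for (const { node } of this.peers.values()) {
      const entry = node.peers.get(this.id);
      if (entry) entry.topics.add(topic);
    }
  }

  publish(topic, data) {
    const msg = { id: `${this.id}:${this.nextSeq++}`, topic, data };
    this.seen.add(msg.id);
    this.forward(msg, null);
  }

  receive(msg, fromId) {
    if (this.seen.has(msg.id)) return; // already processed, drop it
    this.seen.add(msg.id);             // 1) cache the message id
    if (this.subscriptions.has(msg.topic)) {
      this.delivered.push([msg.topic, msg.data]); // 2) emit to the user
    }
    this.forward(msg, fromId);         // 3) forward to interested peers
  }

  forward(msg, fromId) {
    for (const [peerId, { node, topics }] of this.peers) {
      if (peerId !== fromId && topics.has(msg.topic)) {
        node.receive(msg, this.id);
      }
    }
  }
}

// Simulate a bidirectional connection between two nodes.
function connect(a, b) {
  a.addPeer(b);
  b.addPeer(a);
}

// Three fully connected nodes; only b and c follow the topic.
const a = new FloodsubNode('a');
const b = new FloodsubNode('b');
const c = new FloodsubNode('c');
connect(a, b);
connect(b, c);
connect(a, c);
b.subscribe('prices');
c.subscribe('prices');
a.publish('prices', 'item-1: 42');
```

In this sketch `c` is offered the message twice (once by `b`, once by `a`), but the message-id cache ensures it is emitted to the user only once.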

Now, the websocket-star transport: The websocket-star server serves as both discovery and transport relay, connecting peers through it.

So, trying to answer your question: when a node connects to a websocket-star server, that peer's address is periodically broadcast to every other connected peer. When finding out about a new peer, js-ipfs dials it immediately, which makes the floodsub protocol dial it as well. That dial goes through the websocket-star-rendezvous server (which is not only a rendezvous server, but also a 2-hop relay).

@harrshasri Does this answer your question?

harrshasri commented 6 years ago

@pgte So it is flooding via the signalling server, which means the signalling server bears the bandwidth cost not only of the peerInfo exchange but also of the message/content transfer between nodes.

Did I understand correctly?

pgte commented 6 years ago

Correct. It's the nature of websockets: browsers can't accept incoming websocket connections, so websockets don't allow direct p2p connections, while the webrtc-based transports do.
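To make the bandwidth point concrete, here is a toy model of a server that acts as both rendezvous point and relay. It is a sketch under stated assumptions, not the websocket-star-rendezvous implementation: `RelayServer`, `join`, and `relay` are hypothetical names, and payloads are plain strings.

```javascript
// Toy model of a rendezvous server that also relays traffic.
// RelayServer and its methods are hypothetical names for illustration only.
class RelayServer {
  constructor() {
    this.handlers = new Map(); // peerId -> callback for incoming payloads
    this.bytesRelayed = 0;     // total payload bytes that crossed the server
  }

  // Discovery role: register a peer and return the ids of the peers
  // that were already connected.
  join(peerId, onData) {
    const known = [...this.handlers.keys()];
    this.handlers.set(peerId, onData);
    return known;
  }

  // Relay role: every payload between two peers crosses the server.
  relay(from, to, payload) {
    const handler = this.handlers.get(to);
    if (!handler) throw new Error(`unknown peer: ${to}`);
    this.bytesRelayed += Buffer.byteLength(payload);
    handler(from, payload);
  }
}

// One publisher flooding the same message to every subscriber it knows.
const server = new RelayServer();
const inbox = [];
server.join('subscriber-1', (from, data) => inbox.push(data));
server.join('subscriber-2', (from, data) => inbox.push(data));
const knownPeers = server.join('publisher', () => {});

const message = 'item-1: 42';
for (const peerId of knownPeers) {
  server.relay('publisher', peerId, message);
}
// server.bytesRelayed now equals message size times the number of recipients.
```

In this model the server's traffic grows with the number of recipients of each message, which is the cost a direct WebRTC connection avoids.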

harrshasri commented 6 years ago

But we don't have PubSub for webrtc functioning as of now, right?

pgte commented 6 years ago

WebRTC works (through the libp2p-webrtc-star transport), and it allows pubsub (it treats pubsub transparently, like any other multiplexed protocol).

harrshasri commented 6 years ago

I will give it another try. Last week when I tested it, it didn't discover its peer. That's why I asked you this question.

So what happens in WebRTC for pubsub peer discovery?

Does everybody in the network swarm after the rendezvous signalling, and then floodsub queries its peers based on the topic? Or do they swarm based on the topic via the rendezvous server and then floodsub the content?

pgte commented 6 years ago

It's the same process as described, with the exception that the p2p connections should happen directly, without needing a relay server.

harrshasri commented 6 years ago

Okay, so does it swarm everybody irrespective of the topic? If there are 1000 nodes in the room, every client opens 1000 connections, right?

pgte commented 6 years ago

No, it still uses the floodsub protocol irrespective of transport:

Each node keeps a list of all the nodes it's connected to. Each node updates the remote nodes on the topics it's interested in. When receiving a message, a node checks to see if that message was already processed. If not, 1) the node caches the message id, 2) emits that (topic, message) to the user and 3) forwards that message to every node that's interested in that topic.

harrshasri commented 6 years ago

So it sends its subscriptions to every peer it knows:

this.peers.forEach((peer) => sendSubscriptionsOnceReady(peer))

What if the topic is not published by any of those known peers?

harrshasri commented 6 years ago

I'm sorry, I didn't understand this part:

If not, 1) the node caches the message id, 2) emits that (topic, message) to the user and 3) forwards that message to every node that's interested in that topic.

pgte commented 6 years ago

In floodsub, a node must declare interest in a topic for it to get messages on that topic.

When receiving a message from a remote node, a node forwards that message to a known node if and only if that node is interested in that topic.

This simplistic approach has, among others, the downside that poorly connected nodes have a low probability of getting messages on unpopular topics.

harrshasri commented 6 years ago

So would you recommend the DHT for my use case, since pubsub peer and content discovery takes time?

pgte commented 6 years ago

Could you describe your use case? What would you use pubsub for?

harrshasri commented 6 years ago

I understand how pubsub behaves when there are no publishers. But what if there are enough publishers, just not within the list of peers known to the subscribing node? How do subscribers discover the publishers?

pgte commented 6 years ago

They don't explicitly. To receive the message on a given topic, each node must already be connected to a publisher or to a node that's interested in that topic.
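The consequence of that rule can be checked with a small reachability sketch. It assumes a static, undirected connection graph and a fixed subscriber set for a single topic; `reachedSubscribers` and the node names are illustrative and not part of any libp2p API.

```javascript
// Which subscribers receive a message under the floodsub forwarding rule?
// `links` is an undirected adjacency list; all names are illustrative.
function reachedSubscribers(links, subscribers, publisher) {
  const reached = new Set();
  const visited = new Set([publisher]);
  const queue = [publisher];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const neighbour of links[current] || []) {
      // A message is only forwarded to peers interested in the topic.
      if (visited.has(neighbour) || !subscribers.has(neighbour)) continue;
      visited.add(neighbour);
      reached.add(neighbour);
      queue.push(neighbour);
    }
  }
  return reached;
}

// publisher <-> middle <-> viewer (no direct publisher-viewer connection)
const links = {
  publisher: ['middle'],
  middle: ['publisher', 'viewer'],
  viewer: ['middle'],
};

// 'middle' is not subscribed, so the message never gets to 'viewer'.
const isolated = reachedSubscribers(links, new Set(['viewer']), 'publisher');

// Once 'middle' subscribes too, the message reaches both nodes.
const bridged = reachedSubscribers(links, new Set(['middle', 'viewer']), 'publisher');
```

Under these assumptions a subscriber is only reached through a chain of connected peers that are all interested in the topic, which is why a node with few connections can miss messages on an unpopular topic.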

harrshasri commented 6 years ago

I am sharing price data, which is published by users who have the product in their wish list. People who are browsing the product page subscribe and retrieve the price history.

This is a subscriber:

[image: pricegraph]

It queries the other publishers to get the price data.

pgte commented 6 years ago

@harrshasri does my last answer answer your question?

harrshasri commented 6 years ago

They don't explicitly. To receive the message on a given topic, each node must already be connected to a publisher or to a node that's interested in that topic.

Aah! That's why it didn't work for me over WebRTC: the data wasn't transferring. With WebSockets the app was working, because the websocket-star server acts as a 2-hop relay.

Now It makes sense.

I have only two options: 1) WebSockets, or 2) PubSub over DHT.

harrshasri commented 6 years ago

I now need to think about the scalability requirements of the rendezvous server until the DHT is implemented in js-ipfs.

One more question: do we have bitswap in js-ipfs yet?

pgte commented 6 years ago

Yes, bitswap is the block exchange protocol, used in the object, files and DAG APIs.

harrshasri commented 6 years ago

But we aren't leveraging that in PubSub Layer, are we?

pgte commented 6 years ago

Pubsub does not use bitswap. Pubsub is designed to be a real-time(ish), best-effort, topic-based message delivery system and has nothing to do with the content-addressable part of IPFS.

harrshasri commented 6 years ago

Oh! I thought pubsub was built on top of bitswap. There is some research on pubsub over the DHT, which I think is necessary if I want to remove the 2-hop relay or make it completely decentralised.

https://discuss.ipfs.io/t/dht-on-pubsub-and-general-pubsub-improvement/1692/2