libp2p / notes

libp2p Collaborative Notebook for Research
MIT License

Offline Message Queue #2

Open Stebalien opened 5 years ago

Stebalien commented 5 years ago

There's an open problem we'd like to resolve:

  1. User A generates some content and adds it to IPFS. They want to deliver this content to user B.
  2. User A tries to connect to user B. However, it turns out user B is offline.
  3. User A goes offline.

Currently, we have no good way to deliver this information to user B. Textile uses the DHT as follows:

  1. User A fails to connect to user B.
  2. User A puts PeerIdOf(UserB) -> CidOf(content) to the DHT (treating the DHT as a sloppy hash table).
  3. When user B comes online, they find themselves in the DHT and download their messages.
  4. Finally, they send an ACK back to user A (using the same mechanism).

Unfortunately:

  1. This relies heavily on potentially ephemeral nodes.
  2. There's no way to remove these messages from the DHT until they expire (see: https://github.com/libp2p/go-libp2p-kad-dht/issues/196).
Stebalien commented 5 years ago

One alternative is to allow users to choose some always online node as their message queue. Users could either run these themselves or pay someone else to do so. In practice, we (and others) can probably just run this service for free.

Design

Parties

Setup

The receiver picks a set of queue nodes and creates and signs a record specifying which queues should be used and, for each queue:

We'll have to balance flexibility with simplicity when designing this policy language.

Finally, the receiver gives this record to its selected queue nodes. These queue nodes are responsible for repeatedly putting this record into the DHT.

Protocol

  1. The sender adds their content to IPFS (well, likely IPLD).
  2. The sender convinces some service to store the content while they're offline (e.g., using a pinning service).
  3. The sender looks up the receiver's "queue" record in the DHT.
  4. The sender connects to the receiver's queue nodes according to the specified policy, sending the CID of the content to each one.
  5. When the receiver comes online, it checks with its chosen queues for any queued messages, draining the queues in the process.

Drawbacks

The chief drawback with this protocol is that it's not truly peer-to-peer. That is, there are some nodes that must be willing to act as queues and users must be willing to rely on them (possibly paying them).

However, the alternative is to expect random nodes in the network to perform this service (like we do with the DHT). This is fine for ephemeral information that can be frequently refreshed (e.g., the queue records) but less useful for potentially long-lived messages.

sanderpick commented 5 years ago

Thanks for jumping in here @Stebalien. I've been batting around a pretty similar interim setup to your proposal. The key differences being:

Of course, it would be amazing to have this functionality in core. Some questions regarding your proposal:

Stebalien commented 5 years ago

Does the Queue node still insert a pointer to the content node into the DHT? Does this mean we'd still need to implement DHT delete?

No. The queue node would just tell everyone that it's willing to store messages for user X. The client should already know which queues it has chosen.

A client node signals the location of its chosen queue to the network by inserting a property into its IPNS based profile

The only difference in this proposal is that it makes the queue node responsible for keeping this record alive. The problem with IPNS is that we make IPNS records expire to prevent replay. Ideally, in this case, the user would insert this information into some IPNS-like record and then distribute this record to all of its queues. The queues would then republish this to the DHT as necessary (to deal with DHT churn).

sanderpick commented 5 years ago

Ah, makes sense!

Authentication w/ a queue seems straightforward enough, but what about authorizing a node to be able to start a queue on another node? Could this just be a config setting that nodes opt into? i.e.,

"MessageQueue": {
  "Enabled": true
},

That is to say, if a node has this enabled, any other node can lean on it for this service. Perhaps a client + queue size limit would be helpful, esp. considering the potential for spam.

Stebalien commented 5 years ago

Personally, I wouldn't even bundle this with go-ipfs. Instead, I'd build a new libp2p service providing a queue-server (it's pretty easy to write new libp2p services at this point). Unlike nodes in the DHT, these queue servers would have to be pretty reliable for this system to work.

sanderpick commented 5 years ago

Sounds good. Couple questions, though I may be conflating solutions...

...the user would insert this information into some IPNS-like record and then distribute this record to all of its queues.

What does "IPNS-like" mean here? I'm likely not familiar enough with the underlying mechanism to infer your meaning.

The queues would then republish this to the DHT as necessary (to deal with DHT churn).

So, the queue node essentially pins the record?

To see how this might play out, I can start by adding handling to our existing service layer (which needs upgrading to the newer, more idiomatic libp2p services).

Down the road, I can break it out into a standalone server, usable by others. Are libp2p hosts able to handle multiple services simultaneously? If not, I suppose we could run multiple since we'll still need the textile thread service, i.e., each of our app nodes would need to run:

Stebalien commented 5 years ago

What does "IPNS-like" mean here? I'm likely not familiar enough with the underlying mechanism to infer your meaning.

Well, I guess they could just use IPNS. It would just add a layer of indirection because IPNS records can only point to a single path (queues would have to announce both the IPNS records and provider records for the object they point to). However, that's probably the cleaner solution.

So, the queue node essentially pins the record?

Yes.

Are libp2p hosts able to handle multiple services simultaneously?

Yes. You can register as many services as you want.

sanderpick commented 5 years ago

I made some progress here. The textile-go lib now has a libp2p service for handling inboxing. The basic idea is...

other stuff:

How another peer can determine peer A's inbox(es) is a bit baked into application logic at this point. There's a concept of a "thread", which is a hash tree of state updates, handled with another p2p service. Each update has a header message containing the author's inbox addresses. I don't currently have a way for a peer to advertise these addresses to the network as a whole, though we could add them to the public IPNS profile. We'll need something there in order to "look up" a newly discovered peer's inbox(es).

Components:

Outgoing messages to other peers (which may be direct or end up in a hosted inbox) are queued as well (I'm trying to achieve truly offline first UX). If direct delivery fails, the (encrypted) message ends up in the cafe outbox (different protocol).

We'll take this around the block for a bit first, but I'd like to spin out the inboxing piece to a standalone p2p service repo. Would be neat to have a standardized way of releasing / including p2p services (maybe there is one?). I put together a base service based on some of cpacia's work that may be useful to others. It takes a protocol and a handler and you're off to the races.

Of course, feedback much welcome... thanks for the help so far!

thomas92911 commented 5 years ago

Thanks for jumping in here @Stebalien.

Offline messages should really be implemented at the application's service layer. Writing to the DHT is not recommended; the DHT isn't meant to be used this way.

My initial thoughts:

  1. Messages are based on libp2p-pubsub.
  2. Each application uses one shared topic.
  3. A message-router service saves messages and responds to queries.
  4. The sender and receiver first exchange public keys (a "hello"), then encrypt each message with the other party's public key.
  5. A receiver reads the to field and decides whether to parse the message or forward it.
  6. timeprove uses TOC (Time On the Chain) rather than real time: it is the latest block hash (Filecoin, BTC, ...), so it can't be pre-computed.

Message struct example:

key pair:  secp256k1
    pubkey = multibase58btc(multikey-secp256k1(secp256k1 pubkey []byte))
    addr = multibase58btc(multihash-sha256(multikey-secp256k1(secp256k1 pubkey []byte)))

message:
    topic: "IMessageV0S0"
    pubkey: pubkey
    sign: private-key-sign(message)
    message:
        type: 0~...
        to: addr
        route: "-"
        ctime: "123456678"
        timeprove: multiprove("000000000000002cc43169bd.......")
        body: multi-pubkey-encrypt("{...}")
        nonce: "54bacdef51a11416"

message size limit: 2000
message body raw data (before multi-codec):
    size limit: 1024
message type:
    hello
    jcard
    query-jcard
    message
    offline-message
    query-offline-message
    message-receipt
Stebalien commented 5 years ago

I really can't figure out what you're proposing, why, what exactly you're trying to solve, etc. Could you start off with some example applications and issues with the existing proposal?

thomas92911 commented 5 years ago

Sorry, I am describing what I want to build and offering it to IPFS developers for reference.

A p2p chat app: an application built on the IPFS network.

  1. P2P encryption
  2. Offline message
  3. Each client node is a service node

I don't know if I can make it clear.

(BTW, I am in China, I hope someone can really understand what I want to do...)

tobowers commented 5 years ago

This is great work and textile is looking awesome. We've been thinking along these same lines, but I've been trying to go down the path of sender keeps the queue rather than receiver. I think it puts the incentives in the right place.

Something like:

  1. Sender negotiates with an online server to keep its outgoing messages
  2. Sender attempts to send message to Receiver (fails)
  3. Sender puts messages into online server which listens to a topic
  4. Receiver comes online and says "I'm here" and online server sends a CID to a queue of messages
  5. Receiver acks the messages.
dirkmc commented 5 years ago

@tobowers that sounds like it's along the lines of Internet Mail 2000

tobowers commented 5 years ago

@tobowers that sounds like it's along the lines of Internet Mail 2000

Yeah, a bit, but with these new architectures and always-on servers we have a new set of tools for rethinking how things should work.

dirkmc commented 5 years ago

Yes I agree, now that most people have a device in their pocket that is essentially always on, the Internet Mail 2000 architecture looks viable to implement on a p2p basis

SomajitDey commented 3 years ago

@Stebalien Regarding your original post, can't simply adding a tag field to IPNS records help? I have posted a detailed proposal at discuss.ipfs.io (tagged you there) and sketched out a mailing/offline-messaging application based on it. Here's the link for your perusal.

Stebalien commented 3 years ago

@SomajitDey that's equivalent to what Textile was doing and the same problems still apply. My goal was to design something that's more reliable/more robust.

SomajitDey commented 3 years ago

Dear @Stebalien, you are right. (Just an off-topic thought: such tagged IPNS records would at least bring what Textile is doing within the purview of the go-ipfs CLI, making our already excellent IPNS even more versatile and helping core users who don't know libp2p. It's a simple enhancement on top of the existing implementation of ipfs name, without disrupting anything that already exists or requiring too much new work, right? Maybe add it as an --enable-tag-experiment feature.)

Back on topic, in your 2nd post above you wrote:

However, the alternative is to expect random nodes in the network to perform this service (like we do with the DHT).

This would be so elegant. Your analogy with DHT inspired the following strategy. Kindly comment on that.

Nodes can opt to be dhtserver or dhtclient, the latter mainly for efficiency and bandwidth savings. Along similar lines, couldn't we have two types of IPNS-over-pubsub peers, viz.

  1. those who subscribe to every name-specific topic they discover around them. This is possible because pubsub peers keep track of which topics their directly connected peers are subscribed to.
  2. those who subscribe to user-given names only (as is the case in the existing implementation).

Just as it is expected that there will always be dhtservers in the network, we might always have some pubsub peers of Type 1 above. In that case, when the sender publishes its IPNS record (tagged with the receiver's peerID) over pubsub, a Type 1 peer it is directly connected to also subscribes to the corresponding name(tag)-specific topic, and continues to republish as usual when the sender goes offline, until the records expire.

There, however, needs to be a mechanism for Type 1 peers to discover one another, get connected, and sync the topic set they subscribe to. Discovery might be achieved in ways similar to how IPNS-over-pubsub nodes discover each other - seeking providers of rendezvous files/DHT keys.

Thanks for your time.