status-im / status-go

The Status module that consumes go-ethereum
https://status.im
Mozilla Public License 2.0

chat: delivery protocol documentation #222

Closed tiabc closed 7 years ago

tiabc commented 7 years ago

The current state of push notifications is not great.

@rasom:

It is implemented in some form. As you may have noticed, the examples in this doc https://github.com/status-im/status-go/wiki/Whisper-Push-Notifications are written for Whisper v2, so I guess some changes may be needed. The document also describes exchanging the symmetric key by “sending encrypted message with Subscription SymKey”, which in this case means sending the SymKey in a Whisper message encrypted with the user’s public key. Maybe that’s not the best approach to key exchange. Likewise, messages sent to the server (which later forwards them to a push notification service) are meant to be sent as a plain payload, encrypted only with the SymKey mentioned above.

We should create an actual document explaining what's currently done and what we should change in it.
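For reference, the symmetric half of the flow quoted above (a shared SymKey used to encrypt otherwise plain notification payloads) can be sketched in Go. This is a minimal sketch assuming AES-256-GCM from the Go standard library; it is not the Whisper envelope code, and `newSymKey`, `seal` and `open` are hypothetical helper names. Delivering the key under the subscriber's public key is deliberately left out.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newSymKey generates a random 32-byte (AES-256) subscription key.
// In the documented flow this key would reach the subscriber inside a
// Whisper message encrypted with the subscriber's public key (not shown).
func newSymKey() ([]byte, error) {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		return nil, err
	}
	return key, nil
}

// seal encrypts a plain notification payload with the shared SymKey using
// AES-256-GCM and prepends the random nonce to the ciphertext.
func seal(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open splits off the nonce, then authenticates and decrypts the payload.
func open(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key, err := newSymKey()
	if err != nil {
		panic(err)
	}
	sealed, err := seal(key, []byte(`{"type":"new-message"}`))
	if err != nil {
		panic(err)
	}
	plain, err := open(key, sealed)
	fmt.Println(string(plain), err)
}
```

The server in this design can read nothing beyond what the SymKey protects, but anyone holding the SymKey can read every notification, which is one reason the key exchange step deserves a closer look.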

rgeraldes commented 7 years ago

@jarradh @sla-shi

(in progress - random thoughts)

Main problems:

  1. Availability
  2. Centralization

--

  1. An idea for inboxing that avoids centralization just popped up:

We want to deliver sentences. What if, instead of delivering a sentence as a whole, we split it into words and had a group of peers deliver each word (a DHT algorithm)? Each peer would contribute part of the work: groups of peers would store the words that map close to, for example, their peer ID (just an arbitrary choice). A single broadcast requesting history would be enough to collect all the pieces of the puzzle. With a suitable algorithm we could solve both problems: dictionaries would be sliced across groups of peers, and everybody in the Status network would contribute. Space would not be a problem (important for mobile!) if some of the flaws of this system were addressed. The system does have flaws; it was just a thought, and I'm not sure it's feasible.
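A minimal sketch of the word-splitting idea, assuming a Kademlia-style rule where each word goes to the peer whose SHA-256-hashed ID is XOR-closest to the word's hash. `Shard`, `Reassemble` and the peer names are hypothetical; replication, encryption and peer discovery are ignored.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"strings"
)

// Piece is one word of a message, its position, and the peer storing it.
type Piece struct {
	Index int
	Word  string
	Peer  string
}

// closestPeer returns the peer whose hashed ID is XOR-closest to the
// word's hash, as in Kademlia-style DHTs.
func closestPeer(word string, peers []string) string {
	key := sha256.Sum256([]byte(word))
	var best string
	var bestDist [sha256.Size]byte
	for i, p := range peers {
		id := sha256.Sum256([]byte(p))
		var d [sha256.Size]byte
		for j := range d {
			d[j] = key[j] ^ id[j]
		}
		if i == 0 || bytes.Compare(d[:], bestDist[:]) < 0 {
			best, bestDist = p, d
		}
	}
	return best
}

// Shard splits a message into words and assigns each word to a peer.
func Shard(msg string, peers []string) []Piece {
	var out []Piece
	for i, w := range strings.Fields(msg) {
		out = append(out, Piece{Index: i, Word: w, Peer: closestPeer(w, peers)})
	}
	return out
}

// Reassemble rebuilds the message from pieces collected after a history
// request; ok is false if any of the n positions is missing.
func Reassemble(pieces []Piece, n int) (msg string, ok bool) {
	words := make([]string, n)
	found := 0
	for _, p := range pieces {
		if p.Index >= 0 && p.Index < n && words[p.Index] == "" {
			words[p.Index] = p.Word
			found++
		}
	}
	return strings.Join(words, " "), found == n
}

func main() {
	peers := []string{"peer-a", "peer-b", "peer-c"}
	pieces := Shard("meet me at devcon", peers)
	for _, p := range pieces {
		fmt.Printf("%d %q -> %s\n", p.Index, p.Word, p.Peer)
	}
	fmt.Println(Reassemble(pieces, len(pieces)))
}
```

Note that `Reassemble` fails as soon as a single peer holding one word is offline, which is exactly the availability concern raised in the replies.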

  2. Another option for decentralization would be rewards to guarantee availability:

"Proven social incentives such as rewards and social recognition could stimulate users to leave their P2P software running for longer periods, thus improving the overall availability of the network. Autonomous peers are free to decide whether to donate resources or not."

  3. Dispersy: data sync via Bloom filters. It requires at least one node in the community to be available to compare Bloom filters; if the two nodes are not online at the same time, their Bloom filters cannot be compared.

  4. Direct p2p > deploy email/notification nodes.
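The Bloom-filter sync in item 3 can be sketched as follows: node A summarizes the message IDs it holds in a filter, and node B replies with the messages the filter says A lacks. This is a minimal sketch, not Dispersy's actual protocol; the filter size, the hash choice (FNV-1a with double hashing) and the `MissingFor` helper are assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Bloom is a minimal Bloom filter over message IDs.
type Bloom struct {
	bits []bool
	k    int
}

func NewBloom(m, k int) *Bloom { return &Bloom{bits: make([]bool, m), k: k} }

// positions derives k bit positions from one FNV-1a hash (double hashing).
func (b *Bloom) positions(id string) []int {
	h := fnv.New64a()
	h.Write([]byte(id))
	sum := h.Sum64()
	h1, h2 := sum&0xffffffff, (sum>>32)|1
	pos := make([]int, b.k)
	for i := 0; i < b.k; i++ {
		pos[i] = int((h1 + uint64(i)*h2) % uint64(len(b.bits)))
	}
	return pos
}

func (b *Bloom) Add(id string) {
	for _, p := range b.positions(id) {
		b.bits[p] = true
	}
}

// MayContain never gives false negatives but may give false positives.
func (b *Bloom) MayContain(id string) bool {
	for _, p := range b.positions(id) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}

// MissingFor runs on node B: given node A's filter, it returns the IDs B
// holds that A probably lacks. A false positive means a message is skipped
// until a later sync round; a message A already has is never resent.
func MissingFor(remote *Bloom, local []string) []string {
	var out []string
	for _, id := range local {
		if !remote.MayContain(id) {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	a := []string{"m1", "m2", "m3"}
	b := []string{"m2", "m3", "m4", "m5"}
	filter := NewBloom(1024, 4)
	for _, id := range a {
		filter.Add(id)
	}
	// Both nodes must be online at the same time for this exchange.
	fmt.Println("B sends to A:", MissingFor(filter, b))
}
```

The exchange is symmetric in practice (B also sends its filter to A), and it only works while both ends are reachable, which is the limitation noted in item 3.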

tiabc commented 7 years ago
  1. That idea sounds nice at first glance, but our payload is not sentences. It's an arbitrary JSON payload sent by our React counterpart. Spreading the payload over words would bring:
    • More data traversing the network, because every word will probably carry significant header overhead. This overhead will grow in the future, given that most messages will use Double Ratchet encryption.
    • As a result, peers will probably store even more data in total, with little useful payload.
    • Since we can't guarantee peer availability, we had better not split the payload into multiple pieces, as some pieces may not be delivered.
    • In summary, this may have the opposite effect: more data traveling over the network and more data stored by peers, with no guarantee of availability.

The distributed nature of our network and the lack of guaranteed peer availability force us to duplicate a lot of data, but we should split that data with care, so I wouldn't pursue this idea.
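To make the overhead and availability argument concrete, here is a rough model under simplifying assumptions (none of these are measured values): a payload of $N$ bytes is split into $k$ pieces, each piece carries a fixed header of $H$ bytes, every piece is stored on $r$ peers, and each piece is independently retrievable with probability $p$.

$$
B_{\text{whole}} = r\,(N + H), \qquad B_{\text{split}} = r\,(N + kH), \qquad B_{\text{split}} - B_{\text{whole}} = r\,(k-1)\,H
$$

$$
P_{\text{whole}} = p, \qquad P_{\text{split}} = p^{k}
$$

For example, with $p = 0.9$ and $k = 10$, the split message is fully recoverable with probability $0.9^{10} \approx 0.35$, versus $0.9$ for the unsplit one, while storage grows by $r(k-1)H$ bytes.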

  2. This option sounds more viable; let's hear what @jarradh says. In my opinion it may be a good future option, but I don't think it's feasible to implement by our beta release; we should come up with something achievable technically rather than economically.

3, 4. +1

Please, correct me if I got something wrong.

rgeraldes commented 7 years ago

Thanks for the comment @tiabc. You're right: with end-to-end encryption, option 1 is not feasible, since privacy is a concern.

  2. Having farmers is indeed an option for the long term, and a good number of decentralized storage services already do this.

The short-term option for Devcon3 would be to deploy email/notification nodes and use direct p2p messages to deliver messages as soon as the recipient is online again. The flow would be similar to https://github.com/status-im/status-go/wiki/Whisper-Push-Notifications. Messages would be broadcast to the email/notification servers (the sender would wait for x replies). The servers could then try to contact the recipient via broadcast, but that may take a while or never deliver the message at all (we can't be sure). Instead, they could send a high-priority silent notification directly, which would probably be faster in most cases, and the recipient would then retrieve the data it needs. As soon as the stored data has been retrieved, the email servers would delete the entry.
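The store, notify, retrieve and delete flow above can be sketched in Go as an in-memory mail server. This is a minimal sketch under stated assumptions: `MailServer`, `Broadcast` and the `notify` hook (standing in for the silent push) are hypothetical names, the "x replies" become a `quorum` parameter, and networking is replaced by direct method calls.

```go
package main

import (
	"fmt"
	"sync"
)

// Envelope is an opaque, already-encrypted message for one recipient.
type Envelope struct {
	Recipient string
	Payload   []byte
}

// MailServer archives envelopes for offline recipients until fetched.
type MailServer struct {
	mu     sync.Mutex
	inbox  map[string][]Envelope
	notify func(recipient string) // hypothetical silent-push hook
}

func NewMailServer(notify func(string)) *MailServer {
	return &MailServer{inbox: make(map[string][]Envelope), notify: notify}
}

// Store archives the envelope, fires a silent notification, and acks.
func (s *MailServer) Store(e Envelope) bool {
	s.mu.Lock()
	s.inbox[e.Recipient] = append(s.inbox[e.Recipient], e)
	s.mu.Unlock()
	if s.notify != nil {
		s.notify(e.Recipient)
	}
	return true
}

// Retrieve returns all pending envelopes for the recipient and deletes them.
func (s *MailServer) Retrieve(recipient string) []Envelope {
	s.mu.Lock()
	defer s.mu.Unlock()
	msgs := s.inbox[recipient]
	delete(s.inbox, recipient)
	return msgs
}

// Broadcast sends the envelope to every server and reports whether at
// least quorum of them acknowledged it.
func Broadcast(servers []*MailServer, e Envelope, quorum int) bool {
	acks := 0
	for _, s := range servers {
		if s.Store(e) {
			acks++
		}
	}
	return acks >= quorum
}

func main() {
	notified := 0
	notify := func(string) { notified++ } // stand-in for a silent push
	servers := []*MailServer{NewMailServer(notify), NewMailServer(notify), NewMailServer(notify)}
	ok := Broadcast(servers, Envelope{Recipient: "bob", Payload: []byte("hi")}, 2)
	fmt.Println("quorum reached:", ok, "notifications:", notified)
	fmt.Println("retrieved:", len(servers[0].Retrieve("bob")))
	fmt.Println("left after delete:", len(servers[0].Retrieve("bob")))
}
```

Because every server keeps its own copy, the recipient would either retrieve from each server or the servers would need to coordinate deletion; the sketch leaves that open.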

(topic exists? - https://github.com/ethereum/go-ethereum/pull/14836)

In general, I think we are only looking at part of the problem: the main problem starts with the network and the lack of routing knowledge caused by dark communication, which is used to prevent traffic analysis. The following ideas from the Whisper wiki page are very interesting, and the storage itself could be based on knowledge of the network plus the disk space each user is willing to share:

The connection, between two routing nodes, could be direct, 1 proxy intermediary, 2, or 3, it could use shared secrets instead and route fragments of datagrams across multiple heterogenous circuits, as well, and the receiver would then wait for sufficient fragments to assemble the original packets. Indeed, it could be possible in the case of (not quite related to whisper, but to the distributed data storage protocol) larger streams, break the stream up into parts and alter the obfuscation method through the process, further confusing the traffic analysis data. The other criteria for deciding how to scramble the routing is latency. For some purposes one wants lower latency, and other purposes, greater security is vitally important. When in the process streams are fragmented into parts, it can also increase security to apply an All Or Nothing Transform to the entire package, then if part is intercepted but not the complete message, it is impossible to assemble the data, not even for cryptanalysis purposes.

rgeraldes commented 7 years ago

(random ideas - data storage)

Complex banking systems have to archive data every day because of the sheer volume, but the important data stays in the main DB/cache, and older account history can still be accessed. If we followed the WhatsApp model, we would remove the data as soon as the recipient gets it, but a problem would remain: when a user stays offline for a long time, a lot of data could accumulate (especially in group chats). If users agreed to allow a certain amount of space to be used, it would work.
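The "agreed amount of space" idea can be sketched as a per-recipient inbox with a byte quota and delete-on-delivery. This is a minimal sketch under an assumed policy: when the quota is exceeded, the oldest messages are dropped (rejecting new messages or archiving them elsewhere would be equally valid choices). `QuotaInbox` is a hypothetical name.

```go
package main

import "fmt"

// QuotaInbox keeps undelivered messages for one offline recipient within
// a byte budget; when full, the oldest messages are evicted (assumed policy).
type QuotaInbox struct {
	quota int
	used  int
	msgs  [][]byte
}

// Add stores msg, evicting the oldest messages if needed.
// It returns false if msg alone exceeds the quota.
func (q *QuotaInbox) Add(msg []byte) bool {
	if len(msg) > q.quota {
		return false
	}
	for q.used+len(msg) > q.quota {
		q.used -= len(q.msgs[0])
		q.msgs = q.msgs[1:]
	}
	q.msgs = append(q.msgs, msg)
	q.used += len(msg)
	return true
}

// Drain hands every stored message to the recipient and frees the space,
// following the delete-on-delivery model.
func (q *QuotaInbox) Drain() [][]byte {
	out := q.msgs
	q.msgs, q.used = nil, 0
	return out
}

func main() {
	q := &QuotaInbox{quota: 10}
	for _, m := range []string{"aaaa", "bbbb", "cccc"} {
		q.Add([]byte(m))
	}
	for _, m := range q.Drain() {
		fmt.Println(string(m))
	}
}
```

With a 10-byte quota, the third 4-byte message evicts the first, so only the two most recent messages are delivered.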

Another idea for the future could be social sharing: your "friends" in the Status network would help you by storing data while you are offline.

rgeraldes commented 7 years ago

Adding some info on the topics addressed above:

  • https://www.microsoft.com/en-us/research/publication/quasar-a-probabilistic-publish-subscribe-system-for-social-networks/ (publish/subscribe over a DHT based on probabilistic topic routing)
  • https://eprint.iacr.org/2017/713.pdf (security of instant messengers: X3DH and Double Ratchet) (thanks @jeluard; https://eprint.iacr.org/2017/)

For future reference: http://zhen.org/blog/benchmarking-bloom-filters-and-hash-functions-in-go/ (benchmarking go implementations of bloom filters)

The plan for tomorrow is to use the email server (archive/delivery via a direct p2p connection) and the notification server.

@jarradh I also know how to implement probabilistic routing based on Bloom filters (summaries of topics). I tried to reach gluk256 via PM on the topic but did not get a reply. I think this is something we could contribute to Whisper if they consider it as an option for the protocol.
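Probabilistic routing with topic Bloom filters can be sketched as follows: each peer advertises the union of the filters of the topics it is interested in, and a node forwards an envelope only to peers whose filter covers the envelope's topic bits. This is a minimal sketch, not Whisper's implementation; the 512-bit filter, the 3 bits per topic derived from SHA-256, and the names `topicBloom`, `union` and `route` are assumptions.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const bloomBits = 512 // assumed filter size

// Topic is a 4-byte Whisper-style topic.
type Topic [4]byte

// Filter is a peer's advertised summary of the topics it wants.
type Filter [bloomBits / 8]byte

// topicBloom sets 3 bits derived from the SHA-256 of the topic.
func topicBloom(t Topic) Filter {
	var b Filter
	h := sha256.Sum256(t[:])
	for i := 0; i < 3; i++ {
		idx := (int(h[2*i])<<8 | int(h[2*i+1])) % bloomBits
		b[idx/8] |= 1 << uint(idx%8)
	}
	return b
}

// union ORs filters together: a peer advertises all topics it wants.
func union(filters ...Filter) Filter {
	var out Filter
	for _, f := range filters {
		for i := range f {
			out[i] |= f[i]
		}
	}
	return out
}

// matches reports whether every bit of the topic filter is set in peer.
func matches(peer, topic Filter) bool {
	for i := range topic {
		if peer[i]&topic[i] != topic[i] {
			return false
		}
	}
	return true
}

// Peer is a neighbour together with its advertised filter.
type Peer struct {
	Name   string
	Filter Filter
}

// route returns the peers an envelope with topic t is forwarded to.
// False positives cause occasional extra forwarding, never a missed peer.
func route(t Topic, peers []Peer) []string {
	tb := topicBloom(t)
	var out []string
	for _, p := range peers {
		if matches(p.Filter, tb) {
			out = append(out, p.Name)
		}
	}
	return out
}

func main() {
	chat, status := Topic{0x01, 0x02, 0x03, 0x04}, Topic{0xaa, 0xbb, 0xcc, 0xdd}
	peers := []Peer{
		{Name: "alice", Filter: union(topicBloom(chat))},
		{Name: "bob", Filter: union(topicBloom(status))},
		{Name: "carol", Filter: union(topicBloom(chat), topicBloom(status))},
	}
	fmt.Println("forward chat envelope to:", route(chat, peers))
}
```

The trade-off is the one discussed in the thread: advertised filters leak some information about a peer's interests, in exchange for less flooding than pure broadcast; larger filters or more topics per filter shift that balance.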

Any feedback is welcome guys

rgeraldes commented 7 years ago

  • Friday: mail server tests
  • Saturday: Bloom filter implementation (https://github.com/rgeraldes/bloom)
  • Sunday: started implementing probabilistic routing in Whisper

I'll continue developing the protocol in my free time.

Current Setup:

oskarth commented 7 years ago

Work in progress document here: https://docs.google.com/document/d/1OgjnY8ps8lVA4dIohwkfGK9HVt0nZxEWbuNdb7BX5-o/edit#

rgeraldes commented 7 years ago

Thanks @oskarth, I will add the info to the document as soon as I have a confirmation from the team that we can proceed.

I've attached the proposed document. It addresses the following topics:

I’m also going to close #136, since the proposed test has been done locally. Document: whisper_chat.pdf

Edit: for future reference, an edge case: the order of messages sent via shh.post.
Edit 2: for future reference, the forwarder flag has a different meaning, but the solution remains the same nonetheless.

tiabc commented 7 years ago

@rgeraldes where can I find what this issue asked for, i.e. the current state of the push notifications implementation and the changes we should introduce? What's stated in the document seems to be already known, except for your proposal to use a direct connection.

rgeraldes commented 7 years ago

As discussed earlier today, we will move forward with Victor's flow.

tiabc commented 7 years ago

Please update the current documentation, https://github.com/status-im/status-go/wiki/Whisper-Push-Notifications, with the new flow, architecture decisions and Whisper v5 code snippets.

Can you also draw a diagram or describe how we communicate with the Firebase servers and how devices subscribe to push notifications?

rgeraldes commented 7 years ago

documentation

tiabc commented 7 years ago

Closing as some documentation has been written.