urbien opened this issue 4 years ago
Effectively I understand this to be a "hosted hypercore with offline-capable updates from client devices." It's notable as a variation of a simple "hosted hypercore," which is a hypercore that exposes its API to the network to accept writes from clients via RPC.
In your proposed model, each client device maintains a set of deltas (which I might call the "staging cache") which it can sync to the hosting device at any time. Until sync, the client device will read a union of the currently-known master hypercore and its local cache, enabling live writes in an offline context.
A "hosted hypercore" model loses the ability to accept writes in an offline context. This proposal can continue to accept writes when offline, but will not propagate the writes to the network until the client device can sync with the master device. It is a simpler model than masterless multiwriters, but it maintains offline-first abilities.
1.3. loses the authenticity of individual records (confirm?)
Not sure about the other properties but I'm pretty sure each writer uses their own hypercore and so authenticity isn't lost.
The solution is simpler if it is for personal use only. For teams we might create a different solution.
I usually refer to this as "multi-device" vs "multi-author"
Leader rotation & consensus algorithm
What requirements do you have which require leader election and a consensus algorithm? Are you assuming the master would not be able to appoint a new leader, eg due to catastrophic data loss?
@pfrazee thank you for the comments! I will review in full, and react to all, but want to share the correction to the first statement. Our intent is not just a hosted Hypercore. We want Cloud Apps to work there too. So updates will be initiated both in the cloud and on devices. In this way they are symmetric. But they may not need to be symmetric in the way updates are pushed. For example, a 10MB file does not need to be uploaded from a device to all Personal Cloud replicas and to other devices. This would be quite inefficient, and costly on a data plan. So updates could be sent to one of the Personal Cloud replicas to be disseminated further.
Great writeup! There's some cool stuff in here. I think you might also want to look into what CoBox made with their Kappa-Drive module which is kinda in between co-hyperdrive and kappa.
Regarding co-hyperdrive/multi-hyperdrive:
- I don't think the watch() function actually downloads data unless you do some sort of FS operation like readdir on watching.

I'm excited to see what your new system will look like once it's out, and thanks again for putting these docs together!
Added a section on how multi-hyperdrive works.
Thanks @RangerMauve! Will definitely try to grok Kappa's multi-writer algo, might need someone's help in their community. @pgmemk and I have already played with multi-feed, and will explore KappaDB.
- I don't think the watch() function actually downloads data unless you do some sort of FS operation like readdir on watching
Right, I was hypothesizing that if you want to uphold the offline-first principle, you would download upon getting a notification in Watch. Guess you do not do that after all :-)
- The consistency is actually decent if peers are online and replicating. Notification of changes will go pretty fast through all the peers. If all peers see the same latest index for all the feeds, then they will resolve to the same view of the "latest file".
Agree. But if they are not online, then file stat() will show the time of the file download, not the time of the update. This is tolerable for individuals; iOS Notes is that way. But I am spoiled by Google Docs.
But my biggest concern is different. Even if a bad merge decision is made, it is tolerable. What becomes hard to tolerate is replicas ending up in different states; they can continue to diverge from there. As we discussed, it would be useful to understand how to achieve multi-writer for Hyperbee and Hypercore, not just Hyperdrive.
- I don't think backups are quite as complicated as you're imagining.
Noted, you took care of it! This is great. I will move it from Con to Pro.
@RangerMauve - How does adding a new peer work in multi-hyperdrive? Or to be more specific, I understand it can be done via backup / restore. But can it be done by asking for data from another peer?
@urbien multi-hyperdrive doesn't do anything for adding peers automatically and relies on the addDrive API to be called by the application.
co-hyperdrive adds in the automatic writer tracking by storing writers in the hypertrie.
When you authorize() another writer key, it gets saved to a hidden key inside an existing writer's hypertrie. Other writers will detect the change in the hypertrie and then make sure they've loaded the full list of writers / detected removed writers and purged them.

So with that you can be sure that if you authorize() a new writer (provided you're already a writer), anyone else replicating will see the change. I've tested this in the Natakanu app and it's honestly a little magical how it just works.
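A hedged usage sketch of that flow, based on the description above. The module's constructor shape and ready() call are assumptions; authorize() and the hidden-key behaviour are as described, so treat this as pseudocode rather than co-hyperdrive's documented API.

```js
// Sketch only: constructor arguments and ready() are assumptions.
const coHyperdrive = require('co-hyperdrive')

async function addDevice (storage, sharedKey, otherDeviceKey) {
  const drive = coHyperdrive(storage, sharedKey) // combined multi-writer view
  await drive.ready()

  // An existing writer authorizes the other device's local drive key. The key
  // is stored under a hidden entry in the writer's hypertrie, so every peer
  // replicating the drive picks up the new writer automatically.
  await drive.authorize(otherDeviceKey)

  // From here on, writes from the other device appear in the combined view
  // without any manual addDrive() calls.
  return drive
}
```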
Regarding the updated cons for co-hyperdrive:
Network: painful fan-out on a data change. If pre-fetch on a watch event is used, the cost is (number of peers in the swarm) * (data size). If pre-fetch is not done, the file can still be requested by more than one peer, and it will be uploaded more than once. Besides, the mobile may not be online to provide the data (waking up a mobile app in background mode is possible, but it has limitations).
This isn't any different than downloading all the data of a single writer. Not sure what "will be uploaded more than once" means. When a peer tries to load a file, it'll see who it's connected to that has the data and fetch individual blocks once between all the peers. It won't load the same block twice. Mobile peers being offline is the same as the single-writer scenario and requires backups of some sort for higher resilience in the same way.
Consistency: can end up with a different state on each master (needs CRDTs and clocks)
This is only true for when peers are offline and aren't replicating with anybody. This is the same regardless of whether you're using CRDTs or clocks. The only alternative is not allowing reads when you're not replicating.
Collaboration: coarse conflict resolution, but can be improved with CRDT and file metadata
This is partially a limitation of hyperdrive not having coarse random-access reads.
If a peer is offline and it writes, then there's no way for other peers to get its new data. I don't think there's any way around it except preventing peers from being able to write if they don't have any other peers.
Regarding your answer to the following issue:
TODO: Is tracking changes reliable? Can watch events get lost, e.g. when a peer dies?
If a peer is offline and it writes, then there's no way for other peers to get its new data. I don't think there's any way around it except preventing peers from being able to write if they don't have any other peers.
I meant that a watch() was triggered, but right at that moment the process died, before we initiated something that would persist it. Will this change be lost, or is there a way to receive this watch() again? I know in multi-hyperdrive this is not needed, but in the multi-hyperbee that we are designing, we will be applying changes from a peer's clone to a primary hyperbee.
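One possible answer, sketched under assumptions: instead of relying on every watch() notification being handled exactly once, the primary could persist a per-source cursor and replay the source feed from that cursor on startup. createHistoryStream and batch are standard Hyperbee calls; the cursor/<id> key scheme and JSON value encoding are made up for the example.

```js
// Replay a source peer's Hyperbee into the primary one, resuming from a
// persisted cursor, so a crash between "watch fired" and "change applied"
// loses nothing. Assumes both bees use JSON value encoding.
async function applyPendingChanges (sourceBee, primaryBee, sourceId) {
  const cursorKey = `cursor/${sourceId}`
  const cursorNode = await primaryBee.get(cursorKey)
  const since = cursorNode ? cursorNode.value.seq : 0

  // Replay everything after the last applied seq, not just live watch() events.
  for await (const node of sourceBee.createHistoryStream({ gt: since })) {
    const batch = primaryBee.batch()
    if (node.type === 'del') await batch.del(node.key)
    else await batch.put(node.key, node.value)
    await batch.put(cursorKey, { seq: node.seq }) // advance the cursor atomically
    await batch.flush()
  }
}
```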
I just realized that the merge should be done at the stage when the resource is about to be submitted.

An example flow:

We ended up with two versions of the same resource that need merging before submitting. In Tradle we can detect that the version of the resource that is being changed is not the same as the most current one. That's where we can initiate the merge.
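A rough sketch of that check, with made-up field names (_v for the stored version, _basedOn for the version an edit started from) and a caller-supplied merge function standing in for whatever CRDT or app-level merge would be used; this is not Tradle's actual schema.

```js
// Illustrative only: _v, _basedOn and the merge callback are stand-ins
// for the submit-time flow described above.
async function submitResource (db, key, edit, merge) {
  const head = await db.get(key) // most current version of the resource
  if (head && head.value._v !== edit._basedOn) {
    // The resource changed since this edit was started: merge before writing.
    edit = merge(head.value, edit)
  }
  edit._v = head ? head.value._v + 1 : 1
  await db.put(key, edit)
}
```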
The first alpha release of multi-hyperbee is out and passes a bunch of tests. The work continues; see the roadmap in the multi-hyperbee repo.
The implementation for this issue was released as multi-hyperbee.
Problem
Hypercore is a single-writer system. In a multi-device scenario (e.g. a mobile and a PC) we need to find a way to present one [virtual] Hypercore across all devices. We should also count personal Cloud peers among our devices, to take advantage of their reliability.
Existing approaches in the Hypercore community
1. Materialize merged Hypercores into one (Kappa-DB does that). Drawbacks:
1.1. duplication of data (confirm the extent of it?)
1.2. loses sparse access
1.3. loses the authenticity of individual records (confirm?)
1.4. loses performance
2. Master feed + deltas. It was developed by @rangermauve for Multi-Hyperdrive. The algorithm is specific to Hyperdrive, but can be extended to a degree to Hyperbee and Hypercore. Extending it to Hypertrie is easy, as Hyperdrive is built on Hypertrie and has the same access semantics (minus the file contents).
Evaluation of Multi-hyperdrive
How it works
Pro
Con
Approach with Personal Cloud nodes
Upload strategy and topology
To optimize replication for Multi-hyperdrive we can distinguish between the capabilities of the peers, taking the following into account:
Discovery of topology
Multi-hyperdrive topology is any-to-any. We need something different here.
The originator of the change can discover the capabilities of the peers in the swarm (a separate project that would utilize small DHT storage, already used by Bitfinex), and adjust the replication strategy in the following ways:
Merge strategy
Hyperbee and Hypercore need a different merge strategy from Hyperdrive. Multi-hyperdrive does not materialize merges, but for Hyperbee especially, materialization could be unavoidable. Data can be fetched on watch and immediately copied to the local feed, thus allowing searches. Hypertrie may continue to use the same strategy as Hyperdrive.
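For the Hyperbee case, a minimal sketch of "fetch on watch and copy to the local feed", assuming Hyperbee's live history stream; in practice this would be combined with the crash-safe cursor shown earlier and with the conflict handling discussed below.

```js
// Copy entries from a remote peer's Hyperbee into the local one as they
// arrive, so searches and range queries run against one materialized bee.
// Sketch only: no conflict resolution, no persisted cursor.
async function mirrorIntoLocal (remoteBee, localBee) {
  for await (const node of remoteBee.createHistoryStream({ live: true })) {
    if (node.type === 'del') await localBee.del(node.key)
    else await localBee.put(node.key, node.value)
  }
}
```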
Now, how do Cloud peers achieve consensus?
Consensus research
Simple and fast consensus on Cloud peers. Because Cloud peers are always available and on a fast network, the consensus algorithm can be simpler and can greatly reduce the probability of inconsistency. Time in the Cloud can be managed well, further simplifying the consensus algorithm. We can start with Cloud peers in the same data center, but on different machines, different racks, and maybe different zones of the same data center for power isolation. Later we can develop an equivalent of AWS availability zones with a more complex consensus algorithm.
Which consensus algorithm? Consensus research for databases has been supercharged by the advent of blockchains. The EOS blockchain demonstrated that if we assume all peers are on a fast, reliable, low-latency network, a much simpler consensus algorithm becomes possible, and it converges 2-3 orders of magnitude faster (EOS can do 3000 transactions per second).
1.1. Non-Byzantine algorithms used in databases are Paxos and Raft.
1.2. PBFT is quite mature and supports Byzantine faults, but tolerates at most (n-1)/3 faulty nodes (so minimum 7 nodes?) and has difficult leader selection.
1.3. Tendermint improves on this by rotating the leader every round, and skips a non-responding leader automatically (how many peers minimum?)
Leaderless non-Byzantine consensus
We set out to support multi-device and team-collaboration scenarios. Most changes are expected from personal devices. Later, Cloud Apps will also generate new data. We will limit, by data type, what changes Cloud peers can initiate so that they do not clash with the single-writer model.
If we design for non-Byzantine faults, we can make use of a new approach to leaderless multi-master, used by AWS DynamoDB and Azure Cosmos. It is based on CRDT innovation that occurred in the last 5 years.
CRDT merge
Merge changes into master with the help of CRDTs, as used in Redis as well as in the Cloud-scale databases AWS DynamoDB and Azure Cosmos. Use Yjs or Automerge (https://github.com/automerge/automerge).
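An illustrative merge using Automerge's init/change/merge calls (Yjs would look similar); the document shape is made up for the example.

```js
const Automerge = require('automerge')

// Two replicas fork from the same document...
let onPhone = Automerge.change(Automerge.init(), doc => { doc.notes = [] })
let onLaptop = Automerge.merge(Automerge.init(), onPhone)

// ...and edit independently while offline.
onPhone = Automerge.change(onPhone, doc => { doc.notes.push('from phone') })
onLaptop = Automerge.change(onLaptop, doc => { doc.notes.push('from laptop') })

// Merging is deterministic: any replica that sees both sets of changes
// converges to the same state, regardless of the order they are applied in.
const merged = Automerge.merge(onPhone, onLaptop)
console.log(merged.notes) // contains both entries
```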
Secure Clock
Use vector / bloom / HLC clocks to resolve conflicts, to achieve 100% the same state on all nodes, eventually :-).
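A minimal hybrid logical clock (HLC) sketch, shown here as it would be used for deterministic last-writer-wins conflict resolution; this is a simplified illustration, not a hardened implementation.

```js
// Hybrid logical clock: physical time when clocks are sane, plus a logical
// counter to order events within the same millisecond and across clock skew.
class HLC {
  constructor (nodeId) {
    this.nodeId = nodeId
    this.wall = 0      // last observed wall-clock time (ms)
    this.logical = 0   // counter for ties within the same millisecond
  }

  // Timestamp a local event.
  now () {
    const physical = Date.now()
    if (physical > this.wall) { this.wall = physical; this.logical = 0 }
    else this.logical++
    return { wall: this.wall, logical: this.logical, nodeId: this.nodeId }
  }

  // On receiving a remote timestamp, advance past it so local events never
  // appear to happen before something this node has already seen.
  update (remote) {
    const physical = Date.now()
    const prevWall = this.wall
    this.wall = Math.max(prevWall, remote.wall, physical)
    if (this.wall === prevWall && this.wall === remote.wall) {
      this.logical = Math.max(this.logical, remote.logical) + 1
    } else if (this.wall === prevWall) {
      this.logical++
    } else if (this.wall === remote.wall) {
      this.logical = remote.logical + 1
    } else {
      this.logical = 0
    }
  }
}

// Total order over timestamps: wall time, then counter, then node id as a
// final tiebreak, so every replica resolves the same conflict the same way.
function compareHLC (a, b) {
  return (a.wall - b.wall) || (a.logical - b.logical) ||
         a.nodeId.localeCompare(b.nodeId)
}
```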