Open urbien opened 4 years ago
How is segmentation going to work for collaborative editing? What I mean is:
how do we decide when to merge segments?
@pgmemk good catch, I did not think of that. Now, here is an idea for how to address updates to data items that were already archived / backed up to S3. Maybe we could use union mounts, which are upcoming in Hypercore. Hyperdrive will use the mounts for shared folders. A union mount creates the impression that you can update a friend's Hyperdrive that is today mounted read-only. Union mounts were invented by Plan 9 and then, over a decade, were implemented and re-implemented a ton of times in Linux. They have been stable in Linux for a long time and are used for booting from an immutable source, like a DVD, while still being able to customize your installation and boot again with all your changes to immutable files saved separately from the original DVD.
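A minimal sketch of the union idea, using Python's standard `collections.ChainMap` as a stand-in for a writable layer over a read-only one. The dictionaries and file names are hypothetical, and this is not how Hypercore or Hyperdrive mounts are implemented:

```python
from collections import ChainMap

# Hypothetical data: a friend's read-only drive and our own writable overlay.
friend_drive = {"notes.txt": "original", "todo.txt": "buy milk"}
local_overlay = {}

# ChainMap looks keys up left to right and sends every write to the first mapping.
union = ChainMap(local_overlay, friend_drive)

union["notes.txt"] = "my edit"       # lands in local_overlay only
union["new.txt"] = "added locally"

print(union["notes.txt"])            # the overlay shadows the lower layer
print(friend_drive["notes.txt"])     # the lower layer is untouched
```

Reads fall through to the lower layer for anything not changed locally, so the changes can be stored separately from the immutable source.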
The initial idea was to use mounts for a virtual merge. But Hyperbee does not have mounts, and Hypertrie mounts change the names of the keys: key 'a' becomes 'mountpoint/a'. New ideas are needed.
perhaps this module could help? https://github.com/little-core-labs/hypercore-multipart
Purpose
We have the following reasons for using Hypercore with AWS S3 (and its many open source re-implementations, like https://min.io):
Rationale
For 2. above: a website without a web server, served from S3, is mainstream now. Why not a database? To me it is revolutionary, and I can think of a couple of use cases (CDN, huge research databases), but there must be a ton.
Our push is to make Hypercore-based apps for dumb dumb users - we are all so spoiled by Google email, docs, etc. taking care of everything for us; the P2P solution needs to match that and exceed it with what Google will never do. Cloud-level reliability of P2P seems to be on the Hypercore team's radar.
To me, reliability means being 100% online, 100% durable, and 100% reachable on the network.
Starting point in Hypercore and what is lacking:
hypercore-archiver with random-access-s3 provides the basic method for piping a feed to S3, but the random-access-s3 write method is not implemented.
Context
While S3 supports random reads, that is, reads from an offset within an S3 object, there is no support for random writes:
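To make the asymmetry concrete, here is a small in-memory sketch. `ObjectStore` and `overwrite_at` are hypothetical stand-ins, not a real S3 client; the only assumption is that the store offers ranged reads and whole-object writes:

```python
class ObjectStore:
    """In-memory stand-in for an S3-like store (hypothetical, not a real client)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        # Whole-object write: replaces any previous content under this key.
        self._objects[key] = bytes(data)

    def get_range(self, key, start, end):
        # Ranged read, inclusive of both ends like an HTTP Range header.
        return self._objects[key][start:end + 1]


def overwrite_at(store, key, offset, patch, size):
    """Emulate a random write: read the whole object, patch it, re-upload it."""
    whole = bytearray(store.get_range(key, 0, size - 1))
    whole[offset:offset + len(patch)] = patch
    store.put(key, whole)


store = ObjectStore()
store.put("feed/data", b"hello world")
print(store.get_range("feed/data", 6, 10))           # cheap: only 5 bytes are read
overwrite_at(store, "feed/data", 0, b"J", size=11)   # costly: 11 bytes read and rewritten
print(store.get_range("feed/data", 0, 4))
```

Changing a single byte costs a full read and a full rewrite of the object, which is why updating one large, frequently changing file in place is a poor fit.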
Restore
Perceived immediate availability
When restoring from S3, we need to make the data available before all of it is downloaded. This should work like AWS EBS drive recovery from a snapshot: although the restore process is still taking place, the EBS drive is already made available.
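One way to picture this is a store that fetches a block on demand when a read arrives, while a background pass restores everything else. The sketch below is only an in-memory illustration; `LazyRestore`, its methods, and the block list are hypothetical names, not part of Hypercore or of any S3 library:

```python
class LazyRestore:
    """Serve reads from a local cache, fetching missing blocks from the backup on demand."""

    def __init__(self, remote_blocks):
        self.remote = remote_blocks   # stand-in for the S3 backup (list of bytes)
        self.local = {}               # blocks restored so far, keyed by index
        self.next_bg = 0              # next index the background restore will try

    def get(self, index):
        # A read never waits for the full restore: fetch just this block if missing.
        if index not in self.local:
            self.local[index] = self.remote[index]
        return self.local[index]

    def restore_step(self):
        """Background work: restore one more block; return False when nothing is left."""
        # Skip blocks that on-demand reads have already pulled in.
        while self.next_bg < len(self.remote) and self.next_bg in self.local:
            self.next_bg += 1
        if self.next_bg >= len(self.remote):
            return False
        self.local[self.next_bg] = self.remote[self.next_bg]
        self.next_bg += 1
        return True


backup = [b"block0", b"block1", b"block2", b"block3"]
feed = LazyRestore(backup)
print(feed.get(3))            # served before the background restore has run
while feed.restore_step():    # the background restore fills in the rest
    pass
print(len(feed.local))
```

In a real restore the on-demand fetch would be a ranged read against S3, and the background pass would run concurrently rather than in a loop.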
Key management
How to assume ownership of the restored Hypercores on a new machine with a different private key? See #5
Proposed approach
Maybe we can learn from the search engine Lucene. To avoid updating the whole index on every document add / update (which is extremely costly), Lucene writes new data into a chunk, which it calls an index segment. On search it reads from all segments and merges the results. Here is how we can mimic this in Hypercore backup to S3, which has a similar performance bound for our case:
Each type of Hypercore (Drive, Bee, Trie) will have its own shards. Once the feed on disk reaches about 5 MB, start a new shard and copy the whole feed dir to S3. We need to write to both the main Hyperbee and the last shard of Hyperbee, which is not so cool. Maybe write to the last shard in memory? But then we need a marker in the main Hyperbee recording the log seq at which the last shard started. Ideally we need a checkout after seq N.
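The segment idea and the shard rollover can be sketched together as follows. This is an in-memory illustration under stated assumptions: `ShardedStore`, the `start_seq` marker field, and the exact size limit are hypothetical, and nothing here uses the Hypercore or Hyperbee APIs:

```python
SHARD_LIMIT_BYTES = 5 * 1024 * 1024  # "about 5 MB" in the text; the exact value is an assumption


class ShardedStore:
    """Key-value log split into shards; only the newest shard is ever written."""

    def __init__(self, limit=SHARD_LIMIT_BYTES):
        self.limit = limit
        self.seq = 0        # sequence number of the next write
        self.shards = [self._new_shard()]
        self.sealed = []    # start_seq of shards that would be copied to S3

    def _new_shard(self):
        # start_seq is the marker recording where in the log this shard begins.
        return {"start_seq": self.seq, "size": 0, "data": {}}

    def put(self, key, value):
        if self.shards[-1]["size"] >= self.limit:
            # Seal the full shard (the point where it would be uploaded) and roll over.
            self.sealed.append(self.shards[-1]["start_seq"])
            self.shards.append(self._new_shard())
        shard = self.shards[-1]
        shard["data"][key] = value
        shard["size"] += len(key) + len(value)  # the log grows on every write
        self.seq += 1

    def get(self, key):
        # Read across all shards, newest first, so later writes shadow earlier ones.
        for shard in reversed(self.shards):
            if key in shard["data"]:
                return shard["data"][key]
        return None

    def merge(self):
        # Compact every shard into one; newer values override older ones.
        merged = {"start_seq": 0, "size": 0, "data": {}}
        for shard in self.shards:
            merged["data"].update(shard["data"])
        merged["size"] = sum(len(k) + len(v) for k, v in merged["data"].items())
        self.shards = [merged]


store = ShardedStore(limit=8)        # tiny limit so the rollover is visible
store.put("a", "1111")
store.put("b", "2222")
store.put("a", "3333")               # the first shard is full, so this opens a second one
print(store.get("a"), [s["start_seq"] for s in store.shards])
```

Sealed shards are never modified, so each one only has to be uploaded once. When to call `merge`, and what to do with shards that were already uploaded, is left open here, as it is in the discussion.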