spacejam / sled

the champagne of beta embedded databases
Apache License 2.0

Future plans question: delta-state CRDTs, synchronization, sneakernet protocol #663

Closed: ckaran closed this issue 5 years ago

ckaran commented 5 years ago

Hi, this is more of a 'future plans' kind of question than a bug report or feature request. I was wondering whether you have plans to incorporate delta-state CRDTs into sled, synchronizing not directly over the network but via the filesystem instead.
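To make the idea concrete, here is a minimal Rust sketch of a delta-state grow-only counter (G-Counter) whose deltas are written to and read back from a file, the way a courier's USB stick might carry them. The type `GCounter`, its methods, and the one-line-per-replica `replica count` file format are hypothetical and chosen only for illustration; none of this is sled's API.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::Path;

/// Grow-only counter (G-Counter): one entry per replica, each only ever
/// increases. A delta has the same type as the full state, just smaller.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct GCounter {
    entries: BTreeMap<String, u64>,
}

impl GCounter {
    /// Delta mutator: bump this replica's entry and return a delta holding
    /// only the updated entry (not the whole counter).
    pub fn increment(&mut self, replica: &str, by: u64) -> GCounter {
        let entry = self.entries.entry(replica.to_string()).or_insert(0);
        *entry += by;
        let mut delta = GCounter::default();
        delta.entries.insert(replica.to_string(), *entry);
        delta
    }

    /// Join: per-replica maximum. Commutative, associative and idempotent,
    /// so deltas may arrive late, twice, or out of order.
    pub fn join(&mut self, other: &GCounter) {
        for (replica, &count) in &other.entries {
            let entry = self.entries.entry(replica.clone()).or_insert(0);
            *entry = (*entry).max(count);
        }
    }

    /// Current counter value: the sum over all replicas.
    pub fn value(&self) -> u64 {
        self.entries.values().sum()
    }

    /// Write this (delta) state to a file, one "replica count" line per entry.
    /// Hypothetical format; replica names must not contain newlines.
    pub fn write_to(&self, path: &Path) -> io::Result<()> {
        let mut out = String::new();
        for (replica, count) in &self.entries {
            out.push_str(&format!("{} {}\n", replica, count));
        }
        fs::write(path, out)
    }

    /// Read a file written by `write_to`, joining duplicate lines by maximum.
    pub fn read_from(path: &Path) -> io::Result<GCounter> {
        let text = fs::read_to_string(path)?;
        let mut counter = GCounter::default();
        for line in text.lines() {
            let (replica, count) = line
                .rsplit_once(' ')
                .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "missing count"))?;
            let count: u64 = count
                .parse()
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
            let entry = counter.entries.entry(replica.to_string()).or_insert(0);
            *entry = (*entry).max(count);
        }
        Ok(counter)
    }
}
```

Because `join` takes the per-replica maximum, a delta that shows up late, twice, or out of order leaves the receiver unchanged, which is what makes a file-carried (sneakernet) transport workable.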

The reason I'm interested in this is my own interest in low-bandwidth, high-latency networks, such as those you might find in emergency situations. In those cases, it is sometimes more reliable to send a person or a drone as a courier to physically carry the data from one node to another. The problem is that you want to start synchronizing as quickly as possible: once the drone is in range of the target, it starts doing something like rsync to bring the data up to date, and it completes the transaction when the two physically meet up (and therefore have a USB 3.0 connection to finish the sync). This brings up several problems, though:

The closest thing I've found to what I'm after is the git bundle command: it stores enough information in a single file, which can be emailed around, to synchronize different repositories without using the standard communication protocols. Moreover, if you know the commit IDs bounding the ranges of interest, the bundle can be trimmed to hold only that range of content (much like a delta-state CRDT). However, I haven't yet seen anything similar in the database world (probably my ignorance; I haven't searched a great deal).
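A rough Rust analogue of trimming a bundle to a commit range: each node keeps its deltas keyed by (origin replica, sequence number), and a peer's version vector plays the role of the commit IDs it already has. `DeltaLog`, `VersionVector`, `bundle_since`, and `apply` are hypothetical names for this sketch, and the deltas themselves are treated as opaque strings.

```rust
use std::collections::BTreeMap;

/// Highest sequence number seen from each origin replica.
pub type VersionVector = BTreeMap<String, u64>;

/// Hypothetical per-node store of deltas, keyed by (origin replica, sequence
/// number). Sequence numbers start at 1; 0 in a version vector means "none seen".
#[derive(Default)]
pub struct DeltaLog {
    deltas: BTreeMap<(String, u64), String>,
}

impl DeltaLog {
    /// Record a delta produced locally or received from a peer.
    pub fn record(&mut self, origin: &str, seq: u64, delta: &str) {
        self.deltas.insert((origin.to_string(), seq), delta.to_string());
    }

    /// Summarize what this node has, like the commit IDs a bundle is trimmed against.
    /// Assumes no gaps per origin, so the maximum stands for all earlier deltas.
    pub fn version_vector(&self) -> VersionVector {
        let mut vv = VersionVector::new();
        for (origin, seq) in self.deltas.keys() {
            let entry = vv.entry(origin.clone()).or_insert(0);
            *entry = (*entry).max(*seq);
        }
        vv
    }

    /// Build a "bundle": only the deltas the peer has not seen, in (origin, seq) order.
    pub fn bundle_since(&self, peer: &VersionVector) -> Vec<(String, u64, String)> {
        self.deltas
            .iter()
            .filter(|((origin, seq), _)| *seq > peer.get(origin).copied().unwrap_or(0))
            .map(|((origin, seq), delta)| (origin.clone(), *seq, delta.clone()))
            .collect()
    }

    /// Apply a bundle received from a peer (e.g. read off a USB stick).
    pub fn apply(&mut self, bundle: &[(String, u64, String)]) {
        for (origin, seq, delta) in bundle {
            self.deltas.insert((origin.clone(), *seq), delta.clone());
        }
    }
}
```

The sketch assumes each log has no gaps per origin and that a bundle is applied in the order produced; applying deltas out of order would require tracking individual sequence numbers rather than one maximum per origin.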

Based on what I've seen so far in sled, it looks like you may be investigating doing all of this, but I can't tell for certain. So, is this a direction that you are planning on heading in?

spacejam commented 5 years ago

Hey! Check out riak-dt (this was ported to Rust) and antidotedb for examples of interesting uses of CRDTs in databases.

There are 2 areas where I've got plans to utilize convergent structures:

ckaran commented 5 years ago

I fully understand that there are competing interests, and unfortunately I don't have money to fund development. I will definitely take a look at rust-crdt (I think I did at one time, but forgot about it).

I'm glad to hear that you are open to this sort of development; at the very least, I could take a crack at implementing something either within sled or on top of it (no promises, though; I have too much work going on already).

olebedev commented 5 years ago

Hey @ckaran,

I'm going to implement RON in Rust along with Swarm, with all the features listed in the TODO section. This will probably be a good fit for your needs. I plan to use sled as the storage backend, since it has an LSM tree (LSMT) under the hood and supports merging.
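For anyone wondering how a CRDT join could sit on top of a merge facility: the function below has the same shape as a sled merge operator (`Fn(&[u8], Option<&[u8]>, &[u8]) -> Option<Vec<u8>>`) and implements a max register, whose join is simply the maximum. To keep the sketch dependency-free it does not import sled; registering it would presumably go through `Tree::set_merge_operator` and writes through `Tree::merge`, but check the sled version you use. The big-endian `u64` encoding and treating malformed bytes as 0 are assumptions of this sketch.

```rust
/// Decode a big-endian u64; anything that is not exactly 8 bytes counts as 0
/// (a simplification for this sketch).
fn decode(bytes: &[u8]) -> u64 {
    if bytes.len() == 8 {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(bytes);
        u64::from_be_bytes(buf)
    } else {
        0
    }
}

/// Max-register join with the shape of a sled merge operator:
/// (key, current value if any, newly merged bytes) -> new value.
/// Max is commutative, associative and idempotent, so replicas that merge
/// the same values in any order converge.
pub fn max_register_merge(
    _key: &[u8],
    old_value: Option<&[u8]>,
    merged_bytes: &[u8],
) -> Option<Vec<u8>> {
    let old = old_value.map(decode).unwrap_or(0);
    let new = decode(merged_bytes);
    // Returning Some sets the new value; in sled, returning None deletes the key.
    Some(old.max(new).to_be_bytes().to_vec())
}
```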

ckaran commented 5 years ago

@olebedev I actually looked into both RON and Swarm a while ago because they really are a good fit for the research that I'm doing. It seems they've matured since I last looked! I'm also glad to know that you are going to re-implement everything in Rust; that makes things easier for me. That said, one of the things that makes sled attractive to me is that it can be embedded within an application; having a separate process running is more of a headache than it's worth. So, would you be willing to make the rewrite embeddable?

Hrm... I just realized that we've veered off the original topic of this issue and are now bringing in non-sled comments. So, @olebedev @spacejam, do either of you know of a good place where we can write up a backlog of ideas for the kind of databases we're talking about? I mean at a very high level: what I as an end user would like to see in the database, as well as the kinds of concerns I have as a user. I'd love to have an organized, public backlog that all database users can add to, and that all database engine designers can then go through to pick and choose which features to implement in their own engines. Kind of like the Rust language nursery's API guidelines checklist, but geared towards databases, and with the understanding that different engines will choose to implement different features.

davidrusu commented 5 years ago

@ckaran if you do take a look at rust-crdt, note that just about everything has been redesigned and rewritten.

The published docs have not been updated and a release has not been made yet so you'll have to read code in master to see the current state of things.

Don't hesitate to poke me if you want a release out sooner, I've been meaning to do the prep work to get things ready for a release but can't seem to find the time.

ckaran commented 5 years ago

I can't leave well enough alone... @olebedev, does RON directly support δ-CRDTs? Reading through the specs, it feels like it requires operation-based CRDTs, in which case quite a large amount of data needs to be shipped around all the time. It also makes it difficult for nodes to be updated passively. As an example, consider the following scenario:

You want to maximize the rate of useful information transfer. That means you need to do the following:

Part of my research is figuring out that first step. From what I can see of RON, it doesn't seem to consider this problem at all (I'm not 100% sure, but it looks like RON is designed to work between pairs of databases, not within groups like you'd see with UDP broadcast). So, am I reading the spec incorrectly, or does RON not support what I'm talking about?

ckaran commented 5 years ago

@davidrusu I fully understand about not finding the time to make a release! Once all the code and research for my PhD are done, I want to release it... but cleaning everything up so that it meets the Rust language nursery guidelines will take several months of work.

That said, are you planning on supporting δ-CRDTs in the future? Both CvRDTs (which ship their full state) and CmRDTs (which ship every operation and need reliable delivery of each one) would be prohibitive for database engines, especially when bandwidth is low, latency is high, and you can't be sure that the connection will stay up long enough to fully synchronize.
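To illustrate why small, joinable deltas cope with a link that drops mid-sync, here is a Rust sketch using a grow-only set (G-Set). The flaky link is simulated by a transfer `budget`; `GSet` and `sync_until_dropped` are hypothetical names. Whatever got through before the drop is kept, and on the next contact the sender can simply resend everything, in any order, because the join (set union) is idempotent and commutative.

```rust
use std::collections::BTreeSet;

/// Grow-only set (G-Set). A delta is simply a (small) G-Set.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct GSet {
    items: BTreeSet<String>,
}

impl GSet {
    /// Delta mutator: insert locally and return a delta with just that item.
    pub fn add(&mut self, item: &str) -> GSet {
        self.items.insert(item.to_string());
        let mut delta = GSet::default();
        delta.items.insert(item.to_string());
        delta
    }

    /// Join is set union: idempotent and order-insensitive.
    pub fn join(&mut self, delta: &GSet) {
        self.items.extend(delta.items.iter().cloned());
    }
}

/// Simulate a link that drops after `budget` deltas. Everything that got
/// through is already joined into `target`; returns how many were delivered.
pub fn sync_until_dropped(target: &mut GSet, deltas: &[GSet], budget: usize) -> usize {
    let delivered = deltas.len().min(budget);
    for delta in &deltas[..delivered] {
        target.join(delta);
    }
    delivered
}
```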

EDIT

Just in case anyone is unaware of this site: https://github.com/ipfs/research-CRDT/ and its issues have a lot of really good information.

olebedev commented 5 years ago

@ckaran,

does RON directly support δ-CRDTs?

Yes, it does.

davidrusu commented 5 years ago

I have no plans for delta CRDTs at the moment; for my uses, grouping a bunch of ops together and compressing them solves my problems.
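As a rough illustration of the grouping idea for an op-based counter (not necessarily what rust-crdt does), a batch of increment ops can be coalesced into at most one op per replica before shipping. `IncOp`, `coalesce`, and `apply` are hypothetical names for this sketch, and the compression step is left out.

```rust
use std::collections::BTreeMap;

/// Hypothetical op for an op-based counter: `replica` added `amount`.
#[derive(Clone, Debug, PartialEq)]
pub struct IncOp {
    pub replica: String,
    pub amount: u64,
}

/// Group a batch into at most one op per replica by summing amounts.
/// Valid here only because counter increments commute and add up.
pub fn coalesce(batch: &[IncOp]) -> Vec<IncOp> {
    let mut sums: BTreeMap<&str, u64> = BTreeMap::new();
    for op in batch {
        *sums.entry(op.replica.as_str()).or_insert(0) += op.amount;
    }
    sums.into_iter()
        .map(|(replica, amount)| IncOp { replica: replica.to_string(), amount })
        .collect()
}

/// Apply ops to a local counter value. Op-based delivery must be exactly
/// once: applying the same batch twice would double-count.
pub fn apply(total: &mut u64, ops: &[IncOp]) {
    for op in ops {
        *total += op.amount;
    }
}
```

The packed batch gives the same result as the original ops, but it still has to be delivered exactly once; that delivery requirement is what delta-state CRDTs relax, since their joins tolerate duplicates.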

That said, PRs are always welcome :)

ckaran commented 5 years ago

@davidrusu I understand, and if I ever have any free time, I'll try to look into it. That said, don't hold your breath! I'm so far behind at this point that I'll be dead of old age before I get my current stack done!

theduke commented 5 years ago

@olebedev it would be great if you shared your work (the RON/Swarm implementation in Rust) here once you publish it.