twitter / summingbird

Streaming MapReduce with Scalding and Storm
https://twitter.com/summingbird
Apache License 2.0

Investigate CRDTs #183

Open sritchie opened 11 years ago

sritchie commented 11 years ago

Seems like the concept of Conflict-free Replicated Data Types (https://github.com/jboner/akka-crdt) could be useful here. Some links:

avibryant commented 11 years ago

Yeah, this relates especially to storehaus-voldemort or storehaus-riak. I'm pretty certain that for Voldemort (and maybe for Riak) we could define a CRDT that generalizes the G-Counter to any Monoid (which is to say, you would get highly available, eventually consistent monoid merge updates that wouldn't lose data in the face of a partition, etc.).
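
To make the idea concrete, here is a minimal sketch of what a G-Counter generalized to a Monoid might look like. It is not code from Summingbird, Storehaus, or Algebird: the `Monoid` trait is a stand-in for Algebird's, and `MonoidCounter`, its `slots` field, and the per-replica version number are hypothetical names and design choices made for illustration. A plain G-Counter can merge slots with `max` because counts only grow. An arbitrary monoid has no such ordering, so this sketch assumes each replica tags its own slot with a version and that merge keeps the higher-versioned entry. It also assumes the monoid is commutative (otherwise the cross-replica combine order is arbitrary) and that a replica never forgets its own slot.

```scala
// Minimal stand-in for Algebird's Monoid, so the sketch is self-contained.
trait Monoid[V] {
  def zero: V
  def plus(a: V, b: V): V
}

object Monoid {
  implicit val longSum: Monoid[Long] = new Monoid[Long] {
    val zero: Long = 0L
    def plus(a: Long, b: Long): Long = a + b
  }
}

// Hypothetical CRDT: each replica owns one slot holding (version, partial sum).
// Only the owning replica writes its slot, so versions within a slot are
// totally ordered (assumption: a replica never loses its own state).
final case class MonoidCounter[V](slots: Map[String, (Long, V)]) {

  // Local update: fold `delta` into this replica's slot and bump its version.
  def add(replica: String, delta: V)(implicit m: Monoid[V]): MonoidCounter[V] = {
    val (version, partial) = slots.getOrElse(replica, (0L, m.zero))
    MonoidCounter(slots.updated(replica, (version + 1, m.plus(partial, delta))))
  }

  // Merge: per slot, keep the entry with the higher version. This plays the
  // role of `max` in a G-Counter and is commutative, associative, idempotent.
  def merge(that: MonoidCounter[V]): MonoidCounter[V] =
    MonoidCounter(that.slots.foldLeft(slots) { case (acc, (replica, theirs)) =>
      acc.get(replica) match {
        case Some(ours) if ours._1 >= theirs._1 => acc
        case _                                  => acc.updated(replica, theirs)
      }
    })

  // Read: combine every replica's partial sum, in sorted replica order so the
  // result is deterministic.
  def value(implicit m: Monoid[V]): V =
    slots.toSeq.sortBy(_._1).map(_._2._2).foldLeft(m.zero)(m.plus)
}

object MonoidCounter {
  def empty[V]: MonoidCounter[V] = MonoidCounter(Map.empty[String, (Long, V)])
}
```

Two replicas that accept writes independently during a partition can later call `merge` in either order, any number of times, and converge on the same `slots`. No update is lost, because each replica's contribution lives in its own slot.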

sritchie commented 11 years ago

I think Storehaus's next phase is going to be very interesting. I'm excited to take a weekend soon and re-implement the ElephantDB-like distributed read-only store that ingests Hadoop data, like we'd talked about in the VersionedStore ticket. I don't think there's an excellent, simple solution out there yet that will let users serve their Hadoop data generated by Summingbird.

Perhaps writing a proper scalding-voldemort or scalding-elephantdb would be a better use of time for now.