ktoso / akka-persistence-hbase

An HBase backed Journal for Akka's experimental persistence / event-sourcing
Apache License 2.0

Separate HdfsSnapshotStore as akka-persistence-hdfs #8

Open ktoso opened 10 years ago

ktoso commented 10 years ago

Snapshots can also be stored directly on HDFS if they are really big or you need "easy takeout".

dispalt commented 9 years ago

@ktoso do you plan on making the accumulation of events work on HDFS too, or just snapshots?

ktoso commented 9 years ago

I think putting events onto HDFS is not optimal; HBase performs way better for such operations (SCANs). I'd be inclined to say no to the feature of storing events directly on HDFS.

(see my talk about avoiding hot-spotting for more insight into why HBase is a better target for events: http://www.slideshare.net/ktoso/hbase-rowkey-design-for-akka-persistence )
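
To make the hot-spotting point concrete, here is a minimal sketch of a salted row key. It is not taken from the plugin or the talk: `row_key`, `replay_keys`, `PARTITION_COUNT`, and the key layout are all assumptions for illustration, and Python is used only for brevity. The idea is that a prefix derived from the sequence number sends consecutive events to different key ranges, so they need not all land in one region.

```python
# Illustrative only: the names and the key layout below are assumptions,
# not the actual row key format used by akka-persistence-hbase.
PARTITION_COUNT = 4  # assumed number of salt buckets


def row_key(persistence_id: str, sequence_nr: int,
            partitions: int = PARTITION_COUNT) -> str:
    """Build a salted key: <partition>-<persistence_id>-<zero-padded sequence_nr>."""
    # The salt is derived from the sequence number, so consecutive
    # events rotate through the buckets instead of piling onto one.
    partition = sequence_nr % partitions
    return f"{partition:03d}-{persistence_id}-{sequence_nr:020d}"


def replay_keys(persistence_id: str, from_nr: int, to_nr: int,
                partitions: int = PARTITION_COUNT) -> list:
    """List the keys for a sequence-number range in replay order.

    A real reader would instead scan each bucket and merge the
    results by sequence number.
    """
    return [row_key(persistence_id, n, partitions)
            for n in range(from_nr, to_nr + 1)]
```

With this layout, writes for sequence numbers 1 to 4 go to four different prefixes. The cost is on the read side: a replay has to scan every bucket and merge the results by sequence number.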

dispalt commented 9 years ago

I was thinking that since journaling writes a consistent stream (nothing random), it would be a good fit. In my mind it would function similarly to HBase's use of HDFS for its WAL, since they are essentially the same thing. But you'd probably end up writing a lot of code to make it efficient and preserve the semantics.

Yeah, I am familiar with region hotspotting; I was more curious about using HDFS directly.

ktoso commented 9 years ago

I think I'd end up reimplementing a lot of what HBase does when trying to implement this. I always viewed HBase as "the best way to scan across HDFS". If we naively just appended to a file in HDFS, I think we'd bump into hot-spotting anyway, directly overloading one datanode more than the others.

I may be wrong (that happens from time to time :-)), please correct me (code very welcome) if that's the case here :-)