hydro-project / fluent

A data-driven compute platform
Apache License 2.0

Any advice on integrating compaction and compression with Anna's actors? #55

Closed: leftjs closed this issue 5 years ago

leftjs commented 5 years ago

Hi, RISELab researchers:

I have looked at your KV write function; it just does plain file I/O.

Anna achieves high throughput, but it cannot reach a compression ratio comparable to Cassandra's or Scylla's.

Suppose I want to implement some LSM-tree features inside an Anna actor thread, namely:

  1. memtables and a WAL to buffer writes;
  2. a background thread that checks the memtable size and flushes the memtable to an SSTable file once it reaches a given size;
  3. a background thread that checks whether a compaction is needed and triggers one once the number of SSTable files reaches a given threshold;
  4. a background compaction thread that compacts many KVs whenever a compaction is triggered.
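To make features 1 to 4 concrete, here is a minimal, single-threaded Python sketch for illustration only. Everything in it (`TinyLSM`, `memtable_limit`, `compaction_trigger`, the JSON file layout) is hypothetical and not part of Anna. The flush and compaction checks run inline after each `put` instead of in background threads. That avoids cross-thread coordination but blocks the caller while they run.

```python
import json
import os
import tempfile


class TinyLSM:
    """Hypothetical single-threaded LSM sketch; not Anna's storage API."""

    def __init__(self, directory, memtable_limit=4, compaction_trigger=3):
        self.dir = directory
        self.memtable_limit = memtable_limit          # flush after this many keys
        self.compaction_trigger = compaction_trigger  # compact at this many SSTables
        self.memtable = {}
        self.sstables = []                            # file paths, oldest first
        self.next_id = 0
        self.wal_path = os.path.join(directory, "wal.log")
        self.wal = open(self.wal_path, "a")

    def put(self, key, value):
        # (1) Log the write to the WAL before applying it to the memtable.
        self.wal.write(json.dumps([key, value]) + "\n")
        self.wal.flush()
        self.memtable[key] = value
        # (2) Flush the memtable once it reaches the size limit.
        if len(self.memtable) >= self.memtable_limit:
            self._flush()
        # (3) Trigger a compaction once there are too many SSTables.
        if len(self.sstables) >= self.compaction_trigger:
            self._compact()

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for path in reversed(self.sstables):  # newest SSTable wins
            with open(path) as f:
                for k, v in json.load(f):      # linear scan for brevity
                    if k == key:
                        return v
        return None

    def close(self):
        self.wal.close()

    def _write_sstable(self, items):
        path = os.path.join(self.dir, f"sst-{self.next_id:06d}.json")
        self.next_id += 1
        with open(path, "w") as f:
            json.dump(sorted(items.items()), f)  # key-sorted, like an SSTable
        return path

    def _flush(self):
        self.sstables.append(self._write_sstable(self.memtable))
        self.memtable = {}
        # The flushed entries are now on disk, so start a fresh, empty WAL.
        self.wal.close()
        self.wal = open(self.wal_path, "w")

    def _compact(self):
        # (4) Merge every SSTable into one; newer files overwrite older values.
        # A real engine would stream a k-way merge instead of building a dict.
        merged = {}
        for path in self.sstables:  # oldest first
            with open(path) as f:
                merged.update((k, v) for k, v in json.load(f))
        for path in self.sstables:
            os.remove(path)
        self.sstables = [self._write_sstable(merged)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        db = TinyLSM(d, memtable_limit=2, compaction_trigger=3)
        for i in range(7):
            db.put(f"k{i % 3}", i)
        print(db.get("k1"), len(db.sstables))  # → 4 1
        db.close()
```

Running flush and compaction inline keeps all state on one thread, at the cost of latency spikes on the writes that trigger them. Moving those steps to background threads is what introduces the coordination cost this issue asks about.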

As described above, I would need multiple threads inside each Anna actor thread, and coordinating those threads will reduce the actor's throughput. Do you have any ideas on how to do this?

What's more, I need true columnar storage for a series of KVs, so that I can compress my data with algorithms such as Simple8b, zigzag encoding, Snappy, and so on. How can I achieve this in Anna?
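As a hedged illustration of why columnar layout helps compression, the Python sketch below applies delta encoding and then zigzag encoding to an integer column, and packs the result as LEB128-style varints. The varint packing is a simpler stand-in for Simple8b, which packs several small integers into one 64-bit word. Simple8b and Snappy themselves are not implemented here. The function names are hypothetical and nothing here is an Anna API. Values and their deltas are assumed to fit in signed 64 bits.

```python
def zigzag_encode(n):
    """Map signed 64-bit ints to unsigned: 0->0, -1->1, 1->2, -2->3, ..."""
    return (n << 1) ^ (n >> 63)


def zigzag_decode(z):
    return (z >> 1) ^ -(z & 1)


def encode_column(values):
    """Delta + zigzag + LEB128-style varint encoding of an integer column."""
    out = bytearray()
    prev = 0
    for v in values:
        u = zigzag_encode(v - prev)  # small deltas of either sign -> small ints
        prev = v
        while True:                   # 7 payload bits per byte; high bit = "more"
            byte = u & 0x7F
            u >>= 7
            if u:
                out.append(byte | 0x80)
            else:
                out.append(byte)
                break
    return bytes(out)


def decode_column(buf):
    values, prev, i = [], 0, 0
    while i < len(buf):
        result = shift = 0
        while True:
            b = buf[i]
            i += 1
            result |= (b & 0x7F) << shift
            shift += 7
            if not b & 0x80:
                break
        prev += zigzag_decode(result)  # undo zigzag, then undo the delta
        values.append(prev)
    return values


timestamps = [1000, 1001, 1003, 1002, 1010]
packed = encode_column(timestamps)
print(len(packed))  # → 6 (vs. 40 bytes as five raw int64 values)
assert decode_column(packed) == timestamps
```

This only pays off when a series is stored contiguously. Neighbouring values in one column tend to be close, so their deltas are small, and zigzag keeps small negative deltas small as unsigned integers. A general-purpose compressor such as Snappy could then be applied to the packed bytes.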