Open emaxerrno opened 7 years ago
https://github.com/palvaro/molly ... and et voila
oh, wow this is very interesting. i had started to design a dedalus plugin for one of our projects, and i've been using molly to run those examples.. it's all very early days :) https://github.com/noahdesu/zlog/blob/dedalus/qa/dedalus/zlog.v2.ded
I think we can add a new code generation facility to do 2 things:
1) Generate an Oracle, which is basically a method proxy that does the accounting
2) Add system headers that will either crash, drop the connection, throw exceptions, etc.
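A rough sketch of what the two pieces could look like, written in Python for brevity rather than as generated code. Every name here (`Oracle`, `InjectedFault`, `EchoService`, `fault_rate`) is hypothetical and only illustrates the idea of a method proxy that keeps accounts and can throw instead of forwarding the call:

```python
import random


class InjectedFault(Exception):
    """Raised by the proxy in place of a real failure (hypothetical name)."""


class Oracle:
    """Method proxy that records every call and optionally injects a fault."""

    def __init__(self, target, fault_rate=0.0, seed=0):
        self._target = target
        self._fault_rate = fault_rate
        self._rng = random.Random(seed)  # seeded so a failing run can be replayed
        self.calls = []  # accounting: (method name, args, outcome)

    def __getattr__(self, name):
        # Only reached for names not defined on the proxy itself,
        # so every call on the wrapped service passes through here.
        method = getattr(self._target, name)

        def wrapper(*args):
            if self._rng.random() < self._fault_rate:
                self.calls.append((name, args, "fault"))
                raise InjectedFault(f"injected fault in {name}")
            result = method(*args)
            self.calls.append((name, args, "ok"))
            return result

        return wrapper


class EchoService:
    """Stand-in for a generated RPC client (assumption, not part of smf)."""

    def echo(self, payload):
        return payload
```

Wrapping a client as `Oracle(EchoService(), fault_rate=0.1)` would leave a full call log in `calls`, which is the part a checker could later compare against the expected history.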
This is obviously not for scale, but for correctness.
That's my preliminary design.
Thoughts ?
I know that sean used a config property loaded by the actor system in wallaroo, which does this for their network nemesis.
I also know that seastar has a built in disk nemesis too.
also zlog looks awesome and you are further along than I am. The dedalus plugin looks very cool. I have not sat down and written any dedalus yet, but I can't wait to do it.
We should port zlog to smf! haha.
I know @hellertime was looking to build something similar to zlog - i pointed him to your repo.
i'd also add dropping and duplicating packets. peter is a member of our research group, so i could setup a meeting with him at some point if there are questions. i'm sure he'd be very interested in anything LDFI related.
the intent is to bring in smf for some key components in zlog. the tentative plan is to (1) bring in smf to replace boost asio in our sequencer, then (2) for the Ceph-based storage backend I want to build a simple proxy layer that does request aggregation across clients, and (3) build out a proper spdk-based backend for fully replicating the CORFU protocol and eliminating Ceph as a required backend.
oh that's neat! i didn't know you were a researcher - (hadn't googled) - exciting times!
RE: LDFI - duplicating packets might be more difficult than duplicating messages - i am not entirely sure how to do it within the kernel for example - maybe eBPF? For a DPDK based runtime, maybe hacking the core/net/tcp.hh in seastar would be the way to do packet duplication.
replicating messages is very very easy, since we own the protocol front to back.
RE: Sequencer: back in the concord.io days we wanted to write a sequencer too for leasing token ranges and never got around to it :'( - i.e.: such that only one process could make writes to a particular (stream_topic, key, value) tuple.
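To make the leasing idea concrete, here is a minimal sketch. It is an assumption-heavy simplification: it leases single keys rather than token ranges, takes time as an explicit argument, and `LeaseTable` is a hypothetical helper, not something from concord, zlog or smf:

```python
class LeaseTable:
    """Grants at most one live lease per key (hypothetical helper)."""

    def __init__(self, ttl):
        self._ttl = ttl
        self._leases = {}  # key -> (owner, expiry time)

    def acquire(self, key, owner, now):
        holder = self._leases.get(key)
        if holder is not None and holder[0] != owner and holder[1] > now:
            return False  # another process still holds a live lease
        # Free, expired, or a renewal by the current owner.
        self._leases[key] = (owner, now + self._ttl)
        return True

    def may_write(self, key, owner, now):
        holder = self._leases.get(key)
        return holder is not None and holder[0] == owner and holder[1] > now
```

A writer would call `acquire` before touching a key and check `may_write` on each write, so a process whose lease expired stops writing once another one takes over.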
RE: SPDK.io - i know that avi was thinking about writing a seastar filesystem - probably through libfuse - that uses the same IO engine in seastar so that you can just deploy one app and there would be no need for anything else - of course it would be specialized for storage apps like queues and databases. I hope to test SPDK too but the 3dXpoint drives are so expensive.
One caveat you might run into here is that the IO engine and the queue measurement are global. That is, multi-device queue measurement is not possible today with seastar, so you'd have to raid0 the drives and hope for the best.
I think Glauber Costa was working on a multi-device patch at some point.
RE: msg replay: this is easy then :)
RE: sequencer: sure thing.
RE: spdk: i talked to intel a few months ago, and they were $1400 for consumers. Cloud providers get a big discount, but still pretty steep price for the rest of us. I think the samsung 3D NAND drives are also around the same price (a bit cheaper), though i don't think they have added it yet to the spdk project.
I think seastar could hugely benefit from integrating spdk. I haven't looked too closely but something has to tick the loop, and seastar manages the loop for DPDK, so you'll need to add the SPDK loop there too.
@noahdesu i just added a rough outline, thoughts welcomed!
Basically, my idea is that something is better than nothing and even if we just provide the stubs that do:
1) message replay
2) reordering of timing and sequence
3) crashes
It is still significant and useful, albeit it only covers request-response protocols.
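As a rough illustration of those three stubs for a request-response protocol, here is a small sketch. It assumes messages are `(request_id, value)` pairs, and the function names and the `DedupServer` handler are hypothetical, only there to show what a service has to tolerate:

```python
import random


def replay(messages, rng, rate):
    """Deliver some messages twice, as a duplicating network would."""
    out = []
    for msg in messages:
        out.append(msg)
        if rng.random() < rate:
            out.append(msg)  # the replayed copy
    return out


def reorder(messages, rng):
    """Deliver the same messages in a shuffled order."""
    out = list(messages)
    rng.shuffle(out)
    return out


def crash_after(messages, n):
    """Deliver only the first n messages, as if the sender crashed."""
    return list(messages)[:n]


class DedupServer:
    """Toy request-response handler that must tolerate the faults above."""

    def __init__(self):
        self.applied = {}

    def handle(self, msg):
        req_id, value = msg
        # Idempotent apply: a replayed request id is ignored.
        self.applied.setdefault(req_id, value)
        return req_id
```

Feeding `reorder(replay(msgs, rng, rate), rng)` into the handler and comparing `applied` against the fault-free run gives a simple correctness check, and seeding `rng` keeps a failing schedule reproducible.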
In a later design revision we can address a) more complex protocols and sequencing, b) running the hazard step from molly automatically by linking in a SAT solver too.
thanks!
https://people.eecs.berkeley.edu/~palvaro/molly.pdf
tl;dr: add faults and gracefully recover should be the outcome - or at least document failures so they are well understood.
i.e.: fault tolerance is a global property.