Closed nurturenature closed 2 years ago
When you say "during the test run", do you mean you're looking at the history while the test is still running? It only checkpoints every 16384 operations in the history, so if you haven't gotten that far the history will claim to be empty.
Oh, but for ^C handling, yes, it would be nice if it checkpointed everything! We could modify the signal handler in jepsen.core to... somehow reach into the streaming block writer (which I think is held in jepsen.interpreter) and close it. Maybe by interrupting the generator interpreter and catching the interrupt? Or by directly calling .close on the block writer? That should be thread-safe; it indirects through a concurrent queue.
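To make the checkpointing behavior concrete, here is a toy sketch of a chunked writer in plain Clojure. It is not Jepsen's block writer; `chunk-writer`, `append-op!`, and `close!` are hypothetical names for illustration. The point is only that ops sit in a buffer until a full block accumulates, and that closing the writer is what seals a partial block.

```clojure
;; Toy model only: NOT Jepsen's implementation. All names are hypothetical.
(defn chunk-writer
  "Returns an atom modelling a writer that flushes ops in blocks of chunk-size."
  [chunk-size]
  (atom {:chunk-size chunk-size, :buffer [], :flushed []}))

(defn append-op!
  "Buffers op; when the buffer reaches chunk-size, moves it to :flushed."
  [w op]
  (swap! w (fn [{:keys [chunk-size buffer] :as s}]
             (let [buffer (conj buffer op)]
               (if (= chunk-size (count buffer))
                 (-> s (update :flushed conj buffer) (assoc :buffer []))
                 (assoc s :buffer buffer)))))
  w)

(defn close!
  "Flushes any partial block, so nothing buffered is lost."
  [w]
  (swap! w (fn [{:keys [buffer] :as s}]
             (if (seq buffer)
               (-> s (update :flushed conj buffer) (assoc :buffer []))
               s)))
  w)
```

With a chunk size of 16384 and a test that dies after a few thousand ops, `:flushed` would still be empty unless something calls `close!`, which matches the empty history described here.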
> When you say "during the test run", do you mean you're looking at the history while the test is still running? It only checkpoints every 16384 operations in the history, so if you haven't gotten that far the history will claim to be empty.
My bad, it even says so right in the commit: "Histories are chunked on disk into 16384-operation blocks"!
The test I am currently working on doesn't make it to 16k ops. Before then it triggers a panic in one or more db replicas, quorum is lost, and the db clients are impaired to the point of no return.
I am working on making the Jepsen client more tolerant of a completely unresponsive system, and then updating the generators to nil out.
> but for ^C handling, yes
I'll take a look.
Hmm, yeah generally Jepsen shouldn't crash when DBs do. You're killing it explicitly, I'm guessing? I think the general answer there is to put a timeout on client/invoke!.
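As a rough illustration of the timeout idea, here is a minimal sketch in plain Clojure. `invoke-with-timeout` and `invoke-fn` are hypothetical names, not part of Jepsen; the assumption is that `invoke-fn` takes an op map and returns the completed op. On timeout the op comes back as `:info`, since the outcome is unknown.

```clojure
;; Sketch only: invoke-with-timeout and invoke-fn are hypothetical names.
(defn invoke-with-timeout
  "Runs (invoke-fn op) on another thread and waits at most timeout-ms.
  Returns the completed op, or op marked :info with :error :timeout."
  [invoke-fn op timeout-ms]
  (let [fut    (future (invoke-fn op))
        result (deref fut timeout-ms ::timed-out)]
    (if (= ::timed-out result)
      (do (future-cancel fut) ; interrupt the stuck call if possible
          (assoc op :type :info, :error :timeout))
      result)))
```

A client's invoke! could delegate its real work to a function like this, so an unresponsive db yields an indeterminate op instead of a hung worker. Note that this only helps when the call blocks; it cannot protect against a native client that takes down the JVM.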
But saving more of the history on ^C is good too! I'd love to see this feature--if you don't build it I'm sure I will later. Heads-down on a gnarly research project at the moment for faster history reduction, or I'd go build it right now!
> But saving more of the history on ^C is good too! I'd love to see this feature--if you don't build it I'm sure I will later.
Experimented with:
    (with-thread-name "Jepsen shutdown hook"
      (info "Downloading DB logs before JVM shutdown...")
      (snarf-logs! ~test)
      (store/update-symlinks! ~test)
      ; 👇 try a naive way to save the history and updated test
      (store.format/write-test-with-history! (->> ~test :store :handle) ~test))
to no avail so it will have to remain aspirational for me for now.
> Hmm, yeah generally Jepsen shouldn't crash when DBs do. You're killing it explicitly, I'm guessing? I think the general answer there is to put a timeout on client/invoke!.
The client, written in Zig and tightly tied to the db, can panic at times, which brings down the JVM. And Jepsen is so good at finding reasons to panic. 🙂
> Heads-down on a gnarly research project at the moment for faster history reduction, or I'd go build it right now!
Going to close the issue as the history does stream, ^C was just a wish, and to help focus on your research.
Just noticed jepsen.history too!
Ahhh that does make sense! And yeah, the bit of code you want for sealing the history is to somehow invoke BigVectorBlockWriter.close. That block writer is held in jepsen.generator.interpreter. I'm not totally sure how to connect the plumbing from the shutdown hook to there, but that's what has to happen!
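One possible shape for that plumbing, sketched in plain Clojure: a shared registry that whoever opens a writer adds it to, and a JVM shutdown hook that closes everything registered. This is an assumption about how it could be wired, not how Jepsen does it; `open-writers`, `register-writer!`, `close-registered!`, and `install-shutdown-hook!` are all hypothetical names, and the writer is assumed to implement java.io.Closeable.

```clojure
(import '(java.io Closeable))

;; Hypothetical registry of writers that must be sealed before the JVM exits.
(defonce open-writers (atom #{}))

(defn register-writer!
  "Remembers w so the shutdown hook can close it. Returns w."
  [^Closeable w]
  (swap! open-writers conj w)
  w)

(defn close-registered!
  "Closes every registered writer, logging rather than rethrowing errors."
  []
  (doseq [^Closeable w @open-writers]
    (try (.close w)
         (catch Exception e
           (binding [*out* *err*]
             (println "Error closing writer:" (.getMessage e))))))
  (reset! open-writers #{}))

(defn install-shutdown-hook!
  "Registers a JVM shutdown hook that closes all registered writers."
  []
  (.addShutdownHook (Runtime/getRuntime)
                    (Thread. ^Runnable close-registered! "writer shutdown hook")))
```

In this sketch the interpreter would call `register-writer!` when it creates its block writer, and the existing shutdown hook would call `close-registered!` before snarfing logs. Whether closing the real writer from another thread is safe depends on its implementation.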
And yeah, jepsen.history is... ah, I'm really excited. Been wanting this for the better part of seven years! It's close! I'm tackling what I think is the hardest problem right now, then I can go back in and start speeding up checkers.
When using 0.2.8-SNAPSHOT from clojars:
Or installing locally from main:
Histories are not being streamed during the test run:
^C does snarf log files, but no histories:
test.jepsen is readable and shows :history nil
test.jepsen has the magic bytes JEPSEN0001.
Any guidance?