jepsen-io / jepsen

A framework for distributed systems verification, with fault injection
6.68k stars 711 forks source link

follow-on to file-corruption-package, it needs a nemesis to move db-nodes usage from package generator to nemesis invoke #561

Closed nurturenature closed 1 year ago

nurturenature commented 1 year ago

A couple of months ago, I submitted the file-corruption-package. More usage shows it should be changed to better fit into the combined package and generator patterns.

The original file-corruption-package did move the random selection of files from the package generator into the bitflip and truncate-file nemeses while preserving their APIs. However, it left the usage of db-nodes in the generator.

file-corruption is the only package that uses db-nodes in the generator. All other packages emit a node-spec in the generator map for the nemesis to resolve with db-nodes.

I ran into this by using db/primaries as a leader-stalker, e.g. peek into the log on each node to find the highest view, and target that node for faulting. Waaay too heavy-weight to be in a generator.

The original PR tried to be minimal by only having a package and not a specific nemesis. This PR brings file-corruption inline with the other packages by adding a file-corruption-nemesis.

aphyr commented 1 year ago

I think this is OK! One thing I'm not quite sure about--and I've gone back and forth on this over time--is whether to put the "smarts" for targeting specific nodes/files/whatever into the nemesis or the generator. In recent years I've been leaning more towards the generator, because it makes it possible to read the history and know exactly what a nemesis will do. Also seems to make composition easier, because you can rewrite a generator a lot easier than you can reach into the guts of a nemesis. Thoughts?

aphyr commented 1 year ago

Ahh, I gotcha. I read this as a vector of a vector of the var and got confused.On Nov 16, 2022 13:59, Nurture Nature @.***> wrote: @nurturenature commented on this pull request.

In jepsen/src/jepsen/nemesis/combined.clj:

@@ -360,6 +360,45 @@ :fs #{:strobe-clock} :color "#A0E9E3"}}}))

+(defn file-corruption-nemesis

Tells codox it's a link:

Markdown docstrings also support wikilink-style relative links, for referencing other vars. Vars in the current namespace will be matched first, and then Codox will try and find a best match out of all the vars its documenting.

x does style as source code, but not a link. And bare leaves it as text. It's also why mentioning vectors can lead to odd formatting, [[not] [a] [link]] turns into [not a link]. I was thinking about doing a doc formatting only PR for jepsen.nemesis* before 0.2.8. Jepsen has good docs! Thanks!

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.***>

nurturenature commented 1 year ago

put the "smarts" for targeting specific nodes/files/whatever into the nemesis or the generator

In cases where the smarts involve on-nodes, exec, etc, don't think there's really a choice. It's too heavyweight to be in the generator. When a generator takes a lot of time, interacts with other parts of the system, and similar, things get really weird across Jepsen, not just that individual op.

leaning more towards the generator, because it makes it possible to read the history and know exactly what a nemesis will do

I too am always looking ahead in the log for the nemesis response to see what specifically happened.

seems to make composition easier, because you can rewrite a generator a lot easier than you can reach into the guts of a nemesis

Agreed. Generators are just right there.

aphyr commented 1 year ago

Ah, yeah, I gotcha--I've done IO in generators before but that is a synchronous loop and it'd block other ops. Makes sense. Btw if you're having trouble with this, one contributing factor might be that we invoke generators more often than their ops are actually used. I've been considering an optimization in gen/on-threads (et al) and the function implementation of gen/op where you'd check to make sure there's actually free threads in the context before asking for an op. I didn't instrument this in detail but I'd bet you it could cut a ton of calls to gen/op out of the generator path.I experimented with this but didn't actually see a perf improvement in my (simple) benchmark, I suspect because the saved calls to gen/op are offset by the new calls to find out if the context has no free threads. Bet you it'd improve things for IO generators though.

nurturenature commented 1 year ago

Hi,

I think this is OK!

Re-raising.

It may have been missed for 0.3.1?

Thanks!

(P.S. Just upgraded a test from 0.2.8 to 0.3.2-SNAPSHOT and no changes were required.)

aphyr commented 1 year ago

Yes, pardon me! I'm... god I'm juggling so many things right now, I'm sorry. Thanks for re-raising. :-)