oxidecomputer / omicron

Omicron: Oxide control plane

saga recovery failed due to incomplete log due to pagination bug #5948

Closed: davepacheco closed this issue 4 months ago

davepacheco commented 4 months ago

Following the story of #5947: I went to dig into this error message:

00:46:47.232Z WARN 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): failed to recover saga 7dce104f-0806-4906-a8b6-267f18379e21: Internal Error: failed to resume saga: recovery for saga 7dce104f-0806-4906-a8b6-267f18379e21: node NodeIndex(0): load status is "Succeeded(String("f7934b34-1e6e-4fa2-8cc5-e2456508d45f"))", which is illegal for parent load status "NeverStarted"

Here, Steno is reporting an internal error recovering the saga because its log seems to reflect an impossible state. Yikes -- this sounds scary. What's the state of this saga?

root@[fd00:1122:3344:108::3]:32221/omicron> select * from saga where id = '7dce104f-0806-4906-a8b6-267f18379e21';
-[ RECORD 1 ]
id               | 7dce104f-0806-4906-a8b6-267f18379e21
creator          | 65a11c18-7f59-41ac-b9e7-680627f996e7
time_created     | 2024-02-08 16:12:26.67964+00
name             | instance-create
saga_dag         | ...
saga_state       | running
current_sec      | 65a11c18-7f59-41ac-b9e7-680627f996e7
adopt_generation | 1
adopt_time       | 2024-02-08 16:12:26.67964+00

I trimmed saga_dag out of that output because it's enormous and turned out not to be relevant. But I did wind up digging into it a bit. Here's the log of saga events, ignoring the data produced by each node:

root@[fd00:1122:3344:108::3]:32221/omicron> select event_time,node_id,event_type from saga_node_event where saga_id = '7dce104f-0806-4906-a8b6-267f18379e21';
           event_time           | node_id | event_type
--------------------------------+---------+-------------
  2024-02-08 16:12:26.712324+00 |       0 | started
  2024-02-08 16:12:26.714392+00 |       0 | succeeded
  2024-02-08 16:12:26.716458+00 |       1 | started
  2024-02-08 16:12:26.762434+00 |       1 | succeeded
  2024-02-08 16:12:26.764671+00 |       2 | started
  2024-02-08 16:12:26.79708+00  |       2 | succeeded
  2024-02-08 16:12:26.799201+00 |       3 | started
  2024-02-08 16:12:29.630453+00 |       3 | succeeded
  2024-02-08 16:12:30.091878+00 |       4 | started
  2024-02-08 16:12:30.094589+00 |       4 | succeeded
  2024-02-08 16:12:30.096826+00 |       5 | started
  2024-02-08 16:12:30.203381+00 |       5 | succeeded
  2024-02-08 16:12:30.20559+00  |       6 | started
  2024-02-08 16:12:30.207753+00 |       6 | succeeded
  2024-02-08 16:12:30.209735+00 |       7 | started
  2024-02-08 16:12:30.211783+00 |       7 | succeeded
  2024-02-08 16:12:30.214181+00 |       8 | started
  2024-02-08 16:12:30.216157+00 |       8 | succeeded
  2024-02-08 16:12:30.218013+00 |       9 | started
  2024-02-08 16:12:30.219947+00 |       9 | succeeded
  2024-02-08 16:12:30.221934+00 |      10 | started
  2024-02-08 16:12:30.22368+00  |      10 | succeeded
  2024-02-08 16:12:30.225822+00 |      11 | started
  2024-02-08 16:12:30.227877+00 |      11 | succeeded
  2024-02-08 16:12:30.23009+00  |      12 | started
  2024-02-08 16:12:30.232017+00 |      12 | succeeded
  2024-02-08 16:12:30.233914+00 |      13 | started
  2024-02-08 16:12:30.235966+00 |      13 | succeeded
  2024-02-08 16:12:30.237981+00 |      14 | started
  2024-02-08 16:12:30.239991+00 |      14 | succeeded
  2024-02-08 16:12:30.241852+00 |      15 | started
  2024-02-08 16:12:30.244071+00 |      15 | succeeded
  2024-02-08 16:12:30.246213+00 |      16 | started
  2024-02-08 16:12:30.248214+00 |      16 | succeeded
  2024-02-08 16:12:30.250175+00 |      17 | started
  2024-02-08 16:12:30.252159+00 |      17 | succeeded
  2024-02-08 16:12:30.254117+00 |      18 | started
  2024-02-08 16:12:30.256072+00 |      18 | succeeded
  2024-02-08 16:12:30.25808+00  |      19 | started
  2024-02-08 16:12:30.26002+00  |      19 | succeeded
  2024-02-08 16:12:30.262301+00 |      20 | started
  2024-02-08 16:12:30.264098+00 |      20 | succeeded
  2024-02-08 16:12:30.265843+00 |      21 | started
  2024-02-08 16:12:30.267972+00 |      21 | succeeded
  2024-02-08 16:12:30.26979+00  |      22 | started
  2024-02-08 16:12:30.271608+00 |      22 | succeeded
  2024-02-08 16:12:30.273571+00 |      23 | started
  2024-02-08 16:12:30.275477+00 |      23 | succeeded
  2024-02-08 16:12:30.277705+00 |      24 | started
  2024-02-08 16:12:30.279513+00 |      24 | succeeded
  2024-02-08 16:12:30.281332+00 |      25 | started
  2024-02-08 16:12:30.283178+00 |      25 | succeeded
  2024-02-08 16:12:30.284962+00 |      26 | started
  2024-02-08 16:12:30.286979+00 |      26 | succeeded
  2024-02-08 16:12:30.28884+00  |      27 | started
  2024-02-08 16:12:30.290806+00 |      27 | succeeded
  2024-02-08 16:12:30.293019+00 |      28 | started
  2024-02-08 16:12:30.294892+00 |      28 | succeeded
  2024-02-08 16:12:30.296806+00 |      29 | started
  2024-02-08 16:12:30.298742+00 |      29 | succeeded
  2024-02-08 16:12:30.300649+00 |      30 | started
  2024-02-08 16:12:30.302622+00 |      30 | succeeded
  2024-02-08 16:12:30.304562+00 |      31 | started
  2024-02-08 16:12:30.306452+00 |      31 | succeeded
  2024-02-08 16:12:30.308577+00 |      32 | started
  2024-02-08 16:12:30.310384+00 |      32 | succeeded
  2024-02-08 16:12:30.312371+00 |      33 | started
  2024-02-08 16:12:30.314228+00 |      33 | succeeded
  2024-02-08 16:12:30.316086+00 |      34 | started
  2024-02-08 16:12:30.318122+00 |      34 | succeeded
  2024-02-08 16:12:30.319904+00 |      35 | started
  2024-02-08 16:12:30.321736+00 |      35 | succeeded
  2024-02-08 16:12:30.323899+00 |      36 | started
  2024-02-08 16:12:30.44017+00  |      36 | succeeded
  2024-02-08 16:12:30.442165+00 |      37 | started
  2024-02-08 16:12:30.444146+00 |      37 | succeeded
  2024-02-08 16:12:30.446352+00 |      38 | started
  2024-02-08 16:12:30.44827+00  |      38 | succeeded
  2024-02-08 16:12:30.45021+00  |      39 | started
  2024-02-08 16:12:40.236154+00 |      39 | succeeded
  2024-02-08 16:12:40.238705+00 |      40 | started
  2024-02-08 16:12:40.240896+00 |      40 | succeeded
  2024-02-08 16:12:40.243156+00 |      41 | started
  2024-02-08 16:12:40.245407+00 |      41 | succeeded
  2024-02-08 16:12:40.247801+00 |      42 | started
  2024-02-08 16:12:40.24974+00  |      42 | succeeded
  2024-02-08 16:12:40.251857+00 |      43 | started
  2024-02-08 16:12:40.253693+00 |      43 | succeeded
  2024-02-08 16:12:40.255688+00 |      44 | started
  2024-02-08 16:12:40.25762+00  |      44 | succeeded
  2024-02-08 16:12:40.259488+00 |      45 | started
  2024-02-08 16:12:40.26173+00  |      45 | succeeded
  2024-02-08 16:12:40.26461+00  |      46 | started
  2024-02-08 16:12:40.266653+00 |      46 | succeeded
  2024-02-08 16:12:40.268581+00 |      47 | started
  2024-02-08 16:12:40.270533+00 |      47 | succeeded
  2024-02-08 16:12:40.272566+00 |      48 | started
  2024-02-08 16:12:40.274824+00 |      48 | succeeded
  2024-02-08 16:12:40.276714+00 |      49 | started
  2024-02-08 16:12:40.278746+00 |      49 | succeeded
  2024-02-08 16:12:40.28106+00  |      50 | started
  2024-02-08 16:12:40.283092+00 |      50 | succeeded
  2024-02-08 16:12:40.284929+00 |      51 | started
  2024-02-08 16:12:40.287087+00 |      51 | succeeded
  2024-02-08 16:12:40.289033+00 |      52 | started
  2024-02-08 16:12:40.29088+00  |      52 | succeeded
  2024-02-08 16:12:40.292814+00 |      53 | started
  2024-02-08 16:12:40.294738+00 |      53 | succeeded
  2024-02-08 16:12:40.297217+00 |      54 | started
  2024-02-08 16:12:40.299128+00 |      54 | succeeded
  2024-02-08 16:12:40.301067+00 |      55 | started
  2024-02-08 16:12:40.302961+00 |      55 | succeeded
  2024-02-08 16:12:40.304831+00 |      56 | started
  2024-02-08 16:12:40.306614+00 |      56 | succeeded
  2024-02-08 16:12:40.308809+00 |      57 | started
  2024-02-08 16:12:40.310952+00 |      57 | succeeded
  2024-02-08 16:12:40.31312+00  |      58 | started
  2024-02-08 16:12:40.315168+00 |      58 | succeeded
  2024-02-08 16:12:40.317021+00 |      59 | started
  2024-02-08 16:12:40.318862+00 |      59 | succeeded
  2024-02-08 16:12:40.32084+00  |      60 | started
  2024-02-08 16:12:40.322798+00 |      60 | succeeded
  2024-02-08 16:12:40.324903+00 |      61 | started
  2024-02-08 16:12:40.326817+00 |      61 | succeeded
  2024-02-08 16:12:40.329095+00 |      62 | started
  2024-02-08 16:12:40.331087+00 |      62 | succeeded
  2024-02-08 16:12:40.333072+00 |      63 | started
  2024-02-08 16:12:40.334957+00 |      63 | succeeded
  2024-02-08 16:12:40.336829+00 |      64 | started
  2024-02-08 16:12:40.338741+00 |      64 | succeeded
  2024-02-08 16:12:40.340541+00 |      65 | started
  2024-02-08 16:12:40.34234+00  |      65 | succeeded
  2024-02-08 16:12:40.3447+00   |      66 | started
  2024-02-08 16:12:40.346562+00 |      66 | succeeded
  2024-02-08 16:12:40.348413+00 |      67 | started
  2024-02-08 16:12:40.350398+00 |      67 | succeeded
  2024-02-08 16:12:40.352359+00 |      68 | started
  2024-02-08 16:12:40.354266+00 |      68 | succeeded
  2024-02-08 16:12:40.35617+00  |      69 | started
  2024-02-08 16:12:40.358193+00 |      69 | succeeded
  2024-02-08 16:12:40.360285+00 |      70 | started
  2024-02-08 16:12:40.362226+00 |      70 | succeeded
  2024-02-08 16:12:40.364301+00 |      71 | started
  2024-02-08 16:12:40.366185+00 |      71 | succeeded
  2024-02-08 16:12:40.368252+00 |      72 | started
  2024-02-08 16:12:40.370282+00 |      72 | succeeded
  2024-02-08 16:12:26.70058+00  |     182 | started
  2024-02-08 16:12:26.709765+00 |     182 | succeeded
(148 rows)

This is complicated enough that I wanted to visualize it. But first I wanted to better understand the specific problem that's being reported. I started working through the recovery code, which involves the Nexus side calling into Steno to resume the saga. That lands in cmd_saga_resume inside Steno. There's a log message there -- and it's in the Nexus log! Great.

So what's the error saying? That ends up coming from here, where we create an executor to resume running the saga. It's saying: node 0 has an ancestor (i.e., it depends on another node), and that node hasn't started -- yet node 0 supposedly finished. This was surprising to me on a few levels: obviously something shouldn't be able to start if its dependency didn't finish, but also, isn't node 0 the start node?
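To make that concrete, here's a grossly simplified sketch of the flavor of check involved -- my illustration, not Steno's actual types or code: a node's load status may only be past "NeverStarted" if each of its parents succeeded.

```rust
// Grossly simplified sketch (not Steno's actual code) of the recovery
// invariant: a node may only have gotten past NeverStarted if each of
// its parents succeeded first.
#[derive(Debug)]
enum LoadStatus {
    NeverStarted,
    Started,
    Succeeded(String),
}

fn check_node(node: &LoadStatus, parent: &LoadStatus) -> Result<(), String> {
    match (node, parent) {
        // A node that never started is consistent with any parent state.
        (LoadStatus::NeverStarted, _) => Ok(()),
        // Otherwise its parent must have finished first.
        (_, LoadStatus::Succeeded(_)) => Ok(()),
        (n, p) => Err(format!(
            "load status is {:?}, which is illegal for parent load status {:?}",
            n, p
        )),
    }
}

fn main() {
    // Mirrors the failure above: node 0 succeeded, but its parent
    // (the start node) looks like it never started.
    let node0 = LoadStatus::Succeeded("f7934b34-...".to_string());
    let parent = LoadStatus::NeverStarted;
    println!("{:?}", check_node(&node0, &parent));
}
```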

I confirmed with a variant of the query above that includes the "data" field that node 0 really did start and finish -- with data and everything:

  2024-02-08 16:12:26.712324+00 |       0 | started    | NULL
  2024-02-08 16:12:26.714392+00 |       0 | succeeded  | "f7934b34-1e6e-4fa2-8cc5-e2456508d45f"

But I still didn't get why node 0 had any incoming edges.

Aside: trying to visualize this saga's DAG

I thought it'd be useful to visualize the saga's DAG, and I knew Steno has a way to print a dot graph. Fortunately, the "saga resume" log message includes the serialized SagaDag. I saved this to a file. Steno doesn't have a tool for reading this back in, but I made one pretty easily, calling it examples/saga-dump.rs in a steno clone:

//! Dump information about a saga from serialized state

use anyhow::anyhow;
use anyhow::Context;
use steno::SagaDag;

fn main() -> anyhow::Result<()> {
    let args = std::env::args().collect::<Vec<_>>();
    let filename = args.get(1).ok_or_else(|| anyhow!("expected filename"))?;
    let data_str = std::fs::read_to_string(&filename)
        .with_context(|| format!("read {:?}", &filename))?;
    let dag: SagaDag = serde_json::from_str(&data_str)
        .with_context(|| format!("parse {:?}", &filename))?;
    println!("{}", dag.dot());
    Ok(())
}

I ran it as cargo run --example=saga-dump -- dag.json > dag.dot, then converted that to a PNG with dot -o dag.png -Tpng dag.dot. This took surprisingly long -- like 10+ seconds to run dot! I thought it was broken, but it's just that this is a huge DAG. I did manage to open it in Preview, but this wasn't that helpful -- it's a huge, pretty linear graph. Still, the dot file was pretty useful as a much clearer way to view the nodes and their dependencies.

Resetting a bit: I'm trying to figure out what points to node 0. This is easily answered by the dot file, which ends with:

    179 -> 180 [ ]
    180 -> 181 [ ]
    182 -> 0 [ ]
    181 -> 183 [ ]

Node 182 points to node 0? And then I noticed this in the DAG:

"start_node":182

Okay :facepalm: I just misremembered how this worked. Steno first creates all the nodes that the consumer gave it, then creates the start node (182 in this case), which becomes a dependency of the first user-created node (which is thus node 0). So fine -- we understand now why node 0 has a dependency. But then: what's the state of node 182? Did it finish or not? Steno reports its status as NeverStarted -- how did it get that?
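As an illustration of that construction order, here's a minimal sketch using petgraph (which Steno uses for its DAG); the labels and calls are mine, not Steno's actual builder code:

```rust
// Minimal sketch of the construction order described above, using
// petgraph; illustrative only, not Steno's API.
use petgraph::dot::Dot;
use petgraph::graph::Graph;

fn main() {
    let mut dag: Graph<&str, ()> = Graph::new();
    // Consumer-provided nodes are added first, so they get the low
    // indices: the first user node is index 0.
    let first_user_node = dag.add_node("user node");
    // The start node is created afterward -- index 182 in the real saga --
    // and an edge makes it a dependency of the first user node.
    let start_node = dag.add_node("(start node)");
    dag.add_edge(start_node, first_user_node, ());
    println!(
        "start = {}, first user node = {}",
        start_node.index(),
        first_user_node.index()
    );
    // Printing the dot output shows the same "start -> 0" edge we saw
    // at the end of dag.dot.
    println!("{:?}", Dot::new(&dag));
}
```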

At this point I was worried that saga recovery was completely broken, so I did a quick check using Steno's example. First, I did a full run and saved the log:

$ cargo run --example=demo-provision run --dump-to mylog
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.00s
     Running `target/debug/examples/demo-provision run --dump-to mylog`
*** running saga ***
running action: InstanceCreate (instance name: fake-o instance)
running action: VpcAllocIp
running action: ServerPick
running action: VolumeCreate
running action: ServerReserve
running action: InstanceConfigure
running action: VolumeAttach
running action: InstanceBoot
running action: Print
printing final state:
  instance id: 1211
  IP address: 10.120.121.122
  volume id: 1213
  server id: 1212
*** finished saga ***

*** final state ***
+ saga execution: 049b2522-308d-442e-bc65-9bfaef863597
+-- done: (start node)
+-- done: InstanceCreate (produces "instance_id")
+-- done: (constant = {"number_of_things":1}) (produces "server_alloc_params")
+-- (parallel actions):
        +-- done: VpcAllocIp (produces "instance_ip")
        +-- done: VolumeCreate (produces "volume_id")
        +-- done: (subsaga start: "server-alloc")
                +-- done: ServerPick (produces "server_id")
                +-- done: ServerReserve (produces "server_reserve")
        +-- done: (subsaga end) (produces "server_alloc")
+-- done: InstanceConfigure (produces "instance_configure")
+-- done: VolumeAttach (produces "volume_attach")
+-- done: InstanceBoot (produces "instance_boot")
+-- done: Print (produces "print")
+-- done: (end node)

result: SUCCESS
final output: "it worked"
dumped log to "mylog"

Then I manually chopped off the last few entries of the log (so that it would appear like node 11 was in-progress and node 13 hadn't run) and recovered it:

dap@zathras steno $ cargo run --example=demo-provision run --recover-from mylog 
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.15s
     Running `target/debug/examples/demo-provision run --recover-from mylog`
recovering from log: mylog
recovered state
+ saga execution: 049b2522-308d-442e-bc65-9bfaef863597
+-- done: (start node)
+-- done: InstanceCreate (produces "instance_id")
+-- done: (constant = {"number_of_things":1}) (produces "server_alloc_params")
+-- (parallel actions):
        +-- done: VpcAllocIp (produces "instance_ip")
        +-- done: VolumeCreate (produces "volume_id")
        +-- done: (subsaga start: "server-alloc")
                +-- done: ServerPick (produces "server_id")
                +-- done: ServerReserve (produces "server_reserve")
        +-- done: (subsaga end) (produces "server_alloc")
+-- done: InstanceConfigure (produces "instance_configure")
+-- done: VolumeAttach (produces "volume_attach")
+-- done: InstanceBoot (produces "instance_boot")
+-- queued-todo: Print (produces "print")
+-- blocked: (end node)

*** running saga ***
running action: Print
printing final state:
  instance id: 1211
  IP address: 10.120.121.122
  volume id: 1213
  server id: 1212
*** finished saga ***

*** final state ***
+ saga execution: 049b2522-308d-442e-bc65-9bfaef863597
+-- done: (start node)
+-- done: InstanceCreate (produces "instance_id")
+-- done: (constant = {"number_of_things":1}) (produces "server_alloc_params")
+-- (parallel actions):
        +-- done: VpcAllocIp (produces "instance_ip")
        +-- done: VolumeCreate (produces "volume_id")
        +-- done: (subsaga start: "server-alloc")
                +-- done: ServerPick (produces "server_id")
                +-- done: ServerReserve (produces "server_reserve")
        +-- done: (subsaga end) (produces "server_alloc")
+-- done: InstanceConfigure (produces "instance_configure")
+-- done: VolumeAttach (produces "volume_attach")
+-- done: InstanceBoot (produces "instance_boot")
+-- done: Print (produces "print")
+-- done: (end node)

result: SUCCESS
final output: "it worked"

Good -- that's correct. In particular, we can see that the loaded state reflects that the second-to-last node hasn't finished, yet it finishes by the end. So this isn't totally broken.

I also noticed about this example that its log contains entries for the start node, just like for any other node.

That led me to ask: do we have log entries for node 182 in our saga's log? We do! (See above.) Though it's interesting that they appear relatively late in the log. Now, while looking at the code I noticed that Steno sorts the log semantically (not by timestamp) to make it easier to iterate over the entries in an order that makes sense. Is that somehow not accounting for the start node? After some time with that code, it didn't seem like that should matter: all it depends on is having seen the "started" event for a node before its "succeeded" event, and that should be the case here even if the start node winds up being processed last.
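To illustrate what I mean by a semantic sort (my sketch, not Steno's actual comparator): order events by (node_id, phase), so each node's "started" always ranks before its terminal event, regardless of timestamps:

```rust
// Illustration of a "semantic" sort: order by (node_id, phase) rather
// than by timestamp. My sketch, not Steno's actual comparator.
fn phase(event_type: &str) -> u8 {
    match event_type {
        "started" => 0,
        _ => 1, // terminal events: "succeeded", "failed", undo events, ...
    }
}

fn main() {
    // (node_id, event_type) in an arbitrary order, as if returned by
    // the database with no meaningful ORDER BY.
    let mut events = vec![
        (182, "succeeded"),
        (0, "succeeded"),
        (182, "started"),
        (0, "started"),
    ];
    events.sort_by_key(|&(node_id, event_type)| (node_id, phase(event_type)));
    // Each node's "started" now precedes its "succeeded", which is all
    // recovery depends on -- even though node 182 sorts last overall.
    assert_eq!(
        events,
        vec![(0, "started"), (0, "succeeded"), (182, "started"), (182, "succeeded")]
    );
}
```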

I think it was about this time that I wondered whether we simply hadn't loaded the log entries for node 182 from the database. I looked at the code that fetched the saga_node_event rows... and spotted the bug: when it fetches the log, the pagination code is hand-rolled (it long predates Paginator) -- and it incorrectly paginates using the saga_id as the marker. But the saga_id is the same for all of these entries! So the behavior you'd expect from this code is: the first fetch returns the first page of rows; the next fetch uses the last row's saga_id as the marker and asks for rows with a strictly greater saga_id, of which there are none; so everything past the first page is silently dropped.

We want to be paginating using the node id (and probably the event type, to ensure uniqueness) as the marker instead.
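To see concretely why a constant-valued marker breaks things, here's a self-contained sketch of the pattern (illustrative only -- the real code goes through Diesel):

```rust
// Sketch of the broken pagination pattern; illustrative only. Every row
// in a saga's log shares one saga_id, so using saga_id as the page
// marker loses every row after the first page.
#[derive(Clone, Copy)]
struct Row {
    saga_id: u64,
    node_id: u32,
}

fn fetch_page(rows: &[Row], marker: Option<u64>, limit: usize) -> Vec<Row> {
    rows.iter()
        // BUG: the marker column (saga_id) has the same value in every row.
        .filter(|r| marker.map_or(true, |m| r.saga_id > m))
        .take(limit)
        .copied()
        .collect()
}

fn main() {
    // 148 events for a single saga, like the log above.
    let rows: Vec<Row> =
        (0u32..148).map(|i| Row { saga_id: 7, node_id: i / 2 }).collect();

    let page1 = fetch_page(&rows, None, 100);
    assert_eq!(page1.len(), 100);
    println!("last node on page 1: {}", page1.last().unwrap().node_id);

    // The next marker is the last row's saga_id -- the same value every
    // row has -- so page 2 comes back empty and 48 events vanish.
    let marker = page1.last().unwrap().saga_id;
    let page2 = fetch_page(&rows, Some(marker), 100);
    assert_eq!(page2.len(), 0);
}
```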

Does this actually explain the behavior? First of all, do we even have more than one page of rows? We do: the page size is 100 and there are 148 rows in the output above. For this explanation to work, node 182's rows would have to not be in the first page, which is a little surprising because at least PostgreSQL tends to return rows in insertion order (though nothing is guaranteed when the ORDER BY doesn't fully specify the order). But we can see from the output above that even though node 182's timestamps are the earliest, its rows appear last in the natural sort.

Still, I'd feel better if I could see the problem in action using this same database. So I wrote a little omdb command that does the following: it fetches the first page of saga log entries using the same query and pagination parameters as the recovery path, prints them, then chooses the next page's marker the same (incorrect) way the existing code does and fetches the second page.

This basically lets me do the same thing the recovery path would do via omdb, where I can easily print stuff out. I could also fix the implementation, run the same thing, and verify that it works. (I haven't done that part yet.)

Here's the source for the omdb command:

```diff
diff --git a/dev-tools/omdb/src/bin/omdb/db.rs b/dev-tools/omdb/src/bin/omdb/db.rs
index 7df457b24..c2993e068 100644
--- a/dev-tools/omdb/src/bin/omdb/db.rs
+++ b/dev-tools/omdb/src/bin/omdb/db.rs
@@ -41,6 +41,7 @@ use diesel::JoinOnDsl;
 use diesel::NullableExpressionMethods;
 use diesel::OptionalExtension;
 use diesel::TextExpressionMethods;
+use dropshot::PaginationOrder;
 use gateway_client::types::SpType;
 use indicatif::ProgressBar;
 use indicatif::ProgressDrawTarget;
@@ -302,6 +303,9 @@ enum DbCommands {
     Snapshots(SnapshotArgs),
     /// Validate the contents of the database
     Validate(ValidateArgs),
+
+    /// dap debugging command to dump the saga log for a saga
+    DumpSagaLog(DumpSagaLogArgs),
 }
 
 #[derive(Debug, Args)]
@@ -552,6 +556,11 @@ enum ValidateCommands {
     ValidateRegionSnapshots,
 }
 
+#[derive(Debug, Args)]
+struct DumpSagaLogArgs {
+    saga_id: Uuid,
+}
+
 impl DbArgs {
     /// Run a `omdb db` subcommand.
     pub(crate) async fn run_cmd(
@@ -681,6 +690,9 @@ impl DbArgs {
             DbCommands::Validate(ValidateArgs {
                 command: ValidateCommands::ValidateRegionSnapshots,
             }) => cmd_db_validate_region_snapshots(&datastore).await,
+            DbCommands::DumpSagaLog(args) => {
+                cmd_db_dump_saga_log(&opctx, &datastore, args).await
+            }
         }
     }
 }
@@ -3686,3 +3698,57 @@ async fn cmd_db_reconfigurator_save(
     eprintln!("wrote {}", output_path);
     Ok(())
 }
+
+// dap
+
+async fn cmd_db_dump_saga_log(
+    _opctx: &OpContext,
+    datastore: &DataStore,
+    dump_saga_log_args: &DumpSagaLogArgs,
+) -> Result<(), anyhow::Error> {
+    let saga_id =
+        nexus_db_model::saga_types::SagaId(dump_saga_log_args.saga_id.into());
+    let limit = NonZeroU32::new(100).unwrap();
+    let direction = PaginationOrder::Ascending;
+    let pagparams = DataPageParams { marker: None, direction, limit };
+    println!("initial marker: {:?}", pagparams.marker);
+    println!("initial limit:  {}", pagparams.limit);
+    let events = datastore
+        .saga_node_event_list_by_id(saga_id, &pagparams)
+        .await
+        .with_context(|| {
+            format!("listing saga nodes for saga {}", saga_id.0.0)
+        })?;
+    println!(
+        "saga {} node events found (page 1): {}",
+        saga_id.0.0,
+        events.len()
+    );
+    for event in &events {
+        assert_eq!(event.saga_id.0, saga_id.0.0);
+        println!("  event: node {:>3} {}", event.node_id, event.event_type);
+    }
+
+    println!("INCORRECTLY choosing marker the same way as existing code");
+    let marker =
+        events.last().ok_or_else(|| anyhow!("found no events"))?.saga_id.0;
+    let pagparams = DataPageParams { marker: Some(&marker), direction, limit };
+    println!("new marker: {:?}", marker);
+    println!("new limit:  {}", limit);
+    let events = datastore
+        .saga_node_event_list_by_id(saga_id, &pagparams)
+        .await
+        .with_context(|| {
+            format!("listing saga nodes for saga {}", saga_id.0.0)
+        })?;
+    println!(
+        "saga {} node events found (page 2): {}",
+        saga_id.0.0,
+        events.len()
+    );
+    for event in &events {
+        assert_eq!(event.saga_id.0, saga_id.0.0);
+        println!("  event: node {:>3} {}", event.node_id, event.event_type);
+    }
+    Ok(())
+}
```

and here's the output:

root@oxz_switch1:~# /var/tmp/omdb-dap-saga db dump-saga-log 7dce104f-0806-4906-a8b6-267f18379e21
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:109::3]:32221,[fd00:1122:3344:105::3]:32221,[fd00:1122:3344:10b::3]:32221,[fd00:1122:3344:107::3]:32221,[fd00:1122:3344:108::3]:32221/omicron?sslmode=disable
note: database schema version matches expected (77.0.0)
initial marker: None
initial limit:  100
saga 7dce104f-0806-4906-a8b6-267f18379e21 node events found (page 1): 100
  event: node   0 started
  event: node   0 succeeded
  event: node   1 started
  event: node   1 succeeded
  event: node   2 started
  event: node   2 succeeded
  event: node   3 started
  event: node   3 succeeded
  event: node   4 started
  event: node   4 succeeded
  event: node   5 started
  event: node   5 succeeded
  event: node   6 started
  event: node   6 succeeded
  event: node   7 started
  event: node   7 succeeded
  event: node   8 started
  event: node   8 succeeded
  event: node   9 started
  event: node   9 succeeded
  event: node  10 started
  event: node  10 succeeded
  event: node  11 started
  event: node  11 succeeded
  event: node  12 started
  event: node  12 succeeded
  event: node  13 started
  event: node  13 succeeded
  event: node  14 started
  event: node  14 succeeded
  event: node  15 started
  event: node  15 succeeded
  event: node  16 started
  event: node  16 succeeded
  event: node  17 started
  event: node  17 succeeded
  event: node  18 started
  event: node  18 succeeded
  event: node  19 started
  event: node  19 succeeded
  event: node  20 started
  event: node  20 succeeded
  event: node  21 started
  event: node  21 succeeded
  event: node  22 started
  event: node  22 succeeded
  event: node  23 started
  event: node  23 succeeded
  event: node  24 started
  event: node  24 succeeded
  event: node  25 started
  event: node  25 succeeded
  event: node  26 started
  event: node  26 succeeded
  event: node  27 started
  event: node  27 succeeded
  event: node  28 started
  event: node  28 succeeded
  event: node  29 started
  event: node  29 succeeded
  event: node  30 started
  event: node  30 succeeded
  event: node  31 started
  event: node  31 succeeded
  event: node  32 started
  event: node  32 succeeded
  event: node  33 started
  event: node  33 succeeded
  event: node  34 started
  event: node  34 succeeded
  event: node  35 started
  event: node  35 succeeded
  event: node  36 started
  event: node  36 succeeded
  event: node  37 started
  event: node  37 succeeded
  event: node  38 started
  event: node  38 succeeded
  event: node  39 started
  event: node  39 succeeded
  event: node  40 started
  event: node  40 succeeded
  event: node  41 started
  event: node  41 succeeded
  event: node  42 started
  event: node  42 succeeded
  event: node  43 started
  event: node  43 succeeded
  event: node  44 started
  event: node  44 succeeded
  event: node  45 started
  event: node  45 succeeded
  event: node  46 started
  event: node  46 succeeded
  event: node  47 started
  event: node  47 succeeded
  event: node  48 started
  event: node  48 succeeded
  event: node  49 started
  event: node  49 succeeded
INCORRECTLY choosing marker the same way as existing code
new marker: 7dce104f-0806-4906-a8b6-267f18379e21
new limit:  100
saga 7dce104f-0806-4906-a8b6-267f18379e21 node events found (page 2): 0
root@oxz_switch1:~#

This is pretty close to a smoking gun. It shows that the first page contains exactly 100 events covering only nodes 0-49 -- node 182 is nowhere in it; that the marker chosen the way the existing code chooses it is just the saga_id; and that the second page fetched with that marker comes back empty. The recovery path therefore never sees node 182's log entries.

I also reproduced from SQL by first listing everything in timestamp order to show that it looks like what we'd expect:

root@[fd00:1122:3344:108::3]:32221/omicron> select event_time,node_id,event_type from saga_node_event where saga_id = '7dce104f-0806-4906-a8b6-267f18379e21' ORDER BY event_time;
           event_time           | node_id | event_type
--------------------------------+---------+-------------
  2024-02-08 16:12:26.70058+00  |     182 | started
  2024-02-08 16:12:26.709765+00 |     182 | succeeded
  2024-02-08 16:12:26.712324+00 |       0 | started
  2024-02-08 16:12:26.714392+00 |       0 | succeeded
  2024-02-08 16:12:26.716458+00 |       1 | started
  2024-02-08 16:12:26.762434+00 |       1 | succeeded
  2024-02-08 16:12:26.764671+00 |       2 | started
  2024-02-08 16:12:26.79708+00  |       2 | succeeded
  2024-02-08 16:12:26.799201+00 |       3 | started
  2024-02-08 16:12:29.630453+00 |       3 | succeeded
  2024-02-08 16:12:30.091878+00 |       4 | started
  2024-02-08 16:12:30.094589+00 |       4 | succeeded
  2024-02-08 16:12:30.096826+00 |       5 | started
  2024-02-08 16:12:30.203381+00 |       5 | succeeded
  2024-02-08 16:12:30.20559+00  |       6 | started
  2024-02-08 16:12:30.207753+00 |       6 | succeeded
  2024-02-08 16:12:30.209735+00 |       7 | started
  2024-02-08 16:12:30.211783+00 |       7 | succeeded
  2024-02-08 16:12:30.214181+00 |       8 | started
  2024-02-08 16:12:30.216157+00 |       8 | succeeded
  2024-02-08 16:12:30.218013+00 |       9 | started
  2024-02-08 16:12:30.219947+00 |       9 | succeeded
  2024-02-08 16:12:30.221934+00 |      10 | started
  2024-02-08 16:12:30.22368+00  |      10 | succeeded
  2024-02-08 16:12:30.225822+00 |      11 | started
  2024-02-08 16:12:30.227877+00 |      11 | succeeded
  2024-02-08 16:12:30.23009+00  |      12 | started
  2024-02-08 16:12:30.232017+00 |      12 | succeeded
  2024-02-08 16:12:30.233914+00 |      13 | started
  2024-02-08 16:12:30.235966+00 |      13 | succeeded
  2024-02-08 16:12:30.237981+00 |      14 | started
  2024-02-08 16:12:30.239991+00 |      14 | succeeded
  2024-02-08 16:12:30.241852+00 |      15 | started
  2024-02-08 16:12:30.244071+00 |      15 | succeeded
  2024-02-08 16:12:30.246213+00 |      16 | started
  2024-02-08 16:12:30.248214+00 |      16 | succeeded
  2024-02-08 16:12:30.250175+00 |      17 | started
  2024-02-08 16:12:30.252159+00 |      17 | succeeded
  2024-02-08 16:12:30.254117+00 |      18 | started
  2024-02-08 16:12:30.256072+00 |      18 | succeeded
  2024-02-08 16:12:30.25808+00  |      19 | started
  2024-02-08 16:12:30.26002+00  |      19 | succeeded
  2024-02-08 16:12:30.262301+00 |      20 | started
  2024-02-08 16:12:30.264098+00 |      20 | succeeded
  2024-02-08 16:12:30.265843+00 |      21 | started
  2024-02-08 16:12:30.267972+00 |      21 | succeeded
  2024-02-08 16:12:30.26979+00  |      22 | started
  2024-02-08 16:12:30.271608+00 |      22 | succeeded
  2024-02-08 16:12:30.273571+00 |      23 | started
  2024-02-08 16:12:30.275477+00 |      23 | succeeded
  2024-02-08 16:12:30.277705+00 |      24 | started
  2024-02-08 16:12:30.279513+00 |      24 | succeeded
  2024-02-08 16:12:30.281332+00 |      25 | started
  2024-02-08 16:12:30.283178+00 |      25 | succeeded
  2024-02-08 16:12:30.284962+00 |      26 | started
  2024-02-08 16:12:30.286979+00 |      26 | succeeded
  2024-02-08 16:12:30.28884+00  |      27 | started
  2024-02-08 16:12:30.290806+00 |      27 | succeeded
  2024-02-08 16:12:30.293019+00 |      28 | started
  2024-02-08 16:12:30.294892+00 |      28 | succeeded
  2024-02-08 16:12:30.296806+00 |      29 | started
  2024-02-08 16:12:30.298742+00 |      29 | succeeded
  2024-02-08 16:12:30.300649+00 |      30 | started
  2024-02-08 16:12:30.302622+00 |      30 | succeeded
  2024-02-08 16:12:30.304562+00 |      31 | started
  2024-02-08 16:12:30.306452+00 |      31 | succeeded
  2024-02-08 16:12:30.308577+00 |      32 | started
  2024-02-08 16:12:30.310384+00 |      32 | succeeded
  2024-02-08 16:12:30.312371+00 |      33 | started
  2024-02-08 16:12:30.314228+00 |      33 | succeeded
  2024-02-08 16:12:30.316086+00 |      34 | started
  2024-02-08 16:12:30.318122+00 |      34 | succeeded
  2024-02-08 16:12:30.319904+00 |      35 | started
  2024-02-08 16:12:30.321736+00 |      35 | succeeded
  2024-02-08 16:12:30.323899+00 |      36 | started
  2024-02-08 16:12:30.44017+00  |      36 | succeeded
  2024-02-08 16:12:30.442165+00 |      37 | started
  2024-02-08 16:12:30.444146+00 |      37 | succeeded
  2024-02-08 16:12:30.446352+00 |      38 | started
  2024-02-08 16:12:30.44827+00  |      38 | succeeded
  2024-02-08 16:12:30.45021+00  |      39 | started
  2024-02-08 16:12:40.236154+00 |      39 | succeeded
  2024-02-08 16:12:40.238705+00 |      40 | started
  2024-02-08 16:12:40.240896+00 |      40 | succeeded
  2024-02-08 16:12:40.243156+00 |      41 | started
  2024-02-08 16:12:40.245407+00 |      41 | succeeded
  2024-02-08 16:12:40.247801+00 |      42 | started
  2024-02-08 16:12:40.24974+00  |      42 | succeeded
  2024-02-08 16:12:40.251857+00 |      43 | started
  2024-02-08 16:12:40.253693+00 |      43 | succeeded
  2024-02-08 16:12:40.255688+00 |      44 | started
  2024-02-08 16:12:40.25762+00  |      44 | succeeded
  2024-02-08 16:12:40.259488+00 |      45 | started
  2024-02-08 16:12:40.26173+00  |      45 | succeeded
  2024-02-08 16:12:40.26461+00  |      46 | started
  2024-02-08 16:12:40.266653+00 |      46 | succeeded
  2024-02-08 16:12:40.268581+00 |      47 | started
  2024-02-08 16:12:40.270533+00 |      47 | succeeded
  2024-02-08 16:12:40.272566+00 |      48 | started
  2024-02-08 16:12:40.274824+00 |      48 | succeeded
  2024-02-08 16:12:40.276714+00 |      49 | started
  2024-02-08 16:12:40.278746+00 |      49 | succeeded
  2024-02-08 16:12:40.28106+00  |      50 | started
  2024-02-08 16:12:40.283092+00 |      50 | succeeded
  2024-02-08 16:12:40.284929+00 |      51 | started
  2024-02-08 16:12:40.287087+00 |      51 | succeeded
  2024-02-08 16:12:40.289033+00 |      52 | started
  2024-02-08 16:12:40.29088+00  |      52 | succeeded
  2024-02-08 16:12:40.292814+00 |      53 | started
  2024-02-08 16:12:40.294738+00 |      53 | succeeded
  2024-02-08 16:12:40.297217+00 |      54 | started
  2024-02-08 16:12:40.299128+00 |      54 | succeeded
  2024-02-08 16:12:40.301067+00 |      55 | started
  2024-02-08 16:12:40.302961+00 |      55 | succeeded
  2024-02-08 16:12:40.304831+00 |      56 | started
  2024-02-08 16:12:40.306614+00 |      56 | succeeded
  2024-02-08 16:12:40.308809+00 |      57 | started
  2024-02-08 16:12:40.310952+00 |      57 | succeeded
  2024-02-08 16:12:40.31312+00  |      58 | started
  2024-02-08 16:12:40.315168+00 |      58 | succeeded
  2024-02-08 16:12:40.317021+00 |      59 | started
  2024-02-08 16:12:40.318862+00 |      59 | succeeded
  2024-02-08 16:12:40.32084+00  |      60 | started
  2024-02-08 16:12:40.322798+00 |      60 | succeeded
  2024-02-08 16:12:40.324903+00 |      61 | started
  2024-02-08 16:12:40.326817+00 |      61 | succeeded
  2024-02-08 16:12:40.329095+00 |      62 | started
  2024-02-08 16:12:40.331087+00 |      62 | succeeded
  2024-02-08 16:12:40.333072+00 |      63 | started
  2024-02-08 16:12:40.334957+00 |      63 | succeeded
  2024-02-08 16:12:40.336829+00 |      64 | started
  2024-02-08 16:12:40.338741+00 |      64 | succeeded
  2024-02-08 16:12:40.340541+00 |      65 | started
  2024-02-08 16:12:40.34234+00  |      65 | succeeded
  2024-02-08 16:12:40.3447+00   |      66 | started
  2024-02-08 16:12:40.346562+00 |      66 | succeeded
  2024-02-08 16:12:40.348413+00 |      67 | started
  2024-02-08 16:12:40.350398+00 |      67 | succeeded
  2024-02-08 16:12:40.352359+00 |      68 | started
  2024-02-08 16:12:40.354266+00 |      68 | succeeded
  2024-02-08 16:12:40.35617+00  |      69 | started
  2024-02-08 16:12:40.358193+00 |      69 | succeeded
  2024-02-08 16:12:40.360285+00 |      70 | started
  2024-02-08 16:12:40.362226+00 |      70 | succeeded
  2024-02-08 16:12:40.364301+00 |      71 | started
  2024-02-08 16:12:40.366185+00 |      71 | succeeded
  2024-02-08 16:12:40.368252+00 |      72 | started
  2024-02-08 16:12:40.370282+00 |      72 | succeeded
(148 rows)

Notice that node 182 shows up first, as we'd expect, so a LIMIT of 100 here would include it. But if we order by saga_id instead, which is what it looks like the code would do:

root@[fd00:1122:3344:108::3]:32221/omicron> select event_time,node_id,event_type from saga_node_event where saga_id = '7dce104f-0806-4906-a8b6-267f18379e21' ORDER BY saga_id;
           event_time           | node_id | event_type
--------------------------------+---------+-------------
  2024-02-08 16:12:26.712324+00 |       0 | started
  2024-02-08 16:12:26.714392+00 |       0 | succeeded
  2024-02-08 16:12:26.716458+00 |       1 | started
  2024-02-08 16:12:26.762434+00 |       1 | succeeded
  2024-02-08 16:12:26.764671+00 |       2 | started
  2024-02-08 16:12:26.79708+00  |       2 | succeeded
  2024-02-08 16:12:26.799201+00 |       3 | started
  2024-02-08 16:12:29.630453+00 |       3 | succeeded
  2024-02-08 16:12:30.091878+00 |       4 | started
  2024-02-08 16:12:30.094589+00 |       4 | succeeded
  2024-02-08 16:12:30.096826+00 |       5 | started
  2024-02-08 16:12:30.203381+00 |       5 | succeeded
  2024-02-08 16:12:30.20559+00  |       6 | started
  2024-02-08 16:12:30.207753+00 |       6 | succeeded
  2024-02-08 16:12:30.209735+00 |       7 | started
  2024-02-08 16:12:30.211783+00 |       7 | succeeded
  2024-02-08 16:12:30.214181+00 |       8 | started
  2024-02-08 16:12:30.216157+00 |       8 | succeeded
  2024-02-08 16:12:30.218013+00 |       9 | started
  2024-02-08 16:12:30.219947+00 |       9 | succeeded
  2024-02-08 16:12:30.221934+00 |      10 | started
  2024-02-08 16:12:30.22368+00  |      10 | succeeded
  2024-02-08 16:12:30.225822+00 |      11 | started
  2024-02-08 16:12:30.227877+00 |      11 | succeeded
  2024-02-08 16:12:30.23009+00  |      12 | started
  2024-02-08 16:12:30.232017+00 |      12 | succeeded
  2024-02-08 16:12:30.233914+00 |      13 | started
  2024-02-08 16:12:30.235966+00 |      13 | succeeded
  2024-02-08 16:12:30.237981+00 |      14 | started
  2024-02-08 16:12:30.239991+00 |      14 | succeeded
  2024-02-08 16:12:30.241852+00 |      15 | started
  2024-02-08 16:12:30.244071+00 |      15 | succeeded
  2024-02-08 16:12:30.246213+00 |      16 | started
  2024-02-08 16:12:30.248214+00 |      16 | succeeded
  2024-02-08 16:12:30.250175+00 |      17 | started
  2024-02-08 16:12:30.252159+00 |      17 | succeeded
  2024-02-08 16:12:30.254117+00 |      18 | started
  2024-02-08 16:12:30.256072+00 |      18 | succeeded
  2024-02-08 16:12:30.25808+00  |      19 | started
  2024-02-08 16:12:30.26002+00  |      19 | succeeded
  2024-02-08 16:12:30.262301+00 |      20 | started
  2024-02-08 16:12:30.264098+00 |      20 | succeeded
  2024-02-08 16:12:30.265843+00 |      21 | started
  2024-02-08 16:12:30.267972+00 |      21 | succeeded
  2024-02-08 16:12:30.26979+00  |      22 | started
  2024-02-08 16:12:30.271608+00 |      22 | succeeded
  2024-02-08 16:12:30.273571+00 |      23 | started
  2024-02-08 16:12:30.275477+00 |      23 | succeeded
  2024-02-08 16:12:30.277705+00 |      24 | started
  2024-02-08 16:12:30.279513+00 |      24 | succeeded
  2024-02-08 16:12:30.281332+00 |      25 | started
  2024-02-08 16:12:30.283178+00 |      25 | succeeded
  2024-02-08 16:12:30.284962+00 |      26 | started
  2024-02-08 16:12:30.286979+00 |      26 | succeeded
  2024-02-08 16:12:30.28884+00  |      27 | started
  2024-02-08 16:12:30.290806+00 |      27 | succeeded
  2024-02-08 16:12:30.293019+00 |      28 | started
  2024-02-08 16:12:30.294892+00 |      28 | succeeded
  2024-02-08 16:12:30.296806+00 |      29 | started
  2024-02-08 16:12:30.298742+00 |      29 | succeeded
  2024-02-08 16:12:30.300649+00 |      30 | started
  2024-02-08 16:12:30.302622+00 |      30 | succeeded
  2024-02-08 16:12:30.304562+00 |      31 | started
  2024-02-08 16:12:30.306452+00 |      31 | succeeded
  2024-02-08 16:12:30.308577+00 |      32 | started
  2024-02-08 16:12:30.310384+00 |      32 | succeeded
  2024-02-08 16:12:30.312371+00 |      33 | started
  2024-02-08 16:12:30.314228+00 |      33 | succeeded
  2024-02-08 16:12:30.316086+00 |      34 | started
  2024-02-08 16:12:30.318122+00 |      34 | succeeded
  2024-02-08 16:12:30.319904+00 |      35 | started
  2024-02-08 16:12:30.321736+00 |      35 | succeeded
  2024-02-08 16:12:30.323899+00 |      36 | started
  2024-02-08 16:12:30.44017+00  |      36 | succeeded
  2024-02-08 16:12:30.442165+00 |      37 | started
  2024-02-08 16:12:30.444146+00 |      37 | succeeded
  2024-02-08 16:12:30.446352+00 |      38 | started
  2024-02-08 16:12:30.44827+00  |      38 | succeeded
  2024-02-08 16:12:30.45021+00  |      39 | started
  2024-02-08 16:12:40.236154+00 |      39 | succeeded
  2024-02-08 16:12:40.238705+00 |      40 | started
  2024-02-08 16:12:40.240896+00 |      40 | succeeded
  2024-02-08 16:12:40.243156+00 |      41 | started
  2024-02-08 16:12:40.245407+00 |      41 | succeeded
  2024-02-08 16:12:40.247801+00 |      42 | started
  2024-02-08 16:12:40.24974+00  |      42 | succeeded
  2024-02-08 16:12:40.251857+00 |      43 | started
  2024-02-08 16:12:40.253693+00 |      43 | succeeded
  2024-02-08 16:12:40.255688+00 |      44 | started
  2024-02-08 16:12:40.25762+00  |      44 | succeeded
  2024-02-08 16:12:40.259488+00 |      45 | started
  2024-02-08 16:12:40.26173+00  |      45 | succeeded
  2024-02-08 16:12:40.26461+00  |      46 | started
  2024-02-08 16:12:40.266653+00 |      46 | succeeded
  2024-02-08 16:12:40.268581+00 |      47 | started
  2024-02-08 16:12:40.270533+00 |      47 | succeeded
  2024-02-08 16:12:40.272566+00 |      48 | started
  2024-02-08 16:12:40.274824+00 |      48 | succeeded
  2024-02-08 16:12:40.276714+00 |      49 | started
  2024-02-08 16:12:40.278746+00 |      49 | succeeded
  2024-02-08 16:12:40.28106+00  |      50 | started
  2024-02-08 16:12:40.283092+00 |      50 | succeeded
  2024-02-08 16:12:40.284929+00 |      51 | started
  2024-02-08 16:12:40.287087+00 |      51 | succeeded
  2024-02-08 16:12:40.289033+00 |      52 | started
  2024-02-08 16:12:40.29088+00  |      52 | succeeded
  2024-02-08 16:12:40.292814+00 |      53 | started
  2024-02-08 16:12:40.294738+00 |      53 | succeeded
  2024-02-08 16:12:40.297217+00 |      54 | started
  2024-02-08 16:12:40.299128+00 |      54 | succeeded
  2024-02-08 16:12:40.301067+00 |      55 | started
  2024-02-08 16:12:40.302961+00 |      55 | succeeded
  2024-02-08 16:12:40.304831+00 |      56 | started
  2024-02-08 16:12:40.306614+00 |      56 | succeeded
  2024-02-08 16:12:40.308809+00 |      57 | started
  2024-02-08 16:12:40.310952+00 |      57 | succeeded
  2024-02-08 16:12:40.31312+00  |      58 | started
  2024-02-08 16:12:40.315168+00 |      58 | succeeded
  2024-02-08 16:12:40.317021+00 |      59 | started
  2024-02-08 16:12:40.318862+00 |      59 | succeeded
  2024-02-08 16:12:40.32084+00  |      60 | started
  2024-02-08 16:12:40.322798+00 |      60 | succeeded
  2024-02-08 16:12:40.324903+00 |      61 | started
  2024-02-08 16:12:40.326817+00 |      61 | succeeded
  2024-02-08 16:12:40.329095+00 |      62 | started
  2024-02-08 16:12:40.331087+00 |      62 | succeeded
  2024-02-08 16:12:40.333072+00 |      63 | started
  2024-02-08 16:12:40.334957+00 |      63 | succeeded
  2024-02-08 16:12:40.336829+00 |      64 | started
  2024-02-08 16:12:40.338741+00 |      64 | succeeded
  2024-02-08 16:12:40.340541+00 |      65 | started
  2024-02-08 16:12:40.34234+00  |      65 | succeeded
  2024-02-08 16:12:40.3447+00   |      66 | started
  2024-02-08 16:12:40.346562+00 |      66 | succeeded
  2024-02-08 16:12:40.348413+00 |      67 | started
  2024-02-08 16:12:40.350398+00 |      67 | succeeded
  2024-02-08 16:12:40.352359+00 |      68 | started
  2024-02-08 16:12:40.354266+00 |      68 | succeeded
  2024-02-08 16:12:40.35617+00  |      69 | started
  2024-02-08 16:12:40.358193+00 |      69 | succeeded
  2024-02-08 16:12:40.360285+00 |      70 | started
  2024-02-08 16:12:40.362226+00 |      70 | succeeded
  2024-02-08 16:12:40.364301+00 |      71 | started
  2024-02-08 16:12:40.366185+00 |      71 | succeeded
  2024-02-08 16:12:40.368252+00 |      72 | started
  2024-02-08 16:12:40.370282+00 |      72 | succeeded
  2024-02-08 16:12:26.70058+00  |     182 | started
  2024-02-08 16:12:26.709765+00 |     182 | succeeded
(148 rows)

It happens to put node 182 last, and it's not in the first 100:

root@[fd00:1122:3344:108::3]:32221/omicron> select event_time,node_id,event_type from saga_node_event where saga_id = '7dce104f-0806-4906-a8b6-267f18379e21' ORDER BY saga_id LIMIT 100;
           event_time           | node_id | event_type
--------------------------------+---------+-------------
  2024-02-08 16:12:26.712324+00 |       0 | started
  2024-02-08 16:12:26.714392+00 |       0 | succeeded
  2024-02-08 16:12:26.716458+00 |       1 | started
  2024-02-08 16:12:26.762434+00 |       1 | succeeded
  2024-02-08 16:12:26.764671+00 |       2 | started
  2024-02-08 16:12:26.79708+00  |       2 | succeeded
  2024-02-08 16:12:26.799201+00 |       3 | started
  2024-02-08 16:12:29.630453+00 |       3 | succeeded
  2024-02-08 16:12:30.091878+00 |       4 | started
  2024-02-08 16:12:30.094589+00 |       4 | succeeded
  2024-02-08 16:12:30.096826+00 |       5 | started
  2024-02-08 16:12:30.203381+00 |       5 | succeeded
  2024-02-08 16:12:30.20559+00  |       6 | started
  2024-02-08 16:12:30.207753+00 |       6 | succeeded
  2024-02-08 16:12:30.209735+00 |       7 | started
  2024-02-08 16:12:30.211783+00 |       7 | succeeded
  2024-02-08 16:12:30.214181+00 |       8 | started
  2024-02-08 16:12:30.216157+00 |       8 | succeeded
  2024-02-08 16:12:30.218013+00 |       9 | started
  2024-02-08 16:12:30.219947+00 |       9 | succeeded
  2024-02-08 16:12:30.221934+00 |      10 | started
  2024-02-08 16:12:30.22368+00  |      10 | succeeded
  2024-02-08 16:12:30.225822+00 |      11 | started
  2024-02-08 16:12:30.227877+00 |      11 | succeeded
  2024-02-08 16:12:30.23009+00  |      12 | started
  2024-02-08 16:12:30.232017+00 |      12 | succeeded
  2024-02-08 16:12:30.233914+00 |      13 | started
  2024-02-08 16:12:30.235966+00 |      13 | succeeded
  2024-02-08 16:12:30.237981+00 |      14 | started
  2024-02-08 16:12:30.239991+00 |      14 | succeeded
  2024-02-08 16:12:30.241852+00 |      15 | started
  2024-02-08 16:12:30.244071+00 |      15 | succeeded
  2024-02-08 16:12:30.246213+00 |      16 | started
  2024-02-08 16:12:30.248214+00 |      16 | succeeded
  2024-02-08 16:12:30.250175+00 |      17 | started
  2024-02-08 16:12:30.252159+00 |      17 | succeeded
  2024-02-08 16:12:30.254117+00 |      18 | started
  2024-02-08 16:12:30.256072+00 |      18 | succeeded
  2024-02-08 16:12:30.25808+00  |      19 | started
  2024-02-08 16:12:30.26002+00  |      19 | succeeded
  2024-02-08 16:12:30.262301+00 |      20 | started
  2024-02-08 16:12:30.264098+00 |      20 | succeeded
  2024-02-08 16:12:30.265843+00 |      21 | started
  2024-02-08 16:12:30.267972+00 |      21 | succeeded
  2024-02-08 16:12:30.26979+00  |      22 | started
  2024-02-08 16:12:30.271608+00 |      22 | succeeded
  2024-02-08 16:12:30.273571+00 |      23 | started
  2024-02-08 16:12:30.275477+00 |      23 | succeeded
  2024-02-08 16:12:30.277705+00 |      24 | started
  2024-02-08 16:12:30.279513+00 |      24 | succeeded
  2024-02-08 16:12:30.281332+00 |      25 | started
  2024-02-08 16:12:30.283178+00 |      25 | succeeded
  2024-02-08 16:12:30.284962+00 |      26 | started
  2024-02-08 16:12:30.286979+00 |      26 | succeeded
  2024-02-08 16:12:30.28884+00  |      27 | started
  2024-02-08 16:12:30.290806+00 |      27 | succeeded
  2024-02-08 16:12:30.293019+00 |      28 | started
  2024-02-08 16:12:30.294892+00 |      28 | succeeded
  2024-02-08 16:12:30.296806+00 |      29 | started
  2024-02-08 16:12:30.298742+00 |      29 | succeeded
  2024-02-08 16:12:30.300649+00 |      30 | started
  2024-02-08 16:12:30.302622+00 |      30 | succeeded
  2024-02-08 16:12:30.304562+00 |      31 | started
  2024-02-08 16:12:30.306452+00 |      31 | succeeded
  2024-02-08 16:12:30.308577+00 |      32 | started
  2024-02-08 16:12:30.310384+00 |      32 | succeeded
  2024-02-08 16:12:30.312371+00 |      33 | started
  2024-02-08 16:12:30.314228+00 |      33 | succeeded
  2024-02-08 16:12:30.316086+00 |      34 | started
  2024-02-08 16:12:30.318122+00 |      34 | succeeded
  2024-02-08 16:12:30.319904+00 |      35 | started
  2024-02-08 16:12:30.321736+00 |      35 | succeeded
  2024-02-08 16:12:30.323899+00 |      36 | started
  2024-02-08 16:12:30.44017+00  |      36 | succeeded
  2024-02-08 16:12:30.442165+00 |      37 | started
  2024-02-08 16:12:30.444146+00 |      37 | succeeded
  2024-02-08 16:12:30.446352+00 |      38 | started
  2024-02-08 16:12:30.44827+00  |      38 | succeeded
  2024-02-08 16:12:30.45021+00  |      39 | started
  2024-02-08 16:12:40.236154+00 |      39 | succeeded
  2024-02-08 16:12:40.238705+00 |      40 | started
  2024-02-08 16:12:40.240896+00 |      40 | succeeded
  2024-02-08 16:12:40.243156+00 |      41 | started
  2024-02-08 16:12:40.245407+00 |      41 | succeeded
  2024-02-08 16:12:40.247801+00 |      42 | started
  2024-02-08 16:12:40.24974+00  |      42 | succeeded
  2024-02-08 16:12:40.251857+00 |      43 | started
  2024-02-08 16:12:40.253693+00 |      43 | succeeded
  2024-02-08 16:12:40.255688+00 |      44 | started
  2024-02-08 16:12:40.25762+00  |      44 | succeeded
  2024-02-08 16:12:40.259488+00 |      45 | started
  2024-02-08 16:12:40.26173+00  |      45 | succeeded
  2024-02-08 16:12:40.26461+00  |      46 | started
  2024-02-08 16:12:40.266653+00 |      46 | succeeded
  2024-02-08 16:12:40.268581+00 |      47 | started
  2024-02-08 16:12:40.270533+00 |      47 | succeeded
  2024-02-08 16:12:40.272566+00 |      48 | started
  2024-02-08 16:12:40.274824+00 |      48 | succeeded
  2024-02-08 16:12:40.276714+00 |      49 | started
  2024-02-08 16:12:40.278746+00 |      49 | succeeded
(100 rows)

So I believe that explains everything and the fix should be fairly straightforward.


How long has this been broken? The pagination code was rewritten when we introduced Diesel back in late 2021 and it appears this issue was introduced then, though it probably could not be triggered until later, when we first had sagas with more than about 50 nodes (enough nodes to push the node events into the second page). However, the conditions required to hit it are somewhat specific and unpredictable: you need a saga large enough, plus a Nexus crash, plus whatever conditions cause CockroachDB to put the start node at the end of the result set.


In principle, it's conceivable that we could have a much worse failure mode: suppose we started executing a saga, then Nexus restarted, and when it came up, it got an incomplete but valid log. Steno might resume execution but actually wind up rerunning some nodes that had already run before. It's not clear how arbitrarily bad things could get in this case. However, I think this would be impossible because:

smklein commented 4 months ago

If I'm understanding this issue correctly, we're saying that the events should be paginated in "node ID" order, not "saga ID" order, flagging this line as incorrect:

https://github.com/oxidecomputer/omicron/blob/cb3a9bf2d49df9b048f590d2340b0a8a07819f01/nexus/db-queries/src/db/saga_recovery.rs#L298

I also noticed that this function calls into https://github.com/oxidecomputer/omicron/blob/cb3a9bf2d49df9b048f590d2340b0a8a07819f01/nexus/db-queries/src/db/saga_recovery.rs#L292-L293C23 to actually get the batch.

This function also seems incorrect, as it is using the pagination helper, but is ordering by saga_id, not node_id: https://github.com/oxidecomputer/omicron/blob/cb3a9bf2d49df9b048f590d2340b0a8a07819f01/nexus/db-queries/src/db/datastore/saga.rs#L131-L138

davepacheco commented 4 months ago

Yes, I think that's the line that's wrong, and agreed that saga_node_event_list_by_id() is also sorting on the wrong column. Since the pagination key is supposed to be unique, and there can be multiple events for the same node_id, I think we'll probably want the pagination key to be the tuple of (node_id, event_type).

I'd suggest that we convert this code over to the more modern Paginator. It should be pretty easy to use and it looks like it supports paginating by two columns like this.
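To illustrate the shape of that fix, here's a sketch of keyset pagination over the composite (node_id, event_type) key, in plain Rust rather than the actual Diesel/Paginator code: because the pair is unique per row, the marker always advances and no rows can be skipped.

```rust
// Sketch of keyset pagination over the composite key (node_id, event_type).
// Illustrative only -- the real fix would go through Paginator/Diesel.
fn fetch_page<'a>(
    rows: &[(u32, &'a str)], // (node_id, event_type), sorted by that key
    marker: Option<(u32, &str)>,
    limit: usize,
) -> Vec<(u32, &'a str)> {
    rows.iter()
        // Unlike saga_id, the (node_id, event_type) pair differs for every
        // row, so "strictly greater than the marker" always makes progress.
        .filter(|&&(n, e)| marker.map_or(true, |m| (n, e) > m))
        .take(limit)
        .copied()
        .collect()
}

fn main() {
    let rows: Vec<(u32, &str)> = vec![
        (0, "started"),
        (0, "succeeded"),
        (1, "started"),
        (1, "succeeded"),
        (182, "started"),
        (182, "succeeded"),
    ];
    // Page through with a tiny page size to show the marker advancing.
    let mut marker = None;
    let mut all = Vec::new();
    loop {
        let page = fetch_page(&rows, marker, 4);
        let done = page.len() < 4;
        if let Some(&last) = page.last() {
            marker = Some(last);
        }
        all.extend(page);
        if done {
            break;
        }
    }
    // Every event is recovered, including the start node's (node 182).
    assert_eq!(all.len(), rows.len());
}
```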