oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0
244 stars 36 forks source link

test failure: omicron-omdb::test_all_output test_omdb_success_cases #6509

Closed smklein closed 2 weeks ago

smklein commented 2 weeks ago

On branch main commit 5bf5f09545b7b74f809c099db5c0b315ea06f9be, I saw this test failure:

running commands with args: ["nexus", "blueprints", "show", "b1e4b4bd-a186-416b-99ed-63b8d88d7b89"]
running commands with args: ["nexus", "blueprints", "show", "current-target"]
running commands with args: ["nexus", "blueprints", "diff", "b1e4b4bd-a186-416b-99ed-63b8d88d7b89", "current-target"]
@@ -563,11 +563,11 @@
 task: "metrics_producer_gc"
   configured period: every 1m
   currently executing: no
   last completed activation: <REDACTED ITERATIONS>, triggered by a periodic timer firing
     started at <REDACTED     TIMESTAMP> (<REDACTED DURATION>s ago) and ran for <REDACTED DURATION>ms
-warning: unknown background task: "metrics_producer_gc" (don't know how to interpret details: Object {"expiration": String("<REDACTED           TIMESTAMP>"), "pruned": Array []})
+warning: unknown background task: "metrics_producer_gc" (don't know how to interpret details: Object {"expiration": String("<REDACTED        TIMESTAMP>"), "pruned": Array []})

 task: "phantom_disks"
   configured period: every <REDACTED_DURATION>s
   currently executing: no
   last completed activation: <REDACTED ITERATIONS>, triggered by a periodic timer firing
@@ -990,11 +990,11 @@
 task: "metrics_producer_gc"
   configured period: every 1m
   currently executing: no
   last completed activation: <REDACTED ITERATIONS>, triggered by a periodic timer firing
     started at <REDACTED     TIMESTAMP> (<REDACTED DURATION>s ago) and ran for <REDACTED DURATION>ms
-warning: unknown background task: "metrics_producer_gc" (don't know how to interpret details: Object {"expiration": String("<REDACTED           TIMESTAMP>"), "pruned": Array []})
+warning: unknown background task: "metrics_producer_gc" (don't know how to interpret details: Object {"expiration": String("<REDACTED        TIMESTAMP>"), "pruned": Array []})

 task: "phantom_disks"
   configured period: every <REDACTED_DURATION>s
   currently executing: no
   last completed activation: <REDACTED ITERATIONS>, triggered by a periodic timer firing

test test_omdb_success_cases ... FAILED

failures:

failures:
    test_omdb_success_cases

I was actually trying to reproduce https://github.com/oxidecomputer/omicron/issues/6505 locally, but this looks like a different error. Notably, I'm seeing differences in the whitespace between REDACTED...TIMESTAMP.

smklein commented 2 weeks ago

Other notes: This was on Helios, after running cargo nt -p omicron-omdb -- test_omdb_success_cases in a loop for almost 100 iterations.

davepacheco commented 2 weeks ago

It looks like the code that does the timestamp redactions attempts to preserve the width (in characters) of the timestamp: https://github.com/oxidecomputer/omicron/blob/eb54c1c1c013936e425c1f291af07a67f4c1c845/test-utils/src/dev/test_cmds.rs#L175-L177

It sounds like most of the time, the width matches what's in the expected output file. In this case, it was 3 characters shorter. In the past I've seen that some ways of formatting timestamps implicitly vary the precision based on the value -- e.g., if the milliseconds happens to be exactly 0, then they get left out.

On a regular run I saw this value:

warning: unknown background task: "metrics_producer_gc" (don't know how to interpret details: Object {"expiration": String("2024-09-03T17:29:27.850928397Z"), "pruned": Array []})

So (1) we're doing the string formatting on the server, and (2) my guess is that the nanoseconds component was 0 in the case we saw and the serialization just left it out.