oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0
251 stars 39 forks source link

noisy background tasks? #6319

Closed davepacheco closed 2 months ago

davepacheco commented 2 months ago

I had a Nexus experiencing problems so I went looking for warning-level messages and I got quite a lot of these:

root@oxz_nexus_7589c1f3:~# looker -lwarn < $(svcs -L nexus)
21:15:23.003Z WARN 7589c1f3-2f06-44d2-b561-7c21b4da7f7e (ServerContext): phantom disk task started
    background_task = phantom_disks
    file = nexus/src/app/background/tasks/phantom_disks.rs:46
21:15:23.004Z WARN 7589c1f3-2f06-44d2-b561-7c21b4da7f7e (ServerContext): region replacement task started
    background_task = region_replacement
    file = nexus/src/app/background/tasks/region_replacement.rs:64
21:15:23.008Z WARN 7589c1f3-2f06-44d2-b561-7c21b4da7f7e (ServerContext): phantom disk task done
    background_task = phantom_disks
    file = nexus/src/app/background/tasks/phantom_disks.rs:93
21:15:23.021Z WARN 7589c1f3-2f06-44d2-b561-7c21b4da7f7e (ServerContext): physical disk adoption task started
    background_task = physical_disk_adoption
    file = nexus/src/app/background/tasks/physical_disk_adoption.rs:100
21:15:23.023Z WARN 7589c1f3-2f06-44d2-b561-7c21b4da7f7e (ServerContext): region replacement task done
    background_task = region_replacement
    file = nexus/src/app/background/tasks/region_replacement.rs:185
21:15:23.539Z WARN 7589c1f3-2f06-44d2-b561-7c21b4da7f7e (ServerContext): physical disk adoption task done
    background_task = physical_disk_adoption
    file = nexus/src/app/background/tasks/physical_disk_adoption.rs:174
21:15:52.944Z WARN 7589c1f3-2f06-44d2-b561-7c21b4da7f7e (ServerContext): Blueprint execution: skipped
    background_task = blueprint_executor
    file = nexus/src/app/background/tasks/blueprint_execution.rs:79
    reason = blueprint disabled
    target_id = 970ff234-8f34-4ef8-944d-b8e094a6be64
21:15:53.001Z WARN 7589c1f3-2f06-44d2-b561-7c21b4da7f7e (ServerContext): phantom disk task started
    background_task = phantom_disks
    file = nexus/src/app/background/tasks/phantom_disks.rs:46
21:15:53.002Z WARN 7589c1f3-2f06-44d2-b561-7c21b4da7f7e (ServerContext): region replacement task started
    background_task = region_replacement
    file = nexus/src/app/background/tasks/region_replacement.rs:64
21:15:53.008Z WARN 7589c1f3-2f06-44d2-b561-7c21b4da7f7e (ServerContext): phantom disk task done
    background_task = phantom_disks
    file = nexus/src/app/background/tasks/phantom_disks.rs:93
21:15:53.023Z WARN 7589c1f3-2f06-44d2-b561-7c21b4da7f7e (ServerContext): physical disk adoption task started
    background_task = physical_disk_adoption
    file = nexus/src/app/background/tasks/physical_disk_adoption.rs:100
21:15:53.477Z WARN 7589c1f3-2f06-44d2-b561-7c21b4da7f7e (ServerContext): region replacement task done
    background_task = region_replacement
    file = nexus/src/app/background/tasks/region_replacement.rs:185
...

From the message text, it doesn't sound like these are warning-level. They also seem redundant with log messages that get emitted by the background task subsystem around activation:

root@oxz_nexus_7589c1f3:~# grep -C3 'physical disk adoption task done' $(svcs -L nexus)
...
22:26:23.235Z WARN 7589c1f3-2f06-44d2-b561-7c21b4da7f7e (ServerContext): physical disk adoption task done
    background_task = physical_disk_adoption
    file = nexus/src/app/background/tasks/physical_disk_adoption.rs:174
22:26:23.235Z DEBG 7589c1f3-2f06-44d2-b561-7c21b4da7f7e (ServerContext): activation complete
    background_task = physical_disk_adoption
    elapsed = 524.40255ms
    iteration = 162
    status = {"physical_disks_added":0}

These are all the warning-level messages I saw:

# grep '"level":40' $(svcs -L nexus)

(copy output)

$ pbpaste | json -ga msg | sort | uniq -c | sort -n
   2 failed to read target blueprint
  74 Blueprint execution: skipped
 147 phantom disk task done
 147 region replacement task done
 148 phantom disk task started
 148 region replacement task started
 154 physical disk adoption task done
 154 physical disk adoption task started
hawkw commented 2 months ago

I was also quite surprised by these recently. "Warning" doesn't feel like the right level for a successful periodic activation of a background task that doesn't appear to have encountered anything out of the ordinary...

davepacheco commented 2 months ago

Just to confirm, the background task subsystem also logs the start of each of these tasks:

02:48:16.522Z DEBG 53947004-0a4b-4b23-82d9-5dee49217aef (ServerContext): activating
    background_task = region_replacement_driver
    iteration = 211
    reason = Timeout
02:48:16.522Z INFO 53947004-0a4b-4b23-82d9-5dee49217aef (ServerContext): region replacement driver task started
    background_task = region_replacement_driver
    file = nexus/src/app/background/tasks/region_replacement_driver.rs:231