Open sunshowers opened 6 months ago
Note that this corresponded to some network flakiness that was going on around that time (2024-03-28T23:14:41.638474412Z).
I believe CRDB may have been unavailable during this time?
Looks like it. Just above the panic I see:
23:14:39.337Z ERRO 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): failed to collect inventory
background_task = service_zone_nat_tracker
error = Service Unavailable: Failed to access DB connection: Timed out in bb8
file = nexus/src/app/background/sync_service_zone_nat.rs:71
...
23:14:41.465Z WARN 65a11c18-7f59-41ac-b9e7-680627f996e7 (ServerContext): failed to read DNS config
background_task = dns_config_internal
current_generation = 1
current_time_created = 2023-08-30 18:59:10.774294 UTC
dns_group = internal
error = Service Unavailable: Failed to access DB connection: Timed out in bb8
file = nexus/src/app/background/dns_config.rs:72
@sunshowers thanks for catching this, a few expect() calls snuck through. Patching this now.
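For illustration only (this is not the actual patch), here is a minimal sketch of the general shape of such a fix: a background task that matches on the database result and skips the activation instead of calling expect(). The names `DbError`, `Activation`, and `activate` are hypothetical and do not come from the Nexus code.

```rust
use std::fmt;

/// Hypothetical stand-in for a database access error (not the real Nexus type).
#[derive(Debug, Clone, PartialEq)]
enum DbError {
    /// The connection pool could not hand out a connection in time.
    PoolTimeout,
}

impl fmt::Display for DbError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DbError::PoolTimeout => {
                write!(f, "Failed to access DB connection: timed out")
            }
        }
    }
}

/// Outcome of one activation of a hypothetical background task.
#[derive(Debug, PartialEq)]
enum Activation {
    /// The lookup succeeded and the task did its work.
    Synced { generation: u64 },
    /// The lookup failed; the task gives up until its next activation.
    Skipped { reason: String },
}

/// Handle the lookup result without panicking.
///
/// Writing `lookup.expect("...")` here would abort the whole process
/// whenever the database is briefly unreachable. Matching on the result
/// lets the task report the failure and try again on a later activation.
fn activate(lookup: Result<u64, DbError>) -> Activation {
    match lookup {
        Ok(generation) => Activation::Synced { generation },
        Err(e) => Activation::Skipped { reason: e.to_string() },
    }
}
```

With this shape, a pool timeout like the ones in the log above would produce a `Skipped` result that the caller can log, rather than a panic and a core dump.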
During today's dogfood mupdate, we found a core dump on gc08 (rsync'd over to /staff/dock/rack2/mupdate-20240329/cores/sled-08/core.oxz_nexus_65a11c18-7f59-41ac-b9e7-680627f996e7.nexus.5125.1711667683).
Based on timestamps, this corresponds to this message in the log file /pool/ext/8a199f12-4f5c-483a-8aca-f97856658a35/crypt/debug/oxz_nexus_65a11c18-7f59-41ac-b9e7-680627f996e7/oxide-nexus:default.log.1711677599:
The assertion is here.
cc @internet-diglett who this code annotates to, and @rcgoodfellow for the nearby TODO.