oxidecomputer / omicron

Omicron: Oxide control plane

Unwrap in sled agent if an RSS plan already exists #2402

Open bnaecker opened 1 year ago

bnaecker commented 1 year ago

@leftwo and I are getting Omicron running on the dogfood rack; we're currently on BRM42220070. I made a quick update to Omicron using this patch. We can't run omicron-package uninstall, since that currently destroys the cxgbe{0,1} links and IP addresses that we need to log in to the machine. Instead, we rebuilt and then ran OMICRON_NO_UNINSTALL=1 omicron-package install, which unpacks the tarballs into the correct place and then calls svccfg import with the sled-agent manifest. We see this in the logfile:

[ Feb 21 18:04:31 Executing start method ("ctrun -l child -o noorphan,regent /opt/oxide/sled-agent/sled-agent run /opt/oxide/sled-agent/pkg/config.toml &"). ]
[ Feb 21 18:04:31 Method "start" exited with status 0. ]
[ Feb 21 18:04:31 Rereading configuration. ]
[ Feb 21 18:04:31 No 'refresh' method defined.  Treating as :true. ]

That's because svccfg import only updates any changed SMF properties; it does not do anything like svcadm restart. So Alan and I ran svcadm restart manually, at which point the sled agent started up. One of the first things it currently does is destroy any existing Oxide zones, VNICs, and IP addresses, and then recreate them. (Changing this to be more idempotent is tracked in #724.) We then see this in the logs:

[ Feb 21 20:43:37 Executing start method ("ctrun -l child -o noorphan,regent /opt/oxide/sled-agent/sled-agent run /opt/oxide/sled-agent/pkg/config.toml &"). ]
[ Feb 21 20:43:37 Method "start" exited with status 0. ]
note: configured to log to "/dev/stdout"
{"msg":"Starting mg-ddm service","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:37.71446934Z","hostname":"BRM42220070","pid":5581}
{"msg":"Importing mg-ddm service","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:37.714666334Z","hostname":"BRM42220070","pid":5581,"path":"/opt/oxide/mg-ddm/pkg/ddm/manifest.xml"}
{"msg":"Setting mg-ddm interfaces","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:37.858431628Z","hostname":"BRM42220070","pid":5581,"interfaces":"(\"cxgbe0/ll\" \"cxgbe1/ll\")"}
{"msg":"Enabling mg-ddm service","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:37.87371669Z","hostname":"BRM42220070","pid":5581}
{"msg":"detecting (real or simulated) SP","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:37.887126627Z","hostname":"BRM42220070","pid":5581}
{"msg":"setting up bootstrap agent server","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:37.887178889Z","hostname":"BRM42220070","pid":5581}
{"msg":"Ensuring config directory exists","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:37.910916301Z","hostname":"BRM42220070","pid":5581,"path":"/var/oxide"}
{"msg":"Sending prefix to ddmd for advertisement","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:37.972234806Z","hostname":"BRM42220070","pid":5581,"prefix":"Ipv6Prefix { addr: fdb0:a840:2504:154::, len: 64 }"}
{"msg":"Deleting existing zone","v":0,"name":"SledAgent","level":40,"time":"2023-02-21T20:43:38.022867976Z","hostname":"BRM42220070","pid":5581,"zone_name":"oxz_nexus"}
{"msg":"Deleting existing zone","v":0,"name":"SledAgent","level":40,"time":"2023-02-21T20:43:38.03022564Z","hostname":"BRM42220070","pid":5581,"zone_name":"oxz_oximeter"}
{"msg":"Deleting existing zone","v":0,"name":"SledAgent","level":40,"time":"2023-02-21T20:43:38.053297124Z","hostname":"BRM42220070","pid":5581,"zone_name":"oxz_internal_dns"}
{"msg":"Deleting existing zone","v":0,"name":"SledAgent","level":40,"time":"2023-02-21T20:43:38.061278113Z","hostname":"BRM42220070","pid":5581,"zone_name":"oxz_crucible_pantry"}
{"msg":"halt_and_remove_logged: Previous zone state: Running","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:40.395632247Z","hostname":"BRM42220070","pid":5581}
{"msg":"halt_and_remove_logged: Previous zone state: Running","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:40.421049899Z","hostname":"BRM42220070","pid":5581}
{"msg":"halt_and_remove_logged: Previous zone state: Running","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:40.461762884Z","hostname":"BRM42220070","pid":5581}
{"msg":"halt_and_remove_logged: Previous zone state: Running","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:40.483034618Z","hostname":"BRM42220070","pid":5581}
{"msg":"Deleting existing VNIC","v":0,"name":"SledAgent","level":40,"time":"2023-02-21T20:43:40.50254981Z","hostname":"BRM42220070","pid":5581,"vnic_kind":"OxideControlVnic","vnic_name":"oxControlPublic0"}
{"msg":"Deleting existing VNIC","v":0,"name":"SledAgent","level":40,"time":"2023-02-21T20:43:40.509786354Z","hostname":"BRM42220070","pid":5581,"vnic_kind":"OxideControlVnic","vnic_name":"oxControlService0"}
{"msg":"Deleting existing VNIC","v":0,"name":"SledAgent","level":40,"time":"2023-02-21T20:43:40.509865746Z","hostname":"BRM42220070","pid":5581,"vnic_kind":"OxideControlVnic","vnic_name":"oxControlService1"}
{"msg":"Deleting existing VNIC","v":0,"name":"SledAgent","level":40,"time":"2023-02-21T20:43:40.509884546Z","hostname":"BRM42220070","pid":5581,"vnic_kind":"OxideControlVnic","vnic_name":"oxControlService2"}
{"msg":"Deleting existing VNIC","v":0,"name":"SledAgent","level":40,"time":"2023-02-21T20:43:40.509899616Z","hostname":"BRM42220070","pid":5581,"vnic_kind":"OxideControlVnic","vnic_name":"oxControlService3"}
{"msg":"Bootstrap Agent monitoring for hardware","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:40.543623717Z","hostname":"BRM42220070","pid":5581}
{"msg":"Creating HardwareManager","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:40.543672318Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager","component":"BootstrapAgent"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@0,0/pci1022,1483@1,3/pci1344,3100@0/blkdev@w00A0750132753688,0\", dev_path: None }, slot: 17, variant: M2, identity: DiskIdentity { vendor: \"1344\", serial: \"214132753688\", model: \"Micron_7300_MTFDHBG1T9TDF\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.389984489Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager","component":"BootstrapAgent"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@ab,0/pci1022,1483@1,3/pci1b96,0@0/blkdev@w0014EE81000D2E4D,0\", dev_path: Some(\"/dev/dsk/c11t0014EE81000D2E4Dd0\") }, slot: 2, variant: U2, identity: DiskIdentity { vendor: \"1b96\", serial: \"A084A633\", model: \"WUS4C6432DSP3X3\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.39002444Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager","component":"BootstrapAgent"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@0,0/pci1022,1483@3,3/pci1b96,0@0/blkdev@w0014EE81000D2EEA,0\", dev_path: Some(\"/dev/dsk/c3t0014EE81000D2EEAd0\") }, slot: 5, variant: U2, identity: DiskIdentity { vendor: \"1b96\", serial: \"A084A6C4\", model: \"WUS4C6432DSP3X3\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.39003365Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager","component":"BootstrapAgent"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@38,0/pci1022,1483@1,2/pci1b96,0@0/blkdev@w0014EE81000D2FFA,0\", dev_path: Some(\"/dev/dsk/c6t0014EE81000D2FFAd0\") }, slot: 8, variant: U2, identity: DiskIdentity { vendor: \"1b96\", serial: \"A084A6F4\", model: \"WUS4C6432DSP3X3\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.390043161Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager","component":"BootstrapAgent"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@38,0/pci1022,1483@1,1/pci1b96,0@0/blkdev@w0014EE81000D2EE9,0\", dev_path: Some(\"/dev/dsk/c5t0014EE81000D2EE9d0\") }, slot: 9, variant: U2, identity: DiskIdentity { vendor: \"1b96\", serial: \"A084A6BF\", model: \"WUS4C6432DSP3X3\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.390055441Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager","component":"BootstrapAgent"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@0,0/pci1022,1483@3,2/pci1b96,0@0/blkdev@w0014EE81000D2E9D,0\", dev_path: Some(\"/dev/dsk/c2t0014EE81000D2E9Dd0\") }, slot: 6, variant: U2, identity: DiskIdentity { vendor: \"1b96\", serial: \"A084A6D3\", model: \"WUS4C6432DSP3X3\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.390089592Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager","component":"BootstrapAgent"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@ab,0/pci1022,1483@1,4/pci1b96,0@0/blkdev@w0014EE81000D2E4A,0\", dev_path: Some(\"/dev/dsk/c12t0014EE81000D2E4Ad0\") }, slot: 3, variant: U2, identity: DiskIdentity { vendor: \"1b96\", serial: \"A084A630\", model: \"WUS4C6432DSP3X3\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.390098322Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager","component":"BootstrapAgent"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@0,0/pci1022,1483@3,4/pci1b96,0@0/blkdev@w0014EE81000D3035,0\", dev_path: Some(\"/dev/dsk/c4t0014EE81000D3035d0\") }, slot: 4, variant: U2, identity: DiskIdentity { vendor: \"1b96\", serial: \"A084A5CB\", model: \"WUS4C6432DSP3X3\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.390108312Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager","component":"BootstrapAgent"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@ab,0/pci1022,1483@1,2/pci1b96,0@0/blkdev@w0014EE81000D2E50,0\", dev_path: Some(\"/dev/dsk/c10t0014EE81000D2E50d0\") }, slot: 1, variant: U2, identity: DiskIdentity { vendor: \"1b96\", serial: \"A084A636\", model: \"WUS4C6432DSP3X3\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.390116552Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager","component":"BootstrapAgent"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@38,0/pci1022,1483@1,3/pci1b96,0@0/blkdev@w0014EE81000D2F5B,0\", dev_path: Some(\"/dev/dsk/c7t0014EE81000D2F5Bd0\") }, slot: 7, variant: U2, identity: DiskIdentity { vendor: \"1b96\", serial: \"A084A771\", model: \"WUS4C6432DSP3X3\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.390125182Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager","component":"BootstrapAgent"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@ab,0/pci1022,1483@1,1/pci1b96,0@0/blkdev@w0014EE81000D307F,0\", dev_path: Some(\"/dev/dsk/c9t0014EE81000D307Fd0\") }, slot: 0, variant: U2, identity: DiskIdentity { vendor: \"1b96\", serial: \"A084A5CD\", model: \"WUS4C6432DSP3X3\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.390132393Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager","component":"BootstrapAgent"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@38,0/pci1022,1483@3,3/pci1344,3100@0/blkdev@w00A0750132753657,0\", dev_path: None }, slot: 18, variant: M2, identity: DiskIdentity { vendor: \"1344\", serial: \"214132753657\", model: \"Micron_7300_MTFDHBG1T9TDF\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.390139583Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager","component":"BootstrapAgent"}
{"msg":"Sled already configured, loading sled agent","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.395595777Z","hostname":"BRM42220070","pid":5581,"component":"BootstrapAgent"}
{"msg":"Monitoring for hardware updates","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.395672679Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager","component":"BootstrapAgent"}
{"msg":"Performing full hardware scan","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.395691339Z","hostname":"BRM42220070","pid":5581,"component":"BootstrapAgent"}
{"msg":"Disabling switch zone (already complete)","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.39569994Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Loading Sled Agent: SledAgentRequest { id: 9a721086-36f5-4a39-af55-1c389fe5eaae, rack_id: 92bbdfed-f133-4489-a918-5e7f281b8760, gateway: Gateway { address: None, mac: MacAddr6([0, 13, 185, 84, 254, 228]) }, subnet: Ipv6Subnet { net: Ipv6Net(Ipv6Network { addr: fd00:1122:3344:101::, prefix: 64 }) } }","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.395849953Z","hostname":"BRM42220070","pid":5581,"component":"BootstrapAgent"}
{"msg":"setting up sled agent server","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.395943575Z","hostname":"BRM42220070","pid":5581}
{"msg":"created sled agent","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.396205362Z","hostname":"BRM42220070","pid":5581,"sled_id":"9a721086-36f5-4a39-af55-1c389fe5eaae","component":"SledAgent"}
{"msg":"xde driver configuration file appears to already use external IP workaround","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.468164956Z","hostname":"BRM42220070","pid":5581,"sled_id":"9a721086-36f5-4a39-af55-1c389fe5eaae","component":"SledAgent","conf_file":"\"/kernel/drv/xde.conf\""}
{"msg":"using '[AddrObject { interface: \"cxgbe0\", name: \"ll\" }, AddrObject { interface: \"cxgbe1\", name: \"ll\" }]' as data links for xde driver","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.704353522Z","hostname":"BRM42220070","pid":5581,"sled_id":"9a721086-36f5-4a39-af55-1c389fe5eaae","component":"SledAgent"}
{"msg":"Sled Agent upserting zpool to Storage Manager: oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.86118484Z","hostname":"BRM42220070","pid":5581,"sled_id":"9a721086-36f5-4a39-af55-1c389fe5eaae","component":"SledAgent"}
{"msg":"Inserting zpool: ZpoolName(d462a7f7-b628-40fe-80ff-4e4189e2d62b)","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.861257091Z","hostname":"BRM42220070","pid":5581,"component":"StorageManager"}
{"msg":"Sled Agent upserting zpool to Storage Manager: oxp_e4b4dc87-ab46-49fb-a4b4-d361ae214c03","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.861272012Z","hostname":"BRM42220070","pid":5581,"sled_id":"9a721086-36f5-4a39-af55-1c389fe5eaae","component":"SledAgent"}
{"msg":"Inserting zpool: ZpoolName(e4b4dc87-ab46-49fb-a4b4-d361ae214c03)","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.861286562Z","hostname":"BRM42220070","pid":5581,"component":"StorageManager"}
{"msg":"Sled Agent upserting zpool to Storage Manager: oxp_f4b4dc87-ab46-49fb-a4b4-d361ae214c03","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.861319323Z","hostname":"BRM42220070","pid":5581,"sled_id":"9a721086-36f5-4a39-af55-1c389fe5eaae","component":"SledAgent"}
{"msg":"Inserting zpool: ZpoolName(f4b4dc87-ab46-49fb-a4b4-d361ae214c03)","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.861331233Z","hostname":"BRM42220070","pid":5581,"component":"StorageManager"}
{"msg":"StorageWorker encountered unexpected error: Failed to get info for zpool 'oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b': Zpool execution error: Command [list -Hpo name,size,allocated,free,health oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b] executed and failed with status: exit status: 1  stdout:   stderr: cannot open 'oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b': no such pool\n","v":0,"name":"SledAgent","level":40,"time":"2023-02-21T20:43:41.878410055Z","hostname":"BRM42220070","pid":5581,"component":"StorageManager"}
{"msg":"Creating HardwareManager","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:41.893014421Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager"}
{"msg":"StorageWorker encountered unexpected error: Failed to get info for zpool 'oxp_e4b4dc87-ab46-49fb-a4b4-d361ae214c03': Zpool execution error: Command [list -Hpo name,size,allocated,free,health oxp_e4b4dc87-ab46-49fb-a4b4-d361ae214c03] executed and failed with status: exit status: 1  stdout:   stderr: cannot open 'oxp_e4b4dc87-ab46-49fb-a4b4-d361ae214c03': no such pool\n","v":0,"name":"SledAgent","level":40,"time":"2023-02-21T20:43:41.894164879Z","hostname":"BRM42220070","pid":5581,"component":"StorageManager"}
{"msg":"StorageWorker encountered unexpected error: Failed to get info for zpool 'oxp_f4b4dc87-ab46-49fb-a4b4-d361ae214c03': Zpool execution error: Command [list -Hpo name,size,allocated,free,health oxp_f4b4dc87-ab46-49fb-a4b4-d361ae214c03] executed and failed with status: exit status: 1  stdout:   stderr: cannot open 'oxp_f4b4dc87-ab46-49fb-a4b4-d361ae214c03': no such pool\n","v":0,"name":"SledAgent","level":40,"time":"2023-02-21T20:43:41.933579192Z","hostname":"BRM42220070","pid":5581,"component":"StorageManager"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@ab,0/pci1022,1483@1,4/pci1b96,0@0/blkdev@w0014EE81000D2E4A,0\", dev_path: Some(\"/dev/dsk/c12t0014EE81000D2E4Ad0\") }, slot: 3, variant: U2, identity: DiskIdentity { vendor: \"1b96\", serial: \"A084A630\", model: \"WUS4C6432DSP3X3\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:43.25421274Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@ab,0/pci1022,1483@1,2/pci1b96,0@0/blkdev@w0014EE81000D2E50,0\", dev_path: Some(\"/dev/dsk/c10t0014EE81000D2E50d0\") }, slot: 1, variant: U2, identity: DiskIdentity { vendor: \"1b96\", serial: \"A084A636\", model: \"WUS4C6432DSP3X3\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:43.254267031Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@0,0/pci1022,1483@3,2/pci1b96,0@0/blkdev@w0014EE81000D2E9D,0\", dev_path: Some(\"/dev/dsk/c2t0014EE81000D2E9Dd0\") }, slot: 6, variant: U2, identity: DiskIdentity { vendor: \"1b96\", serial: \"A084A6D3\", model: \"WUS4C6432DSP3X3\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:43.254283561Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@38,0/pci1022,1483@1,2/pci1b96,0@0/blkdev@w0014EE81000D2FFA,0\", dev_path: Some(\"/dev/dsk/c6t0014EE81000D2FFAd0\") }, slot: 8, variant: U2, identity: DiskIdentity { vendor: \"1b96\", serial: \"A084A6F4\", model: \"WUS4C6432DSP3X3\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:43.254299292Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@0,0/pci1022,1483@3,3/pci1b96,0@0/blkdev@w0014EE81000D2EEA,0\", dev_path: Some(\"/dev/dsk/c3t0014EE81000D2EEAd0\") }, slot: 5, variant: U2, identity: DiskIdentity { vendor: \"1b96\", serial: \"A084A6C4\", model: \"WUS4C6432DSP3X3\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:43.254346743Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@38,0/pci1022,1483@1,1/pci1b96,0@0/blkdev@w0014EE81000D2EE9,0\", dev_path: Some(\"/dev/dsk/c5t0014EE81000D2EE9d0\") }, slot: 9, variant: U2, identity: DiskIdentity { vendor: \"1b96\", serial: \"A084A6BF\", model: \"WUS4C6432DSP3X3\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:43.254406654Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@ab,0/pci1022,1483@1,1/pci1b96,0@0/blkdev@w0014EE81000D307F,0\", dev_path: Some(\"/dev/dsk/c9t0014EE81000D307Fd0\") }, slot: 0, variant: U2, identity: DiskIdentity { vendor: \"1b96\", serial: \"A084A5CD\", model: \"WUS4C6432DSP3X3\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:43.254424135Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@0,0/pci1022,1483@3,4/pci1b96,0@0/blkdev@w0014EE81000D3035,0\", dev_path: Some(\"/dev/dsk/c4t0014EE81000D3035d0\") }, slot: 4, variant: U2, identity: DiskIdentity { vendor: \"1b96\", serial: \"A084A5CB\", model: \"WUS4C6432DSP3X3\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:43.254435935Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@0,0/pci1022,1483@1,3/pci1344,3100@0/blkdev@w00A0750132753688,0\", dev_path: None }, slot: 17, variant: M2, identity: DiskIdentity { vendor: \"1344\", serial: \"214132753688\", model: \"Micron_7300_MTFDHBG1T9TDF\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:43.254456935Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@ab,0/pci1022,1483@1,3/pci1b96,0@0/blkdev@w0014EE81000D2E4D,0\", dev_path: Some(\"/dev/dsk/c11t0014EE81000D2E4Dd0\") }, slot: 2, variant: U2, identity: DiskIdentity { vendor: \"1b96\", serial: \"A084A633\", model: \"WUS4C6432DSP3X3\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:43.254474006Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@38,0/pci1022,1483@3,3/pci1344,3100@0/blkdev@w00A0750132753657,0\", dev_path: None }, slot: 18, variant: M2, identity: DiskIdentity { vendor: \"1344\", serial: \"214132753657\", model: \"Micron_7300_MTFDHBG1T9TDF\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:43.254485436Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager"}
{"msg":"Update from polling device tree: DiskAdded(UnparsedDisk { paths: DiskPaths { devfs_path: \"/devices/pci@38,0/pci1022,1483@1,3/pci1b96,0@0/blkdev@w0014EE81000D2F5B,0\", dev_path: Some(\"/dev/dsk/c7t0014EE81000D2F5Bd0\") }, slot: 7, variant: U2, identity: DiskIdentity { vendor: \"1b96\", serial: \"A084A771\", model: \"WUS4C6432DSP3X3\" } })","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:43.254496796Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager"}
{"msg":"Sled services found at /var/oxide/service.toml; loading","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:43.254507587Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Ensuring service zone is initialized: Nexus","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:43.254759493Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Service zone nexus does not yet exist","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:43.254780753Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Configuring new Omicron zone: oxz_nexus","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:43.319566208Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
ZZZ install_ with ["/opt/oxide/nexus.tar.gz"]
{"msg":"Installing Omicron zone: oxz_nexus","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:43.344921599Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Zone booting","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:45.729978643Z","hostname":"BRM42220070","pid":5581,"zone":"oxz_nexus","component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Ensuring address fd00:1122:3344:101::3 exists","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:48.803528129Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Adding address: Static(V6(Ipv6Network { addr: fd00:1122:3344:101::3, prefix: 64 }))","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:48.803603261Z","hostname":"BRM42220070","pid":5581,"zone":"oxz_nexus","component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Ensuring address fd00:1122:3344:101::3 exists - OK","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:49.394285187Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"GZ addresses: []","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:49.394340609Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Setting up Nexus service","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:49.972534764Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Adding address: Static(V4(Ipv4Network { addr: 192.168.1.20, prefix: 24 }))","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:49.972599276Z","hostname":"BRM42220070","pid":5581,"zone":"oxz_nexus","component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Ensuring service zone is initialized: Oximeter","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:50.738549412Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Service zone oximeter does not yet exist","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:50.738592923Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Configuring new Omicron zone: oxz_oximeter","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:50.762946735Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
ZZZ install_ with ["/opt/oxide/oximeter.tar.gz"]
{"msg":"Installing Omicron zone: oxz_oximeter","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:50.791638794Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Zone booting","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:52.092392896Z","hostname":"BRM42220070","pid":5581,"zone":"oxz_oximeter","component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Ensuring address fd00:1122:3344:101::4 exists","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:55.006030213Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Adding address: Static(V6(Ipv6Network { addr: fd00:1122:3344:101::4, prefix: 64 }))","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:55.006136496Z","hostname":"BRM42220070","pid":5581,"zone":"oxz_oximeter","component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Ensuring address fd00:1122:3344:101::4 exists - OK","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:55.642838207Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"GZ addresses: []","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:55.64289669Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Setting up oximeter service","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:55.93367823Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Ensuring service zone is initialized: InternalDNS","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:56.085815117Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Service zone internal_dns does not yet exist","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:56.085858328Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Configuring new Omicron zone: oxz_internal_dns","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:56.212388006Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
ZZZ install_ with ["/opt/oxide/internal_dns.tar.gz"]
{"msg":"Installing Omicron zone: oxz_internal_dns","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:56.242181795Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Zone booting","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:57.635230132Z","hostname":"BRM42220070","pid":5581,"zone":"oxz_internal_dns","component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Ensuring address fd00:1122:3344:1::1 exists","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:59.541230858Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Adding address: Static(V6(Ipv6Network { addr: fd00:1122:3344:1::1, prefix: 64 }))","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:43:59.541371491Z","hostname":"BRM42220070","pid":5581,"zone":"oxz_internal_dns","component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Ensuring address fd00:1122:3344:1::1 exists - OK","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:00.228835358Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"GZ addresses: [\n    fd00:1122:3344:1::2,\n]","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:00.228916899Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Ensuring GZ address fd00:1122:3344:1::2 exists","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:00.22894016Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Sending prefix to ddmd for advertisement","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:00.259054233Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent","prefix":"Ipv6Prefix { addr: fd00:1122:3344:1::, len: 64 }"}
G{"msg":"Setting up internal-dns service","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:00.570799179Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Ensuring service zone is initialized: CruciblePantry","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:00.72532663Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Service zone crucible_pantry does not yet exist","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:00.725377861Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Configuring new Omicron zone: oxz_crucible_pantry","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:00.752504004Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
ZZZ install_ with ["/opt/oxide/crucible_pantry.tar.gz"]
{"msg":"Installing Omicron zone: oxz_crucible_pantry","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:00.777960127Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Zone booting","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:02.089271576Z","hostname":"BRM42220070","pid":5581,"zone":"oxz_crucible_pantry","component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Ensuring address fd00:1122:3344:101::10 exists","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:04.195027841Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Adding address: Static(V6(Ipv6Network { addr: fd00:1122:3344:101::10, prefix: 64 }))","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:04.195101303Z","hostname":"BRM42220070","pid":5581,"zone":"oxz_crucible_pantry","component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Ensuring address fd00:1122:3344:101::10 exists - OK","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:04.803763042Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"GZ addresses: []","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:04.803852195Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Setting up Crucible pantry service","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:05.286935862Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"Monitoring for hardware updates","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:05.406143844Z","hostname":"BRM42220070","pid":5581,"component":"HardwareManager"}
{"msg":"contacting server nexus, registering sled","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:05.406210105Z","hostname":"BRM42220070","pid":5581,"sled_id":"9a721086-36f5-4a39-af55-1c389fe5eaae","component":"SledAgent","baseboard":"Baseboard { identifier: \"BRM42220070\", model: \"913-0000019\", revision: 6 }","id":"9a721086-36f5-4a39-af55-1c389fe5eaae"}
{"msg":"Performing full hardware scan","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:05.406245706Z","hostname":"BRM42220070","pid":5581,"sled_id":"9a721086-36f5-4a39-af55-1c389fe5eaae","component":"SledAgent"}
{"msg":"Disabling switch zone (already complete)","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:05.40631664Z","hostname":"BRM42220070","pid":5581,"component":"ServiceManager","component":"BootstrapAgent"}
{"msg":"listening","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:05.406724367Z","hostname":"BRM42220070","pid":5581,"local_addr":"[fd00:1122:3344:101::1]:12345","component":"dropshot (SledAgent)"}
{"msg":"Sled Agent loaded; recording configuration","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:05.406784169Z","hostname":"BRM42220070","pid":5581,"component":"BootstrapAgent"}
{"msg":"Started listening","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:05.407146377Z","hostname":"BRM42220070","pid":5581,"component":"BootstrapAgentServer","local_addr":"[fdb0:a840:2504:154::1]:12346"}
{"msg":"Sending prefix to ddmd for advertisement","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:05.407166218Z","hostname":"BRM42220070","pid":5581,"prefix":"Ipv6Prefix { addr: fd00:1122:3344:101::, len: 64 }"}
{"msg":"bootstrap service initializing RSS","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:05.407179608Z","hostname":"BRM42220070","pid":5581,"component":"BootstrapAgent"}
{"msg":"Injecting RSS configuration: SetupServiceConfig {\n    rack_subnet: fd00:1122:3344:100::,\n    rack_secret_threshold: 1,\n    gateway: Gateway {\n        address: None,\n        mac: MacAddr6(\n            [\n                0,\n                13,\n                185,\n                84,\n                254,\n                228,\n            ],\n        ),\n    },\n    nexus_external_address: 192.168.1.20,\n}","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:05.40724846Z","hostname":"BRM42220070","pid":5581,"component":"RSS"}
{"msg":"RSS configuration looks like it has already been applied","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:05.40726625Z","hostname":"BRM42220070","pid":5581,"component":"RSS"}
{"msg":"RSS plan already created, loading from file","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:05.407298991Z","hostname":"BRM42220070","pid":5581,"component":"RSS"}
{"msg":"RSS plan already created, loading from file","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:05.407583908Z","hostname":"BRM42220070","pid":5581,"component":"RSS"}
{"msg":"Handing off control to Nexus","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:05.408045759Z","hostname":"BRM42220070","pid":5581,"component":"RSS"}
{"msg":"failed to notify nexus about sled agent: no record found for Query { name: Name(\"_nexus._tcp.control-plane.oxide.internal.\"), query_type: AAAA, query_class: IN }, will retry in 66.839276ms","v":0,"name":"SledAgent","level":40,"time":"2023-02-21T20:44:05.408313865Z","hostname":"BRM42220070","pid":5581,"sled_id":"9a721086-36f5-4a39-af55-1c389fe5eaae","component":"SledAgent"}
{"msg":"Disk at /devices/pci@0,0/pci1022,1483@3,3/pci1b96,0@0/blkdev@w0014EE81000D2EEA,0 already has a GPT","v":0,"name":"SledAgent","level":30,"time":"2023-02-21T20:44:05.40895173Z","hostname":"BRM42220070","pid":5581,"component":"StorageManager"}
thread 'tokio-runtime-worker' panicked at 'Failed to lookup IP: Resolve(ResolveError { kind: NoRecordsFound { query: Query { name: Name("_nexus._tcp.control-plane.oxide.internal."), query_type: AAAA, query_class: IN }, soa: None, negative_ttl: None, response_code: NXDomain, trusted: true } })', sled-agent/src/rack_setup/service.rs:451:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[ Feb 21 20:44:05 Stopping because all processes in service exited. ]
[ Feb 21 20:44:05 Executing stop method (:kill). ]
[ Feb 21 20:44:05 Restarting too quickly, changing state to maintenance. ]

You can see the sled agent load a service manifest from /var/oxide/service.toml. It then goes on to start Nexus, Oximeter, and the internal DNS service in zones. Towards the end, we see messages about loading an RSS plan from disk and handing off to Nexus. We then hit an unwrap at sled-agent/src/rack_setup/service.rs:451, where we're apparently looking up Nexus's IP by service name (_nexus._tcp.control-plane.oxide.internal.) and panicking when that lookup fails, here with NXDomain.

I'm not sure what should happen here, or if Alan and I have put us in some unexpected situation by unpacking / uninstalling / reinstalling manually.
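
One possible direction, as a rough sketch only: the log above shows the "notify nexus" path retrying after a failed lookup ("will retry in 66.839276ms"), while the RSS handoff panics on the same missing record. The handoff could retry with backoff and return the error to its caller instead of unwrapping. Everything named below (`LookupError`, `lookup_with_retry`, the attempt count and delays) is hypothetical and stands in for whatever the real resolver and RSS code use; it is not the code at service.rs:451.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical stand-in for the resolver's error type.
#[derive(Debug, PartialEq)]
pub enum LookupError {
    /// No record was found for the given service name.
    NoRecordsFound(String),
}

/// Call `lookup` until it succeeds or `max_attempts` calls have been made,
/// doubling the delay after each failure. `lookup` is always called at
/// least once. The last error is returned to the caller rather than
/// triggering a panic.
pub fn lookup_with_retry<T, F>(
    mut lookup: F,
    max_attempts: u32,
    initial_delay: Duration,
) -> Result<T, LookupError>
where
    F: FnMut() -> Result<T, LookupError>,
{
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match lookup() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the error instead of unwrapping.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

A caller would wrap the Nexus lookup in the closure and propagate the `Err` with `?`, so a missing DNS record at handoff time shows up as a logged, recoverable failure rather than putting the service into maintenance.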

smklein commented 1 year ago

I can dig into this in a bit, but as a very short-term workaround: If you delete /var/oxide, you'll remove the sled agent's notion of:

bnaecker commented 1 year ago

Yeah, we're going to reboot to get a fresh state. But we can definitely delete those files if we hit this again. Thanks @smklein