oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0
251 stars 39 forks source link

R11 SP update failed with 503 #6898

Open augustuswm opened 2 weeks ago

augustuswm commented 2 weeks ago

When attempting to update the colo rack from R11 rc0 to R11 rc1 we encountered a failure during the first step of the update (sled 14 / switch 0).

The update was initiated with wicket16 rack-update start --sled 14 --switch 0 --psc 0 --force-update-rot --force-update-sp --color always and after the ROT update update succeeded it reported a failure in updating the SP:

[ sled 14 00:00:18]    Failed ( 6/19) Updating SP: after 5.01s: error updating component
[ sled 14 00:00:18]             Caused by:
[ sled 14 00:00:18]             - step failed: Sending data to MGS (slot 0)
[ sled 14 00:00:18]             - SP component update failed at stage "sending" for gimlet_sp:gimlet-c (version 1.0.29)
[ sled 14 00:00:18]             - Error Response: status: 503 Service Unavailable; headers: {"content-type": "application/json", "x-request-id": "10fa7c3f-7239-4dc2-92ea-80b53817adab", "content-length": "247", "date": "Fri, 18 Oct 2024 03:27:51 GMT"}; value: Error { error_code: Some("UpdateFailed"), message: "updating SP SpIdentifier { typ: Sled, slot: 14 } failed: failed to send update message to SP: Error response from SP: update failed (code 7)", request_id: "10fa7c3f-7239-4dc2-92ea-80b53817adab" }

I captured a hubris dump from the SP and gathered logs from the sled 16 switch zone. The dump is available in /staff/rack3/BRM42220013/2024-10-17 and the logs are as follows:

BRM42220078 # grep -E "03:30:[01]" /zone/oxz_switch/root/var/svc/log/oxide-mgs:default.log.0 | grep -v "caboose?" | grep -v "/component" | looker

03:30:00.134Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13462
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:50965
    req_id = 7841e872-dd49-43f7-8859-ee9d742dbc42
    response_code = 200
    uri = /sp/sled/4
03:30:00.796Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 14633
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:44527
    req_id = c806e9db-2814-44ca-8185-cfb4f698a919
    response_code = 200
    uri = /sp/sled/29
03:30:03.934Z WARN 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (ManagementSwitch): ignoring unknown ignition target 33 returned by ignition controller SP
    file = gateway/src/management_switch.rs:469
03:30:03.934Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 5094
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:50965
    req_id = fca91f1c-5749-484e-a058-1ff9659305bf
    response_code = 200
    uri = /ignition
03:30:04.778Z WARN 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (producer-server): failed to register as a producer with Nexus, will retry
    delay = 135.75277724s
    error = "Communication Error: error sending request for url (http://[fd00:1122:3344:10e::3]:12221/metrics/producers)"
    file = oximeter/producer/src/lib.rs:424
03:30:05.020Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 64165
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:50965
    req_id = 19c01a52-5ea9-4056-8e6d-d590d6d877f8
    response_code = 200
    uri = /sp/sled/14
03:30:05.148Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 14075
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:44527
    req_id = 87d3f59e-61e6-4382-8e42-05a46e12a2cb
    response_code = 200
    uri = /sp/switch/0
03:30:05.451Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 12975
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:42080
    req_id = 11368dc0-b4ab-4bb2-966c-7ef348a22ccb
    response_code = 200
    uri = /sp/sled/2
03:30:05.451Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13160
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:37702
    req_id = c23fc01e-2286-4212-838d-79652e29b91f
    response_code = 200
    uri = /sp/sled/12
03:30:05.451Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13151
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:41103
    req_id = 66026e02-08d5-4647-8894-fc08cd4be927
    response_code = 200
    uri = /sp/power/0
03:30:05.451Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13214
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:54159
    req_id = 3eee949e-e8f8-4e47-b2d5-d2a1c8c56eca
    response_code = 200
    uri = /sp/sled/30
03:30:05.451Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13178
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:41266
    req_id = f4cba3ef-a94c-45a1-aae3-f22f5192ff64
    response_code = 200
    uri = /sp/sled/9
03:30:05.451Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13241
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:38979
    req_id = 9485cf08-22a4-4c71-91ec-c584cfe62018
    response_code = 200
    uri = /sp/sled/20
03:30:05.451Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13337
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:62833
    req_id = df2f62ca-c264-4a7a-baf3-3e96b673c5a2
    response_code = 200
    uri = /sp/sled/25
03:30:05.451Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13340
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:37802
    req_id = 6e023f59-0ce9-427d-a772-f5f7bd56bdcd
    response_code = 200
    uri = /sp/sled/31
03:30:05.451Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13415
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:38656
    req_id = 0d14f304-9534-4bfe-9fed-ec890c1a5bbe
    response_code = 200
    uri = /sp/sled/5
03:30:05.451Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13407
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:60058
    req_id = 9c2495ec-4c5f-40d8-97f6-1025f03ac3be
    response_code = 200
    uri = /sp/sled/11
03:30:05.451Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13408
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:37562
    req_id = 45313b3e-72a4-44a5-ac51-d79b02668c5f
    response_code = 200
    uri = /sp/sled/13
03:30:05.457Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 19034
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:61748
    req_id = 99c1803e-234a-4d2b-bd9b-6e1f760daaa3
    response_code = 200
    uri = /sp/sled/10
03:30:05.457Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 19225
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:37623
    req_id = a05b2299-5474-4baf-a601-5beaa8a04ddb
    response_code = 200
    uri = /sp/sled/0
03:30:05.457Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 19184
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:57122
    req_id = deae6800-40a7-4b19-aca2-98cf5b1b3a9c
    response_code = 200
    uri = /sp/sled/27
03:30:05.457Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 19234
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:46104
    req_id = 063de735-d84d-4f57-91f4-0c781209051c
    response_code = 200
    uri = /sp/sled/22
03:30:05.457Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 19344
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:45616
    req_id = 9f6c119d-91a4-4176-8f76-f58fdf0f12d8
    response_code = 200
    uri = /sp/sled/28
03:30:05.457Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 19294
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:61300
    req_id = 5b160438-0f5c-4576-a338-52e715189ab1
    response_code = 200
    uri = /sp/sled/8
03:30:05.458Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 19269
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:36219
    req_id = 27871b2e-574e-4db5-b8ff-dcf329bf4dfe
    response_code = 200
    uri = /sp/sled/18
03:30:05.458Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 19555
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:57079
    req_id = 13a635d6-85bd-44f0-a422-f093670e121d
    response_code = 200
    uri = /sp/sled/19
03:30:05.458Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 19397
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:65384
    req_id = 08b6f06b-2129-4a3d-be6a-c6677152d078
    response_code = 200
    uri = /sp/sled/3
03:30:05.458Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 19562
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:56144
    req_id = 651d4bb6-19fb-4f04-9b76-c27b16d2fc10
    response_code = 200
    uri = /sp/sled/23
03:30:05.458Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 19438
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:42517
    req_id = e56e1e83-723b-4a57-8993-e1c678833859
    response_code = 200
    uri = /sp/sled/6
03:30:05.458Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 19445
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:47763
    req_id = ab60ba74-6cc1-462d-a9af-106c57a639c2
    response_code = 200
    uri = /sp/sled/26
03:30:05.458Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 19782
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:49764
    req_id = df7dc5c2-b450-468c-9272-d18e68b8cdae
    response_code = 200
    uri = /sp/sled/17
03:30:05.458Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 19565
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:62865
    req_id = d703c9f9-bd7d-4e2f-ae8e-5b839a947b7c
    response_code = 200
    uri = /sp/sled/16
03:30:05.458Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 19916
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:49159
    req_id = ddf324ef-d88c-432c-b6a6-5af0b097cc2e
    response_code = 200
    uri = /sp/sled/7
03:30:05.458Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 19683
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:58703
    req_id = 113ec1bc-06c5-4acd-87a1-4932feef4548
    response_code = 200
    uri = /sp/sled/15
03:30:05.458Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 19891
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:59181
    req_id = e7a84179-bdee-446b-9440-50e1c799427a
    response_code = 200
    uri = /sp/sled/21
03:30:05.458Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 20010
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:35193
    req_id = 3c909f30-b73a-4903-8a55-fc5674c17c9a
    response_code = 200
    uri = /sp/sled/1
03:30:05.464Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 25786
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:42954
    req_id = 71e4d65c-97f5-47c6-a346-ab356cc87070
    response_code = 200
    uri = /sp/switch/1
03:30:05.812Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (ManagementSwitch): initial discovery failed
    addr = [ff02::1de:2%61]:11111
    error = RPC call failed (gave up after 5 attempts)
    file = /home/build/.cargo/git/checkouts/management-gateway-service-749acba834b73294/9bbac47/gateway-sp-comms/src/single_sp.rs:1827
    interface = gimlet24
03:30:08.934Z WARN 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (ManagementSwitch): ignoring unknown ignition target 33 returned by ignition controller SP
    file = gateway/src/management_switch.rs:469
03:30:08.934Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 5015
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:50965
    req_id = ebd30568-341b-491f-a4a9-9da06d8a81a5
    response_code = 200
    uri = /ignition
03:30:09.050Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (ManagementSwitch): update complete
    file = /home/build/.cargo/git/checkouts/management-gateway-service-749acba834b73294/9bbac47/gateway-sp-comms/src/single_sp/update.rs:481
    id = 580b6bac-7ea4-4a40-be51-27ceeed23d5c
    interface = gimlet14
03:30:09.278Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (ManagementSwitch): cannot use reset watchdog; not supported for rot
    file = /home/build/.cargo/git/checkouts/management-gateway-service-749acba834b73294/9bbac47/gateway-sp-comms/src/single_sp.rs:831
    interface = gimlet14
03:30:10.135Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13742
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:50965
    req_id = b0cd5407-e0f8-41ba-8517-b481f158e192
    response_code = 200
    uri = /sp/sled/4
03:30:10.804Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 21743
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:44527
    req_id = f66e2617-9f6b-449e-93c0-b72dfc5155ae
    response_code = 200
    uri = /sp/sled/29
03:30:12.425Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (ManagementSwitch): starting SP update
    aux_flash_chck = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    aux_flash_size = 0
    file = /home/build/.cargo/git/checkouts/management-gateway-service-749acba834b73294/9bbac47/gateway-sp-comms/src/single_sp/update.rs:105
    id = 80e9ecfb-6bf0-4dc1-9228-09d91ccb7663
    interface = gimlet14
    sp_image_size = 655616
03:30:13.934Z WARN 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (ManagementSwitch): ignoring unknown ignition target 33 returned by ignition controller SP
    file = gateway/src/management_switch.rs:469
03:30:13.934Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 5081
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:50965
    req_id = e1940ff2-a6e2-4a92-b3bb-1dbc8dab2278
    response_code = 200
    uri = /ignition
03:30:14.970Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 12814
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:50965
    req_id = 62361d88-ff6f-4def-860f-f8b9b9e44f0b
    response_code = 200
    uri = /sp/sled/14
03:30:15.148Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13169
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:44527
    req_id = 6bb1b243-2cb7-44dd-a0b8-082df4c20c9d
    response_code = 200
    uri = /sp/switch/0
03:30:15.452Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 12803
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:35193
    req_id = 38704fcb-14c6-47a4-840a-c3f35a759275
    response_code = 200
    uri = /sp/sled/19
03:30:15.452Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 12803
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:58703
    req_id = 02bd99a6-4ffb-42a8-853a-032e2dc67c47
    response_code = 200
    uri = /sp/sled/26
03:30:15.452Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 12772
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:59181
    req_id = 2a969ff1-cfc6-4e31-b944-c5d902f21028
    response_code = 200
    uri = /sp/sled/12
03:30:15.452Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 12964
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:62833
    req_id = 6ac21611-5b1d-4894-b4ef-8d4ba7175d31
    response_code = 200
    uri = /sp/sled/6
03:30:15.452Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13063
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:61300
    req_id = 541ee2e5-cd86-49be-b59c-b77e43ba30c0
    response_code = 200
    uri = /sp/power/0
03:30:15.452Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13284
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:47763
    req_id = 2c89f9c5-cb1c-40c2-bd0e-666116c862f6
    response_code = 200
    uri = /sp/sled/21
03:30:15.452Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13234
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:61748
    req_id = 76d55102-1964-430e-8998-29f2767b75ca
    response_code = 200
    uri = /sp/sled/27
03:30:15.452Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13333
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:56144
    req_id = 6939597e-5f8a-4486-b3ec-bc45dc799423
    response_code = 200
    uri = /sp/sled/30
03:30:15.452Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13486
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:54159
    req_id = f0b75a92-8ceb-4a3e-bee3-bd01a6f7067d
    response_code = 200
    uri = /sp/sled/31
03:30:15.452Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13201
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:61505
    req_id = a1c1202a-0505-4703-b72e-8c9a9b830fb9
    response_code = 200
    uri = /sp/sled/0
03:30:15.452Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13387
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:60058
    req_id = 74f8f315-63c5-4bdc-b755-084255c21f33
    response_code = 200
    uri = /sp/sled/11
03:30:15.452Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13386
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:41266
    req_id = 40f22995-1a6e-41f2-93d7-61e33b4e293e
    response_code = 200
    uri = /sp/sled/18
03:30:15.452Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13415
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:42080
    req_id = 9e537afb-e056-4de4-8de9-be5caa288fb9
    response_code = 200
    uri = /sp/sled/22
03:30:15.453Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13579
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:37623
    req_id = 53de969e-d139-4f59-ae53-20bdcf9c79b6
    response_code = 200
    uri = /sp/switch/1
03:30:15.453Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13795
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:49764
    req_id = c53917f4-f841-458d-a4ce-ae9fce69f123
    response_code = 200
    uri = /sp/sled/16
03:30:15.453Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 13854
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:38656
    req_id = fd23629a-2e8d-402d-8d14-64479f5bcb53
    response_code = 200
    uri = /sp/sled/23
03:30:15.457Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 17737
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:57122
    req_id = 4dc97bfe-1c48-4495-b455-f0ef83fb82fd
    response_code = 200
    uri = /sp/sled/2
03:30:15.457Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 17717
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:37562
    req_id = 28663036-9616-453a-90d6-03992d9a9c9e
    response_code = 200
    uri = /sp/sled/7
03:30:15.457Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 17764
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:36219
    req_id = 774fabde-8df6-4556-bf89-c27fc506715c
    response_code = 200
    uri = /sp/sled/1
03:30:15.457Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 17793
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:42517
    req_id = 595a2151-6db4-4bf5-a40d-4139d96eb908
    response_code = 200
    uri = /sp/sled/15
03:30:15.457Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 17739
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:37802
    req_id = 9ba1b6a9-3a05-4586-ad20-94625b3240a2
    response_code = 200
    uri = /sp/sled/28
03:30:15.457Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 17749
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:37702
    req_id = e7141d56-3b80-4b90-88d4-bac7499092ce
    response_code = 200
    uri = /sp/sled/10
03:30:15.457Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 17760
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:65384
    req_id = e4d9dd17-dfc2-43e8-81bc-d5de5498e5c1
    response_code = 200
    uri = /sp/sled/9
03:30:15.457Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 17797
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:49159
    req_id = 2db3cc58-456d-4d23-bf8f-773ac615c2d3
    response_code = 200
    uri = /sp/sled/25
03:30:15.457Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 17683
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:45616
    req_id = 1bb66b59-a020-4e72-8a04-562076d6d201
    response_code = 200
    uri = /sp/sled/20
03:30:15.457Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 17856
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:38979
    req_id = fddb248a-281c-41b2-a829-0f15a0ea7a5a
    response_code = 200
    uri = /sp/sled/13
03:30:15.457Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 17761
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:41103
    req_id = e5af8711-bf75-406b-a621-d87714d7ba3c
    response_code = 200
    uri = /sp/sled/3
03:30:15.457Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 17818
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:57079
    req_id = a67fd30c-e7ae-45f6-bc96-6ecf3717d111
    response_code = 200
    uri = /sp/sled/5
03:30:15.457Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 17808
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:46104
    req_id = 5566f479-bb02-434d-a4ec-8175e4f6a0e9
    response_code = 200
    uri = /sp/sled/8
03:30:15.457Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 17766
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:62865
    req_id = 1cb9df8a-b139-42c3-8046-ad5b39cc848d
    response_code = 200
    uri = /sp/sled/17
03:30:16.818Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (ManagementSwitch): initial discovery failed
    addr = [ff02::1de:2%61]:11111
    error = RPC call failed (gave up after 5 attempts)
    file = /home/build/.cargo/git/checkouts/management-gateway-service-749acba834b73294/9bbac47/gateway-sp-comms/src/single_sp.rs:1827
    interface = gimlet24
03:30:16.818Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    error_message_external = error communicating with SP SpIdentifier { typ: Sled, slot: 24 }: no SP discovered
    error_message_internal = error communicating with SP SpIdentifier { typ: Sled, slot: 24 }: no SP discovered
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:938
    latency_us = 4851561
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:42954
    req_id = b208f07a-ada2-4887-9e51-06697e65c71b
    response_code = 503
    uri = /sp/sled/24
03:30:18.214Z WARN 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (ManagementSwitch): ignoring unexpected RPC response
    file = /home/build/.cargo/git/checkouts/management-gateway-service-749acba834b73294/9bbac47/gateway-sp-comms/src/single_sp.rs:1955
    interface = gimlet14
    message = Message { header: Header { version: 15, message_id: 46584178 }, kind: SpResponse(ComponentDetails(TlvPage { offset: 0, total: 1 })) }
03:30:18.934Z WARN 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (ManagementSwitch): ignoring unknown ignition target 33 returned by ignition controller SP
    file = gateway/src/management_switch.rs:469
03:30:18.934Z INFO 91fedde5-0cc5-41f2-8c75-22447b4c8cfa (dropshot): request completed
    file = /home/build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.12.0/src/server.rs:950
    latency_us = 5064
    local_addr = [::1]:12225
    method = GET
    remote_addr = [::1]:42954
    req_id = a7c0111a-a54f-4c8c-930a-bb9829a50ff2
    response_code = 200
    uri = /ignition

Wicket reports the following for the sled SP:

SP/0 *                    1.0.29                    1.0.29                   FAILED
SP/1                      UNKNOWN                   1.0.29                   FAILED

• Active Slot:
    • Hubris Archive ID: d1b46b543718ea64
    • Git Commit: b2b108fc2f330675269bec120536f3b97ee4e3be
    • Board: gimlet-c
    • Name: gimlet-c
• Inactive Slot:
    • No information

Update attempts after both clearing the error and resetting via ignition resulted in the same failure.

augustuswm commented 2 weeks ago

I tried updating Sled 13 and ran in to the same error.

augustuswm commented 2 weeks ago

Edit: wrong issue

augustuswm commented 2 weeks ago

Hubris dumps (and mgs logs) from before and after the SP update attempt on Sled 13 are available at /staff/rack3/BRM42220064/2024-10-18

labbott commented 1 week ago

This should be fixed by https://github.com/oxidecomputer/hubris/pull/1905 . It applies to all SPs.

Ignition reset does not seem to reliably clear the error. Ignition off, waiting until it is confirmed off and then ignition on worked reliably to update dogfood (which also reproduced the error). This should go in the release notes for R12.

I will also update with the associated omicron commit/TUF repo which should have the fix.

labbott commented 1 week ago

https://github.com/oxidecomputer/omicron/pull/6940 omicron fix