Closed citrus-it closed 1 year ago
During dogfood mupdate today, the final nexus handoff failed:
10:29:10.097Z INFO SledAgent (RSS): Failed to handoff to nexus: Error Response: status: 400 Bad Request; headers: {"content-type": "application/json", "x-request-id": "035ad549-d9d7-4a0f-9aa4-e047b17bb160", "content-length": "128", "date": "Fri, 14 Jul 2023 10:29:10 GMT"}; value: Error { error_code: Some("InvalidRequest"), message: "address unavailable", request_id: "035ad549-d9d7-4a0f-9aa4-e047b17bb160" }
and the nexus logs contained:
10:41:09.003Z INFO 36ed3232-caf0-442b-990e-01d0c0506f1a (dropshot_internal): Early exit: Rack already initialized resource = Vpc { parent: Project { parent: Silo { parent: Fleet, key: 001de000-5110-4000-8000-000000000001, lookup_type: ById(001de000-5110-4000-8000-000000000001) }, key: 001de000-4401-4000-8000-000000000000, lookup_type: ById(001de000-4401-4000-8000-000000000000) }, key: 001de000-074c-4000-8000-000000000000, lookup_type: ById(001de000-074c-4000-8000-000000000000) } resource = Vpc { parent: Project { parent: Silo { parent: Fleet, key: 001de000-5110-4000-8000-000000000001, lookup_type: ById(001de000-5110-4000-8000-000000000001) }, key: 001de000-4401-4000-8000-000000000000, lookup_type: ById(001de000-4401-4000-8000-000000000000) }, key: 001de000-074c-4000-8000-000000000000, lookup_type: ById(001de000-074c-4000-8000-000000000000) } resource = Vpc { parent: Project { parent: Silo { parent: Fleet, key: 001de000-5110-4000-8000-000000000001, lookup_type: ById(001de000-5110-4000-8000-000000000001) }, key: 001de000-4401-4000-8000-000000000000, lookup_type: ById(001de000-4401-4000-8000-000000000000) }, key: 001de000-074c-4000-8000-000000000000, lookup_type: ById(001de000-074c-4000-8000-000000000000) } resource = VpcSubnet { parent: Vpc { parent: Project { parent: Silo { parent: Fleet, key: 001de000-5110-4000-8000-000000000001, lookup_type: ById(001de000-5110-4000-8000-000000000001) }, key: 001de000-4401-4000-8000-000000000000, lookup_type: ById(001de000-4401-4000-8000-000000000000) }, key: 001de000-074c-4000-8000-000000000000, lookup_type: ByName("oxide-services") }, key: 001de000-c470-4000-8000-000000000001, lookup_type: ByName("external-dns") } resource = VpcSubnet { parent: Vpc { parent: Project { parent: Silo { parent: Fleet, key: 001de000-5110-4000-8000-000000000001, lookup_type: ById(001de000-5110-4000-8000-000000000001) }, key: 001de000-4401-4000-8000-000000000000, lookup_type: ById(001de000-4401-4000-8000-000000000000) }, key: 
001de000-074c-4000-8000-000000000000, lookup_type: ByName("oxide-services") }, key: 001de000-c470-4000-8000-000000000001, lookup_type: ByName("external-dns") } resource = VpcSubnet { parent: Vpc { parent: Project { parent: Silo { parent: Fleet, key: 001de000-5110-4000-8000-000000000001, lookup_type: ById(001de000-5110-4000-8000-000000000001) }, key: 001de000-4401-4000-8000-000000000000, lookup_type: ById(001de000-4401-4000-8000-000000000000) }, key: 001de000-074c-4000-8000-000000000000, lookup_type: ByName("oxide-services") }, key: 001de000-c470-4000-8000-000000000002, lookup_type: ByName("nexus") } resource = VpcSubnet { parent: Vpc { parent: Project { parent: Silo { parent: Fleet, key: 001de000-5110-4000-8000-000000000001, lookup_type: ById(001de000-5110-4000-8000-000000000001) }, key: 001de000-4401-4000-8000-000000000000, lookup_type: ById(001de000-4401-4000-8000-000000000000) }, key: 001de000-074c-4000-8000-000000000000, lookup_type: ByName("oxide-services") }, key: 001de000-c470-4000-8000-000000000002, lookup_type: ByName("nexus") } resource = AddressLot { parent: Fleet, key: a77d92ed-7ff7-4794-9411-257076800abe, lookup_type: ByName("initial-infra") } resource = LoopbackAddress { parent: Fleet, key: 440dc262-4948-4e3f-980c-c30527e582bc, lookup_type: ByCompositeId("address = V6(Ipv6Network { addr: fd00:99::1, prefix: 64 }), rack_id = 0482465f-ee67-48a7-a18f-874879408e14, switch_location = \\"switch0\\"") } error_message_external = address unavailable error_message_internal = address unavailable response_code = 400
The address unavailable error seems to be due to the attempted assignment of the anycast fd00:99::1 address to switch1 when it is already in use on switch0. Sharing an anycast address between the two switches is of course fine, but the current logic does not understand this:
root@[fd00:1122:3344:108::3]:32221/omicron> select first_address, last_address from address_lot_block;
  first_address | last_address
----------------+----------------
  172.20.15.21  | 172.20.15.22
  fd00:99::1    | fd00:99::ffff
(2 rows)

Time: 2ms total (execution 2ms / network 0ms)

root@[fd00:1122:3344:108::3]:32221/omicron> select first_address, last_address from address_lot_rsvd_block;
  first_address | last_address
----------------+---------------
  fd00:99::1    | fd00:99::1
  172.20.15.21  | 172.20.15.21
I manually deleted the fd00:99::1 address from the reserved block table (address_lot_rsvd_block), after which RSS completed.
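To illustrate the suspected failure mode, here is a minimal sketch in Rust of a reservation table that rejects a second reservation of the same address unless the caller marks it as anycast. Everything in it is hypothetical (the `ReservedAddresses` type, the `reserve` method, the `anycast` flag and the holder count); it is not the Nexus code and is not necessarily how the linked PR resolved the problem.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical in-memory stand-in for the reserved block table.
/// Maps each reserved address to the number of holders.
#[derive(Debug, Default)]
pub struct ReservedAddresses {
    holders: HashMap<IpAddr, u32>,
}

/// Hypothetical error mirroring the "address unavailable" message.
#[derive(Debug, PartialEq, Eq)]
pub enum ReserveError {
    AddressUnavailable,
}

impl ReservedAddresses {
    pub fn new() -> Self {
        Self::default()
    }

    /// Reserve `addr` and return the resulting number of holders.
    ///
    /// With `anycast == false` an address may have only one holder, so a
    /// second reservation fails (the behaviour seen in this issue).
    /// With `anycast == true` (an assumption made for this sketch) an
    /// existing reservation is shared and the holder count is incremented.
    pub fn reserve(&mut self, addr: IpAddr, anycast: bool) -> Result<u32, ReserveError> {
        let count = self.holders.entry(addr).or_insert(0);
        if *count > 0 && !anycast {
            return Err(ReserveError::AddressUnavailable);
        }
        *count += 1;
        Ok(*count)
    }
}
```

With this sketch, reserving fd00:99::1 twice with `anycast` set to `false` fails on the second call, which corresponds to the switch0 then switch1 sequence above; with `anycast` set to `true` both calls succeed.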
Closed by https://github.com/oxidecomputer/omicron/pull/3626