oxidecomputer / omicron

Omicron: Oxide control plane

Listing blueprints in `omdb` and `reconfigurator-cli` could more helpfully sort the blueprints #6639

Closed bnaecker closed 1 month ago

bnaecker commented 1 month ago

I was testing a small change I'm making to Reconfigurator, to emit new DNS records from a blueprint. To test it, I saved the reconfigurator state from dogfood with `omdb db reconfigurator-save dogfood.out` from the switch zone. I then moved it to my machine to test the local changes.

I wanted to see what changes to DNS would be made by my new code. So I ran this:

〉load dogfood-reconfigurator.out de70686a-6902-40ab-aef3-79f19d0d1fa4
using collection de70686a-6902-40ab-aef3-79f19d0d1fa4 as source of sled inventory data
sled 0c7011f7-a4bf-4daf-90cc-1c2410103300 loaded
sled 2707b587-9c7f-4fb0-a7af-37c3b7a9a0fa loaded
sled 5f6720b8-8a31-45f8-8c94-8e699218f28b loaded
sled 71def415-55ad-46b4-ba88-3ca55d7fb287 loaded
sled 7b862eb6-7f50-4c2f-b9a6-0d12ac913d3c loaded
sled 87c2c4fc-b0c7-4fef-a305-78f0ed265bbc loaded
sled a2adea92-b56e-44fc-8a0d-7d63b5fd3b93 loaded
sled b886b58a-1e3f-4be1-b9f2-0c2e66c6bc88 loaded
sled bd96ef7c-4941-4729-b6f7-5f47feecbc4b loaded
sled db183874-65b5-4263-a1c1-ddb2737ae0e9 loaded
sled dd83e75a-1edf-4aa1-89a0-cd8b2091a7cd loaded
sled f15774c1-b8e5-434f-a493-ec43f96cba06 loaded
collection 0c665bd6-595a-46a5-a65b-d85f87cfcc34 loaded
collection a40ba54a-849f-46f9-8a23-b3c0d463bc1c loaded
collection de70686a-6902-40ab-aef3-79f19d0d1fa4 loaded
blueprint 0ff40c05-188e-4690-ab15-a63d737d550f loaded
blueprint 1366c843-a75c-45a4-b40a-743e7d609af1 loaded
blueprint 170395d7-517a-4272-abcd-0e96b051522c loaded
blueprint 3eb67393-bdbc-4957-98c2-36cc60e3e901 loaded
blueprint 430f5c6b-3156-4921-8ddc-74560989c8f4 loaded
blueprint 451fe6ad-a87f-4447-95da-d4d8b632d0c5 loaded
blueprint 864b1f48-68dd-478d-be0a-c3c8e811dac2 loaded
blueprint 95c3f06b-4dbf-4614-ae7c-507c1193bde9 loaded
blueprint a187d811-0037-47eb-83ab-950d374317e1 loaded
blueprint e48511bd-8d9f-47d7-8108-216b7868fdec loaded
loaded service IP pool ranges: [V4(Ipv4Range { first: 172.20.26.1, last: 172.20.26.10 })]
configured external DNS zone name: rack2.eng.oxide.computer
configured silo names: default-silo, oxide-local2, silo31, oxide, oxide-local, quota-test, test, recovery, silo12, silo2, silo11, now-with-quotas, silo1, silo21
internal DNS generations: 1, 2, 3, 4, 5, 6, 7
external DNS generations: 25, 26, 31, 32
loaded data from "dogfood-reconfigurator.out"

〉inventory-list
ID                                   NERRORS TIME_DONE
0c665bd6-595a-46a5-a65b-d85f87cfcc34 0       2024-09-23T18:19:06.124Z
a40ba54a-849f-46f9-8a23-b3c0d463bc1c 0       2024-09-23T18:19:40.689Z
de70686a-6902-40ab-aef3-79f19d0d1fa4 0       2024-09-23T18:19:43.414Z
〉blueprint-list
ID
0ff40c05-188e-4690-ab15-a63d737d550f
1366c843-a75c-45a4-b40a-743e7d609af1
170395d7-517a-4272-abcd-0e96b051522c
3eb67393-bdbc-4957-98c2-36cc60e3e901
430f5c6b-3156-4921-8ddc-74560989c8f4
451fe6ad-a87f-4447-95da-d4d8b632d0c5
864b1f48-68dd-478d-be0a-c3c8e811dac2
95c3f06b-4dbf-4614-ae7c-507c1193bde9
a187d811-0037-47eb-83ab-950d374317e1
e48511bd-8d9f-47d7-8108-216b7868fdec
〉blueprint-plan e48511bd-8d9f-47d7-8108-216b7868fdec de70686a-6902-40ab-aef3-79f19d0d1fa4
error: generating blueprint: invariant violation: found decommissioned sled with 1 non-expunged zones: 1efda86b-caef-489f-9792-589d7677e59a
Sep 23 19:04:15.411 DEBG sled has no zones that need expungement; skipping, sled_id: 0c7011f7-a4bf-4daf-90cc-1c2410103300, parent_id: e48511bd-8d9f-47d7-8108-216b7868fdec, component: BlueprintBuilder
Sep 23 19:04:15.412 DEBG sled has no zones that need expungement; skipping, sled_id: 2707b587-9c7f-4fb0-a7af-37c3b7a9a0fa, parent_id: e48511bd-8d9f-47d7-8108-216b7868fdec, component: BlueprintBuilder
Sep 23 19:04:15.412 DEBG sled has no zones that need expungement; skipping, sled_id: 5f6720b8-8a31-45f8-8c94-8e699218f28b, parent_id: e48511bd-8d9f-47d7-8108-216b7868fdec, component: BlueprintBuilder
Sep 23 19:04:15.412 DEBG sled has no zones that need expungement; skipping, sled_id: 71def415-55ad-46b4-ba88-3ca55d7fb287, parent_id: e48511bd-8d9f-47d7-8108-216b7868fdec, component: BlueprintBuilder
Sep 23 19:04:15.412 DEBG sled has no zones that need expungement; skipping, sled_id: 7b862eb6-7f50-4c2f-b9a6-0d12ac913d3c, parent_id: e48511bd-8d9f-47d7-8108-216b7868fdec, component: BlueprintBuilder
Sep 23 19:04:15.425 DEBG sled has no zones that need expungement; skipping, sled_id: 87c2c4fc-b0c7-4fef-a305-78f0ed265bbc, parent_id: e48511bd-8d9f-47d7-8108-216b7868fdec, component: BlueprintBuilder
Sep 23 19:04:15.425 DEBG sled has no zones that need expungement; skipping, sled_id: a2adea92-b56e-44fc-8a0d-7d63b5fd3b93, parent_id: e48511bd-8d9f-47d7-8108-216b7868fdec, component: BlueprintBuilder
Sep 23 19:04:15.425 DEBG sled has no zones that need expungement; skipping, sled_id: b886b58a-1e3f-4be1-b9f2-0c2e66c6bc88, parent_id: e48511bd-8d9f-47d7-8108-216b7868fdec, component: BlueprintBuilder
Sep 23 19:04:15.425 DEBG sled has no zones that need expungement; skipping, sled_id: bd96ef7c-4941-4729-b6f7-5f47feecbc4b, parent_id: e48511bd-8d9f-47d7-8108-216b7868fdec, component: BlueprintBuilder
Sep 23 19:04:15.425 DEBG sled has no zones that need expungement; skipping, sled_id: db183874-65b5-4263-a1c1-ddb2737ae0e9, parent_id: e48511bd-8d9f-47d7-8108-216b7868fdec, component: BlueprintBuilder
Sep 23 19:04:15.425 DEBG sled has no zones that need expungement; skipping, sled_id: dd83e75a-1edf-4aa1-89a0-cd8b2091a7cd, parent_id: e48511bd-8d9f-47d7-8108-216b7868fdec, component: BlueprintBuilder
Sep 23 19:04:15.425 DEBG sled has no zones that need expungement; skipping, sled_id: f15774c1-b8e5-434f-a493-ec43f96cba06, parent_id: e48511bd-8d9f-47d7-8108-216b7868fdec, component: BlueprintBuilder

In that last output, we can see that the blueprint planner failed one of its invariant checks. The inventory appears to have a sled that has been decommissioned, but the blueprint itself shows that there is a zone that's supposed to be running on that sled. That shouldn't happen.

The actual problem here is that the blueprint I was using is out of date! The `blueprint-list` command shows blueprints by ID only, without any other information such as parent blueprint ID or creation time. Here is the output from `omdb nexus blueprints list`:

root@oxz_switch0:~# omdb nexus blueprints list
note: Nexus URL not specified.  Will pick one from DNS.
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:1122:3344:104::3]:12221
T ENA ID                                   PARENT                               TIME_CREATED
      00756b7a-a563-4061-9ece-8ffc692a2d28 3eb67393-bdbc-4957-98c2-36cc60e3e901 2024-09-23T18:41:36.196Z
      0ff40c05-188e-4690-ab15-a63d737d550f 95c3f06b-4dbf-4614-ae7c-507c1193bde9 2024-04-10T21:08:55.829Z
      1366c843-a75c-45a4-b40a-743e7d609af1 3eb67393-bdbc-4957-98c2-36cc60e3e901 2024-09-09T16:24:53.070Z
      170395d7-517a-4272-abcd-0e96b051522c 430f5c6b-3156-4921-8ddc-74560989c8f4 2024-07-07T21:39:25.997Z
* yes 3eb67393-bdbc-4957-98c2-36cc60e3e901 430f5c6b-3156-4921-8ddc-74560989c8f4 2024-09-06T21:42:08.133Z
      430f5c6b-3156-4921-8ddc-74560989c8f4 864b1f48-68dd-478d-be0a-c3c8e811dac2 2024-06-06T17:19:30.827Z
      451fe6ad-a87f-4447-95da-d4d8b632d0c5 a187d811-0037-47eb-83ab-950d374317e1 2024-04-18T23:10:10.454Z
      864b1f48-68dd-478d-be0a-c3c8e811dac2 e48511bd-8d9f-47d7-8108-216b7868fdec 2024-04-22T20:34:19.369Z
      95c3f06b-4dbf-4614-ae7c-507c1193bde9 <none>                               2024-03-22T18:37:17.291Z
      a187d811-0037-47eb-83ab-950d374317e1 0ff40c05-188e-4690-ab15-a63d737d550f 2024-04-18T23:03:37.348Z
      e48511bd-8d9f-47d7-8108-216b7868fdec 451fe6ad-a87f-4447-95da-d4d8b632d0c5 2024-04-22T20:23:48.417Z

So that shows that the blueprint I was trying to use, e485..., is from April! It would be very nice if `reconfigurator-cli`'s `blueprint-list` subcommand showed output similar to that of `omdb nexus blueprints list`, including creation time, target state, and parent ID.

The `omdb` output is better, but it's also a bit confusing. It shows the blueprints ordered by primary key (the blueprint ID), which isn't very actionable. Instead, we should probably show them sorted by time and / or by "lineage", the history of the target blueprints. These orderings can differ if one creates a blueprint but never sets it as the target, abandoning it instead.

One option is to show two tables: the list of all blueprints, sorted by time, and the history of target blueprints only. Another option is to always show them sorted by time, but to make the abandoned blueprints visually distinct, either with a special column or with different text or indentation.
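
To make the second option concrete, here is a rough sketch of sorting by creation time while flagging which blueprints are in the current target's lineage. Everything in it is hypothetical: `BlueprintMeta`, `target_lineage`, and `sorted_rows` are illustration-only names, not the real omicron types or functions, and the timestamps are assumed to be fixed-width RFC 3339 UTC strings like those in the table above, so that string order matches time order.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified blueprint metadata (not the real omicron type).
#[derive(Debug, Clone, PartialEq, Eq)]
struct BlueprintMeta {
    id: String,
    parent: Option<String>,
    /// Assumed fixed-width RFC 3339 UTC timestamp, so that comparing the
    /// strings lexicographically matches chronological order.
    time_created: String,
}

/// Walk parent links from the current target back toward the root and
/// return every ID visited along the way, including the target itself.
fn target_lineage(blueprints: &[BlueprintMeta], target: &str) -> HashSet<String> {
    let parents: HashMap<&str, Option<&str>> = blueprints
        .iter()
        .map(|b| (b.id.as_str(), b.parent.as_deref()))
        .collect();

    let mut lineage = HashSet::new();
    let mut current = Some(target);
    while let Some(id) = current {
        // Stop if this ID was already visited (a cycle in the parent links).
        if !lineage.insert(id.to_string()) {
            break;
        }
        // An ID with no parent, or one we have no metadata for, ends the walk.
        current = parents.get(id).copied().flatten();
    }
    lineage
}

/// Return (id, role) rows sorted oldest first. The role is "target",
/// "ancestor" (in the target's lineage), or "off-lineage" (everything else).
fn sorted_rows(blueprints: &[BlueprintMeta], target: &str) -> Vec<(String, &'static str)> {
    let lineage = target_lineage(blueprints, target);

    let mut sorted: Vec<&BlueprintMeta> = blueprints.iter().collect();
    // Break timestamp ties by ID so the output order is deterministic.
    sorted.sort_by(|a, b| {
        a.time_created
            .cmp(&b.time_created)
            .then_with(|| a.id.cmp(&b.id))
    });

    sorted
        .into_iter()
        .map(|b| {
            let role = if b.id == target {
                "target"
            } else if lineage.contains(&b.id) {
                "ancestor"
            } else {
                "off-lineage"
            };
            (b.id.clone(), role)
        })
        .collect()
}
```

The sketch uses "off-lineage" rather than "abandoned" because a blueprint outside the lineage may simply be newer than the target and not yet made the target, as with 00756b7a in the table above. Telling those apart from truly abandoned blueprints would need an extra rule, for example comparing creation time against the target's. The two-table option falls out of the same data: the first table is every row, and the second is the rows whose role is "target" or "ancestor".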