Closed ahl closed 10 months ago
sled_list
if it really is just a filter on that listResultsPage
where next_page
or whatever it's called is always nullsled_add
.I agree w/ everything @david-crespo said
sled_list
return all sleds with an optional filter that if specified returns only initialized or not initialized. sled_add
with detailed docs sounds much betterWorth considering if sled_set_provision_state
follows the pattern too. Usually the action comes at the end so I'd expect it to be sled_provision_state_set
(if set is the right verb).
Adding to this:
I believe that in order to add an uninitialized sled, one needs to specify:
I guess it's not that bad because this is the output from uninitialized_sled_list
, but should all be strictly required?
Do we need all of these things? for the sled_view
operation we only need a sled ID. While we don't have that yet, do we really need all of those fields to uniquely identify the sled?
Thanks for looking at this. It makes sense to me.
I think we're suffering a bit because we have "sled" the hardware device and "sled" the software thing that the control plane knows about. By calling the second thing a sled, we have [hardware] sleds with no [software] sled id etc and that makes this whole area a little awkward. "Uninitialized sleds" aren't "sleds with some null value for some field" but a different thing altogether. This affects (1) because I'm not sure how easy it is to have a sled_list
that lists both, at least not without significantly changing it. I guess it could list the latest inventory collection instead, and "join" against the sled table if asking for initialized sleds? Might we ever want different payloads for these things?
I discussed pagination with Andrew on the original PR and we felt at the time that paginating for real would be quite a lot more work and that would not be a problem for quite a while. Fake pagination seems fine. I imagine we'll want to cleanly handle the case where we are given a page token, even though we'd never have issued one. We could just return a 400.
sled_add
seems fine.
Worth considering if sled_set_provision_state follows the pattern too. Usually the action comes at the end so I'd expect it to be sled_provision_state_set (if set is the right verb).
I think I generally use subject_verb_object
(like project_instance_create
), not verb-last. I'm not sure what we've done in most places. I think we haven't written down the style here so I imagine folks may have inferred different things.
I believe that in order to add an uninitialized sled, one needs to specify:
* baseboard * part * revision * serial number * rack ID * cubby integer
I guess it's not that bad because this is the output from
uninitialized_sled_list
, but should all be strictly required?Do we need all of these things? for the
sled_view
operation we only need a sled ID. While we don't have that yet, do we really need all of those fields to uniquely identify the sled?
I believe that either of these two would uniquely identify the sled that the user wants to add:
I think I would pick the baseboard part/serial because in principle that's what an operator has verified against some inventory sheet or something.
Are you figuring that you'd change this API to accept a new params::UninitializedSledId
type (different from views::UninitializedSled
) and that could have only the baseboard part and serial number? I think that'd work.
In terms of of the implementation: we do use the rack id in Nexus::add_sled_to_initialized_rack
in order to get the rack subnet so that we can allocate an IP subnet for this sled. Currently, we're getting the rack id from the client, who got it from uninitialized_sled_list
, which just uses self.rack_id
(on the Nexus
object). If we do the above (leave rack id out of the arguments to this API) we use self.rack_id
inside add_sled_to_initialized_rack
as well. When we actually do have support for multiple racks, the inventory system will know which rack id each baseboard is part of, and we'll be able to get the rack id that way.
1.
the sled_list
I proposed isn't the right approach. In the CLI I made this oxide system hardware sled list-uninitialized
so maybe that argues for sled_list_uninitialized
. I don't know that there's a great answer here.
2. I think we just do dummy pagination as discussed above. The benefits of this are A. SDK generation B. CLI generation C. maybe some future-proofing?
3.
add_sled_to_initialized_rack
-> sled_add
4. I think Dave proposed using part and serial number as a 2-tuple to identify the sled we want to add. That sounds good to me.
the
sled_list
I proposed isn't the right approach. In the CLI I made thisoxide system hardware sled list-uninitialized
so maybe that argues forsled_list_uninitialized
. I don't know that there's a great answer here.I think we just do dummy pagination as discussed above. The benefits of this are A. SDK generation B. CLI generation C. maybe some future-proofing?
add_sled_to_initialized_rack
->sled_add
I think Dave proposed using part and serial number as a 2-tuple to identify the sled we want to add. That sounds good to me.
Thanks for working through all the details. I'm going to start on this right meow.
@davepacheco So far in our API, we always take the revision in addition to serial and part number, since we use Baseboard
in UninitializedSled
as well as Sled
. Furthermore, sled-agent
expects revision
because that's what is defined in sled_hardware::Baseboard
. The only place we don't use revision
is in the inventory BaseboardId
. I recognize why this is and it makes sense. However, it seems silly to me to define yet another API type and then look up the revision from the SP in the latest inventory collection by grabbing the inventory rather than just passing the Baseboard
directly as returned from listing uninitialized sleds. It would be pretty odd for the revision to change in between listing and adding a sled 😄.
Does this work for you?
I'm having trouble keeping all this straight so tell me if this summary is right.
What I understood from @rmustacc was that:
And as you said:
BaseboardId
only has the model and serial number (for that reason)hw_baseboard_id
tableAnd you're saying (/I think I remember):
Sled
type is a view type that shows what we've found. It makes sense that "revision" would appear here.UninitializedSled
type is a view type that shows what we've found. It makes sense that "revision" would appear here.params
type for adding a sled. This is what you're asking about.Putting this all together:
UninitializedSled
view type as the params type, then we have the "revision" in it already and we're all set.params
type in the API just to leave out the revision, and the implementation would have to go fetch the latest inventory collection to find the revision in order to call the sled agent API.Is that all correct?
I don't think there's any practical problem with doing what you suggest. But I think this goes back to @ahl's earlier observation that it feels a little weird to include unnecessary information when we just want an identifier for the sled. It seems similar to me as if we included the number of hardware threads or DIMM slots here (which I could also imagine adding to the Sled
and UninitializedSled
views). If it's just a question of adding an extra type (which sounds like more precisely the right thing anyway) and an extra function call to fetch the inventory, I think I'd just do that? I'd probably also see if the sled agent API really needs the revision, too, or if we could get rid of it there. I don't feel strongly about this. @ahl?
What's the workflow for adding a new sled? In the console, who cares: users are going to click click around and they don't care about the API. What about the CLI? I think we're imagining something like this:
$ oxide sled list-uninitialized
...
<find the one I want>
$ oxide sled add <parameters on based on the whole or subset of the info above>
If that's the case, let's find some relatively minimal amount of data the user needs to copy and paste (some goldilocks zone of enough information to be right and be fat-finger-tolerant and not too much redundant information to be annoying).
However, it seems silly to me to define yet another API type and then look up the revision from the SP in the latest inventory collection by grabbing the inventory rather than just passing the Baseboard directly as returned from listing uninitialized sleds.
The added complexity of a new type seems less important than reducing complexity to CLI users. I don't think we're going to see a lot of customer- or third-party-built API automation around this particular workflow. This is a highly revisable decision; it's fine to just do something and figure out something better later (e.g. maybe operators will want to slide in a sled, and then just oxide sled add --part xyz --serial abc
because they're in the DC and wrote them down; or oxide sled add --rack-id xyz --cubby 12
because they're looking at the rack and the new part is right there in cubby 12!).
@davepacheco Everything you wrote was correct. Except I wasn't arguing to pass UninitializedSled
to add-sled
, but instead to use the Baseboard
type that already exists. So the only "extra" thing we'd pass in is revision
. I implemented this in https://github.com/oxidecomputer/omicron/pull/4704.
I'm fine changing the implementation to do what we said above and add a new type UninitializedSledId
that doesn't include the revision as that probably does make the most sense for the API. I think @ahl makes a great point about simplifying the CLI this way.
Unfortunately, changing the sled-agent and probably also elsewhere in Nexus where we use Baseboard
to identify a sled is a larger change. Sled-agent actually persists this information in multiple places. Now it's not that hard to deserialize into a BaseboardId
, but there is other information as well such as whether the sled is a gimlet or PC.
I also believe this may have ramifications for trust quorum. For example, if we add a sled with revision 5, and then it gets reworked, should it remain in the trust quorum even though it has been modified? Now, I realize this can be worked around by removing the sled from the trust quorum and then having to add it back in later. Maybe this is fine? What do you think @rmustacc ?
In any case, I'll go ahead and add the new type as a nexus param for this API, as that seems to be the consensus and that's what we're worried about right now.
Completed with https://github.com/oxidecomputer/omicron/pull/4704
uninitialized_sled_list
uninitialized_sled_list
should be paginated.add_sled_to_initialized_rack
Here are the current
system/hardware
operations:1. I think that
uninitialized_sled_list
may be a little out of place (but it might be fine). One option is to have a query param tosled_list
that determines whether all sleds, initialized sleds, uninitialized sleds, both/all/whatever are listed. Another is the a-grammaticalsled_uninitialized_list
.2. The operation returns an array of items; it should probably be paginated.
3.
add_sled_to_initialized_rack
is similarly different. I'd suggestsled_add_uninitialized
or perhaps simplysled_add
. Note also that this operation is/v1/system/hardware/sleds
; would it make sense foruninitialized_sled_list
to have a more consistent path for consistency with this operation on uninitialized sleds?