oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0

Early networking: don't hard stop on configuration failure #7119

Closed by internet-diglett 15 hours ago

internet-diglett commented 1 day ago

Some errors that occur while configuring an individual link or peer should not halt startup of the entire rack. Instead, we should attempt to configure each item, log any error, and move on. The worst-case scenario is ultimately the same: if we don't get enough of a working configuration to start the rack, NTP will not come up, and thus the control plane will never start. However, we can now proceed after a partial failure or misconfiguration and possibly get the control plane running, allowing users and operators to troubleshoot and update configurations via the API.
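As a rough illustration of the "configure, log, move on" behavior described above, here is a minimal sketch in Rust. It is not the omicron implementation: `ConfigError`, `ConfigReport`, and `configure_best_effort` are hypothetical names invented for this example, and the per-item `apply` closure stands in for whatever actually configures a link or peer.

```rust
use std::fmt;

/// Hypothetical per-item failure record (not an omicron type).
#[derive(Debug, Clone, PartialEq)]
pub struct ConfigError {
    pub item: String,
    pub reason: String,
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to configure {}: {}", self.item, self.reason)
    }
}

/// Hypothetical summary of a best-effort configuration pass.
#[derive(Debug, Default)]
pub struct ConfigReport {
    pub configured: Vec<String>,
    pub failed: Vec<ConfigError>,
}

impl ConfigReport {
    /// True if at least one item was configured successfully.
    pub fn any_configured(&self) -> bool {
        !self.configured.is_empty()
    }
}

/// Try to configure every item. A failure is logged and recorded,
/// but it never stops the remaining items from being attempted.
pub fn configure_best_effort<F>(items: &[&str], mut apply: F) -> ConfigReport
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut report = ConfigReport::default();
    for &item in items {
        match apply(item) {
            Ok(()) => report.configured.push(item.to_string()),
            Err(reason) => {
                let err = ConfigError { item: item.to_string(), reason };
                // Log and continue instead of returning early.
                eprintln!("early networking: {}; continuing", err);
                report.failed.push(err);
            }
        }
    }
    report
}
```

A caller could pass a list of link names (for example the made-up names `"qsfp0"`, `"qsfp1"`, `"qsfp2"`) and inspect `report.failed` afterwards. Whether the rack can actually start is still decided downstream, for instance by whether NTP comes up; the sketch only shows that one bad item no longer aborts the whole pass.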

Related

https://github.com/oxidecomputer/dendrite/issues/1048