oxidecomputer / helios


image: include dumpadm.conf that disables running savecore on boot #133

Closed citrus-it closed 11 months ago

citrus-it commented 11 months ago

There are a couple of pieces here, but we generally don't want savecore to run automatically on a gimlet; the control plane immediately configures things so that it doesn't, before selecting a dump device and a place to extract any saved dump, and then running savecore itself.

However, once https://github.com/oxidecomputer/stlouis/issues/378 is integrated, the system can boot up with a dump device configured. In this case, the dumpadm service attempts to save any panic dump on that device to the ramdisk, with predictable results:

savecore: System dump time: Sun Dec 28 00:11:06 1986

savecore: not enough space in /var/crash/gimlet-sn06 (3659 MB avail, 11665 MB needed)

Shipping this partial dumpadm.conf prevents this, and the rest of the content is filled in at boot by the dumpadm service to reflect the system state.
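For illustration (this is an assumption based on the issue title and the filled-in copy shown below, not necessarily the literal file contents), the fragment shipped in the image only needs to carry the enable switch:

# dumpadm.conf fragment baked into the image (illustrative); the dumpadm
# service fills in DUMPADM_DEVICE, DUMPADM_SAVDIR and the rest at boot.
DUMPADM_ENABLE=no

With that in place, boot looks like this instead: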

savecore: System dump time: Sun Dec 28 00:11:06 1986

savecore: Panic crashdump pending on dump device but dumpadm -n in effect; run savecore(8) manually to extract.
gimlet-sn06 # cat /etc/dumpadm.conf
#
# dumpadm.conf
#
# Configuration parameters for system crash dump.
# Do NOT edit this file by hand -- use dumpadm(8) instead.
#
DUMPADM_DEVICE=/devices/pci@38,0/pci1de,fff9@3,3/pci1344,3100@0/blkdev@w00A0750130083A19,0:e
DUMPADM_SAVDIR=/var/crash/gimlet-sn06
DUMPADM_CONTENT=curproc
DUMPADM_ENABLE=no
DUMPADM_CSAVE=on
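As the boot message above says, a pending dump can still be extracted by hand once somewhere with enough space has been chosen; a minimal sketch (the target directory here is hypothetical) would be:

# Run savecore(8) manually against the configured dump device, writing a
# compressed vmdump.N (DUMPADM_CSAVE=on) into the chosen directory.
savecore /path/with/enough/space/crash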
jclulow commented 11 months ago

Did you consider just disabling the dumpadm service in the SMF profile we use to construct the image?

citrus-it commented 11 months ago

> Did you consider just disabling the dumpadm service in the SMF profile we use to construct the image?

Considered it, but fmd depends on it, and I prefer letting it do the right thing in this case. That includes, as it turns out, setting the system dump/instance UUID.
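For reference, that dependency can be confirmed on a running system with svcs(1); a quick sketch, assuming the usual illumos FMRIs:

# List the services that fmd depends upon; the dumpadm service should
# appear among them.
svcs -d svc:/system/fmd:default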

I think this is all right, but I haven't been able to find all the answers I was hoping for in the control plane.

It's worth noting that crash dump handling in omicron is currently being reworked, and I'm talking to @lifning in case this and https://github.com/oxidecomputer/stlouis/issues/378 have any impact there.

There are a couple of surprising things in the sled agent code. I was expecting it to just set up a dump device on the M.2 that was booted from, and then run savecore once it had determined a good location to put the dump. Instead it does a dance: it tries to find an empty dump device on either M.2, and it runs savecore before configuring the dump device. I don't think most of this is necessary given the guarantees that dumpadm and savecore provide, and we could certainly extend those utilities if more is required.
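A rough sketch of that simpler sequence, with a hypothetical device path and crash directory, might be:

# Configure a dump device on the boot M.2, leaving automatic savecore
# disabled (-n), then extract later once a good destination is known.
dumpadm -n -d /dev/dsk/c1t0d0s4
# ... later, once the control plane has picked a location with enough space ...
savecore /pool/ext/crash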

What I have not been able to figure out is how any of this works in the recovery image. Your addition here would (it appears to me) apply to that image as well, which is good. But I couldn't find the code that's responsible for doing dump device setup or running savecore (if it even makes sense to run savecore in that case?).

stlouis#378 currently only applies to booting from disk, not the recovery/mupdate path (IPCC) nor network boot (lab/manufacturing). It would be possible to search for any M.2 which has a slice 4 in the recovery case and do the same setup there, but it isn't in the current patch set.

wesolows commented 11 months ago

> stlouis#378 currently only applies to booting from disk, not the recovery/mupdate path (IPCC) nor network boot (lab/manufacturing). It would be possible to search for any M.2 which has a slice 4 in the recovery case and do the same setup there, but it isn't in the current patch set.

Right, the case I was thinking about might look like this:

10 Select normal phase 2
20 Configure dump device
30 Start booting
40 Panic before arriving at userland
50 goto 10

After several iterations of that, the operator triggers recovery. The only other supported operation is to lower the sled's power state to A2 or A3, but let's assume that the same loop will resume the next time it's raised to A0 and eventually we're going to do recovery. When we get to the recovery image, we do ???. My thought is not about configuring early dump in that case (while we do in fact know exactly where to put it, we're still maintaining the fiction that sled-agent controls all the storage), it's about saving and recovering the dump we already have. That can happen "now", where the recovery process runs savecore (putting it where though?) and then PUTting it to some HTTP server, or it can happen "later", but only if we don't trash the contents in some way -- like by changing the SSD's format or the definition of slice 4's starting and ending LBAs.

There is a bigger architectural principle we might want to consider here: sled-agent is basically orphaned, it will forever be a huge maintenance and improvement burden because of the hundreds of bindings it needs for all the system libraries it uses, and it runs very late. Eventually sled-agent is either going to be rewritten in C or broken into pieces, some of which will be rewritten in C; there's no practical alternative. We've also now bound the kernel to the layout of the non-serviceable storage devices, and if upgrade across a change to that layout is even possible, it's going to be nasty.

For all these reasons, we might want to decide that the format of the non-serviceable storage devices is owned by the OS. That would also mean moving creation, update, and removal of any disk labels on those devices out of sled-agent and into an OS-delivered service, delivered with and coordinated with the kernel. This doesn't have to be done now, but it's something this work has definitely brought into focus for me to think about. In the meantime, I'm still vaguely obsessed with not losing dumps, even though there's still no supported way for us to retrieve them and I have no idea when there might be.