UAPI Group Specifications
https://uapi-group.org/specifications/
Standardize forwarding crashes to containers #102

Open bdrung opened 1 week ago

bdrung commented 1 week ago

There are different crash dump handlers available, such as systemd-coredump and Apport. When a process crashes inside a container, the crash dump handler on the host receives the crash and needs to forward it into the container. This forwarding works if the same handler is present on the host and in the container (e.g. systemd-coredump on both sides, or Apport on both sides). If the handler in the container differs from the handler on the host, the forwarding does not work (see "systemd-coredump handler does not forward the crash to the container" for an example).

To make crash forwarding work in all of these scenarios, please standardize how crashes are forwarded to containers. I suggest specifying the location of a socket in the container and how the needed information (such as the ID of the crashed process) is sent to that socket.
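To make the suggestion a bit more concrete, here is a minimal sketch of what such a hand-off could look like. Everything in it is an assumption for illustration only: the socket path `/run/crash-forward.socket`, the JSON message layout, and the field names are hypothetical and not taken from systemd-coredump, Apport, or any existing specification. It also ignores details a real protocol would have to settle, such as PID namespace translation and how the core itself is transferred.

```python
import json
import socket

# Hypothetical socket location inside the container; not defined by any spec.
CRASH_SOCKET_PATH = "/run/crash-forward.socket"


def encode_crash_message(pid: int, signal_number: int, executable: str) -> bytes:
    """Serialize crash metadata as one newline-terminated JSON object."""
    message = {"pid": pid, "signal": signal_number, "executable": executable}
    return json.dumps(message, sort_keys=True).encode("utf-8") + b"\n"


def decode_crash_message(data: bytes) -> dict:
    """Parse a message produced by encode_crash_message()."""
    return json.loads(data.decode("utf-8"))


def forward_crash(sock: socket.socket, pid: int, signal_number: int,
                  executable: str) -> None:
    """Host side: send the crash metadata over an already connected socket."""
    sock.sendall(encode_crash_message(pid, signal_number, executable))


def receive_crash(sock: socket.socket) -> dict:
    """Container side: read until the newline terminator, then decode."""
    buffer = b""
    while not buffer.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:  # peer closed the connection
            break
        buffer += chunk
    return decode_crash_message(buffer)
```

In a real setup the host handler would connect an `AF_UNIX` socket to the agreed path inside the container's mount namespace, and the container's handler would listen on it. The point of the sketch is only that both sides need to agree on the path and the message format, nothing more.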

bluca commented 1 week ago

needs to forward the crash into the container

Citation needed. In most cases, that's actually not true, the container might not even exist anymore when the crashdump is received.

bdrung commented 1 week ago

needs to forward the crash into the container

Citation needed. In most cases, that's actually not true, the container might not even exist anymore when the crashdump is received.

One example: The test case mentioned in the bug description in https://bugs.launchpad.net/ubuntu/+source/apport/+bug/2063349. Another example: autopkgtest runners on Ubuntu armhf. Do you have examples where containers are destroyed when one process crashes inside?

bluca commented 1 week ago

Anything single process, and anything that is closed or upgraded after the crash. Containers are ephemeral and volatile by definition. What's the point in doing this forwarding at all?

schopin-pro commented 1 week ago

Without getting into the weeds of why this is useful, one could just note that at least two crash dump handlers (Apport and systemd-coredump) grew that capability independently, which IMHO is a good indication that there is an actual need here.

I agree that the single-process container is a common pattern, but it's not the only use case for containers. So, assuming the container survives the crash and has a crash handler installed, that handler will get much more out of the crash dump than the host's handler, since it knows the details of the container. Think of running an Ubuntu container on a Fedora host.

bluca commented 1 week ago

Think of running an Ubuntu container on a Fedora host.

Have you seen https://systemd.io/ELF_PACKAGE_METADATA/ ? I should probably move that here.

I've already mentioned this to @enr0n: please consider enabling that spec distro-wide in Ubuntu, so that the host can get all the information from a crash in the guest without any need for communication, simply by parsing the core file. Fedora already implements it, so if you try the opposite (crash a Fedora guest on an Ubuntu host), coredumpctl on the host will give you a lot of info.

In fact several packages in Debian/Ubuntu already use it, including all systemd ones, so if you crash any of those they'll already contain the info. This is done on a package-by-package opt-in basis, for a distro-wide debhelper change see: https://salsa.debian.org/debian/debhelper/-/merge_requests/98 (unfortunately going nowhere in Debian due to dpkg politics, but this shouldn't be a problem for you)

bdrung commented 1 week ago

I have seen https://systemd.io/ELF_PACKAGE_METADATA/ and it has been on my todo wish list for a long time. Thanks for the pointer to https://salsa.debian.org/debian/debhelper/-/merge_requests/98. I'll read the discussion there. If there are no technical reasons against the proposed implementation, we could carry this delta in Ubuntu to add the ELF metadata by default.

bluca commented 1 week ago

That would be very nice, thanks

bdrung commented 1 week ago

I have read enough for today. @bluca, since you submitted https://salsa.debian.org/debian/debhelper/-/merge_requests/98, are you willing to submit the change against dpkg-buildflags? Since we are already carrying some changes for dpkg in Ubuntu, I doubt that this additional delta will be a problem.

bluca commented 1 week ago

Yes, I can look into that in the next few days.

bluca commented 1 week ago

@bdrung here's a PR: https://code.launchpad.net/~bluca/ubuntu/+source/dpkg/+git/dpkg/+merge/465957. I tested it with a package build on noble, and it seems to work as intended.