kata-containers / kata-containers

Kata Containers is an open source project and community working to build a standard implementation of lightweight Virtual Machines (VMs) that feel and perform like containers, but provide the workload isolation and security advantages of VMs. https://katacontainers.io/
Apache License 2.0

Kata agent fails to connect to D-Bus #6677

Closed Vlad1mir-D closed 1 year ago

Vlad1mir-D commented 1 year ago

When the guest image uses systemd as its init (as created with dracut), the agent terminates and the following error is returned:

Caused by:
    0: Establishing a D-Bus connection
    1: I/O error: Connection reset by peer (os error 104)
    2: Connection reset by peer (os error 104): unknown.

Vlad1mir-D commented 1 year ago

@gkurz I would greatly appreciate your assistance, as you are the author of https://github.com/kata-containers/kata-containers/pull/6658

gkurz commented 1 year ago

So indeed, as you mentioned in #6658, the systemd customization happens in two places: prepare_overlay() and setup_rootfs().

The former was added by 2f55017fea89d1944ec304b2040149ee9940d959. It is only used when rootfs.sh is passed a pre-existing rootfs instead of a DISTRO argument, which is the case I'm interested in (openshift).

The latter was added by #4987, but I'm now realizing that it is always called, thus redoing the same change as in prepare_overlay(), as visible below.

$ sudo DEBUG=1 ./rootfs-builder/rootfs.sh -o '"rhel"-osbuilder-version-unknown' -r /tmp/kata-dracut-rootfs-xJV3Z1  |& egrep 'prepare_overlay|setup_rootfs|wants'
+ prepare_overlay
+ mkdir -p ./etc/systemd/system/basic.target.wants/
+ ln -sf /usr/lib/systemd/system/kata-containers.target ./etc/systemd/system/basic.target.wants/kata-containers.target
+ mkdir -p ./etc/systemd/system/kata-containers.target.wants/
+ ln -sf /usr/lib/systemd/system/dbus.socket ./etc/systemd/system/kata-containers.target.wants/dbus.socket
+ setup_rootfs
+ mkdir -p /tmp/kata-dracut-rootfs-xJV3Z1/etc/systemd/system/basic.target.wants
+ ln -sf /usr/lib/systemd/system/kata-containers.target /tmp/kata-dracut-rootfs-xJV3Z1/etc/systemd/system/basic.target.wants/kata-containers.target

The correct fix is to stop wiring this in prepare_overlay() and do it in setup_rootfs() instead.

Are you OK with creating a PR for this, @Vlad1mir-D? I'll make sure it gets merged ASAP.

Vlad1mir-D commented 1 year ago

@gkurz Adding

mkdir -p "${ROOTFS_DIR}/etc/systemd/system/kata-containers.target.wants"
ln -sf "/usr/lib/systemd/system/dbus.socket" "${ROOTFS_DIR}/etc/systemd/system/kata-containers.target.wants/dbus.socket"

after

ln -sf "/usr/lib/systemd/system/kata-containers.target" "${ROOTFS_DIR}/etc/systemd/system/basic.target.wants/kata-containers.target"

to setup_rootfs() doesn't resolve the issue: it leads to the same Connection reset by peer (os error 104) error, which is why I asked for your help :)

I checked the resulting initrd: it does contain the dracut dbus module content, and dbus.socket is also added to kata-containers.target.wants, so I'm not sure why kata-agent can't connect to D-Bus. Most probably I'm missing something.
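A quick way to confirm the wiring described above is to inspect the unpacked rootfs for both symlinks. The following is a minimal sketch, not part of rootfs.sh; the function name check_kata_wants is hypothetical, and it only checks that the links exist, not that their targets are present in the image.

```shell
# Hypothetical helper: verify the systemd "wants" symlinks inside a rootfs tree.
# Usage: check_kata_wants /path/to/rootfs
check_kata_wants() {
    local rootfs_dir="$1"
    local unit_dir="${rootfs_dir}/etc/systemd/system"
    local rc=0 link

    for link in \
        "basic.target.wants/kata-containers.target" \
        "kata-containers.target.wants/dbus.socket"; do
        if [ -L "${unit_dir}/${link}" ]; then
            # Print where the link points so a wrong target is easy to spot.
            echo "ok: ${link} -> $(readlink "${unit_dir}/${link}")"
        else
            echo "missing: ${link}" >&2
            rc=1
        fi
    done
    return "${rc}"
}
```

The function returns non-zero if either link is absent, so it can be dropped into a build script as a sanity check.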

gkurz commented 1 year ago

ECONNRESET means that kata-agent managed to connect to something but the other end abruptly closed the connection... not exactly the same situation as with #6658.

@Vlad1mir-D Maybe you can try setting kernel_params = "systemd.journald.forward_to_console" in the kata configuration file and look for hints in the guest console?
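For anyone who wants to script that change, here is a minimal sketch that appends a parameter to an existing kernel_params line. The helper name add_kernel_param is hypothetical, the configuration path in the usage comment is an assumption (it depends on how Kata was installed), and sed -i as used here assumes GNU sed.

```shell
# Hypothetical helper: append a kernel parameter to the kernel_params line
# of a Kata configuration file, unless it is already there.
# Usage: add_kernel_param <config-file> <param>
add_kernel_param() {
    local config="$1" param="$2"

    # Skip if the kernel_params line already mentions the parameter.
    if grep '^kernel_params' "${config}" | grep -qF -- "${param}"; then
        return 0
    fi

    # Insert the parameter just before the closing quote of the value.
    sed -i -E "s|^(kernel_params = \"[^\"]*)\"|\1 ${param}\"|" "${config}"
}

# Example (path is an assumption, adjust to your installation):
# add_kernel_param /etc/kata-containers/configuration.toml systemd.journald.forward_to_console
```

With the parameter set, journald output from the guest should show up on the guest console, which is where the dbus-broker errors later in this thread were found.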

Vlad1mir-D commented 1 year ago

@gkurz Thank you for your advice! It seems dbus-broker fails to launch due to a permission issue:

"[    1.906026] audit: type=1334 audit(1681819879.800:9): prog-id=8 op=LOAD"
"[    1.907189] systemd-journald[158]: Successfully sent stream file descriptor to service manager."
"[    1.916425] audit: type=1130 audit(1681819879.811:10): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dbus-broker comm=\"systemd\" exe=\"/usr/lib/systemd/systemd\" hostname=? addr=? terminal=? res=success'"
"[    1.910251] dbus-broker-launch[197]: ERROR launcher_run_child @ ../src/launch/launcher.c +325: Permission denied"
"[    1.910701] dbus-broker-launch[195]: ERROR service_add @ ../src/launch/service.c +921: Transport endpoint is not connected"
"[    1.910900] dbus-broker-launch[195]:       launcher_add_services @ ../src/launch/launcher.c +804"
"[    1.910996] dbus-broker-launch[195]:       launcher_run @ ../src/launch/launcher.c +1409"
"[    1.911119] dbus-broker-launch[195]:       run @ ../src/launch/main.c +152"
"[    1.911280] dbus-broker-launch[195]:       main @ ../src/launch/main.c +178"
"[    1.911375] dbus-broker-launch[195]: Exiting due to fatal error: -107"
"[    1.923243] systemd-journald[158]: Successfully sent stream file descriptor to service manager."
"[    1.924480] dbus-broker-launch[201]: ERROR launcher_run_child @ ../src/launch/launcher.c +325: Permission denied"
"[    1.924898] dbus-broker-launch[200]: ERROR service_add @ ../src/launch/service.c +921: Transport endpoint is not connected"
"[    1.925038] dbus-broker-launch[200]:       launcher_add_services @ ../src/launch/launcher.c +804"
"[    1.925130] dbus-broker-launch[200]:       launcher_run @ ../src/launch/launcher.c +1409"
"[    1.925254] dbus-broker-launch[200]:       run @ ../src/launch/main.c +152"
"[    1.925405] dbus-broker-launch[200]:       main @ ../src/launch/main.c +178"
"[    1.925497] dbus-broker-launch[200]: Exiting due to fatal error: -107"
"[    1.937222] systemd-journald[158]: Successfully sent stream file descriptor to service manager."
"[    1.938455] dbus-broker-launch[205]: ERROR launcher_run_child @ ../src/launch/launcher.c +325: Permission denied"
"[    1.938878] dbus-broker-launch[204]: ERROR service_add @ ../src/launch/service.c +921: Transport endpoint is not connected"
"[    1.939016] dbus-broker-launch[204]:       launcher_add_services @ ../src/launch/launcher.c +804"
"[    1.939133] dbus-broker-launch[204]:       launcher_run @ ../src/launch/launcher.c +1409"
"[    1.939260] dbus-broker-launch[204]:       run @ ../src/launch/main.c +152"
"[    1.939410] dbus-broker-launch[204]:       main @ ../src/launch/main.c +178"
"[    1.939501] dbus-broker-launch[204]: Exiting due to fatal error: -107"
"[    1.951219] systemd-journald[158]: Successfully sent stream file descriptor to service manager."
"[    1.952303] dbus-broker-launch[209]: ERROR launcher_run_child @ ../src/launch/launcher.c +325: Permission denied"
"[    1.952707] dbus-broker-launch[208]: ERROR service_add @ ../src/launch/service.c +921: Transport endpoint is not connected"
"[    1.952850] dbus-broker-launch[208]:       launcher_add_services @ ../src/launch/launcher.c +804"
"[    1.952941] dbus-broker-launch[208]:       launcher_run @ ../src/launch/launcher.c +1409"
"[    1.953066] dbus-broker-launch[208]:       run @ ../src/launch/main.c +152"
"[    1.953213] dbus-broker-launch[208]:       main @ ../src/launch/main.c +178"
"[    1.953304] dbus-broker-launch[208]: Exiting due to fatal error: -107"
"[    1.964992] systemd-journald[158]: Successfully sent stream file descriptor to service manager."
"[    1.966140] dbus-broker-launch[213]: ERROR launcher_run_child @ ../src/launch/launcher.c +325: Permission denied"
"[    1.966554] dbus-broker-launch[212]: ERROR service_add @ ../src/launch/service.c +921: Transport endpoint is not connected"
"[    1.966693] dbus-broker-launch[212]:       launcher_add_services @ ../src/launch/launcher.c +804"
"[    1.966821] dbus-broker-launch[212]:       launcher_run @ ../src/launch/launcher.c +1409"
"[    1.966948] dbus-broker-launch[212]:       run @ ../src/launch/main.c +152"
"[    1.967098] dbus-broker-launch[212]:       main @ ../src/launch/main.c +178"
"[    1.967188] dbus-broker-launch[212]: Exiting due to fatal error: -107"

Full containerd output: vmconsole.txt

gkurz commented 1 year ago

@Vlad1mir-D OK, then it is likely because / has mode 0700 in the initrd. This can be fixed by adding chmod 755 ${ROOTFS_DIR} to rootfs.sh.
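Some background on why the mode matters: a directory created with mktemp -d gets mode 0700, and a root directory with that mode cannot be traversed by non-root users, so any process that drops privileges cannot reach files below it. The sketch below shows one way to check and correct the mode before the image is built; the function name fix_rootfs_mode is hypothetical, and stat -c assumes GNU coreutils.

```shell
# Hypothetical helper: make sure the top-level rootfs directory is
# readable and traversable by non-root users (mode 755).
# Usage: fix_rootfs_mode /path/to/rootfs
fix_rootfs_mode() {
    local rootfs_dir="$1"
    local mode

    mode="$(stat -c '%a' "${rootfs_dir}")"
    if [ "${mode}" != "755" ]; then
        echo "fixing mode of ${rootfs_dir}: ${mode} -> 755" >&2
        chmod 755 "${rootfs_dir}"
    fi
}
```

Running it on a freshly created mktemp -d directory changes the mode from 700 to 755; running it again is a no-op.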

gkurz commented 1 year ago

@Vlad1mir-D Can you open a PR for this? I suggest two commits:

  1. changes suggested in https://github.com/kata-containers/kata-containers/issues/6677#issuecomment-1512834753 as preliminary cleanup
  2. add chmod 755 "${ROOTFS_DIR}"

Vlad1mir-D commented 1 year ago

@gkurz Yes, I'm going to test whether chmod resolves this issue and open a PR if everything works.

Vlad1mir-D commented 1 year ago

@gkurz https://github.com/kata-containers/kata-containers/pull/6681