rhinstaller / kickstart-tests

Kickstart-driven tests for the Anaconda installer
56 stars 36 forks source link

rhel-10 and daily-iso installation timeout flakes with "WARNING gnome-kiosk:Lost or failed to acquire name org.gnome.Mutter.ServiceChannel" #1303

Open rvykydal opened 2 months ago

rvykydal commented 2 months ago

First spotted: 09-04-2024

23:51:14,601 WARNING org.fedoraproject.Anaconda.Modules.Payloads:INFO:anaconda.modules.payloads.payload.dnf.transaction_progress:Installed: libevent-2.1.12-15.el10.x86_64 1723712225 0f37d2cea587d7d18c87e3d8182c44163969d84593d860a92189021425f030a6
23:51:14,653 WARNING gnome-kiosk:Lost or failed to acquire name org.gnome.Mutter.ServiceChannel
23:51:14,654 INFO dbus-broker:Dispatched 433 messages @ 9(±47)μs / message.
23:51:14,655 NOTICE anaconda:Error reading events from display: Broken pipe
23:51:14,698 WARNING org.fedoraproject.Anaconda.Modules.Payloads:INFO:anaconda.modules.payloads.payload.dnf.transaction_progress:Installed: openldap-2.6.7-5.el10.x86_64 1723815428 8b4f97e9b447e0356945efbce68c5901ad2120a47653b32ae76e73d815d9462b

kstest.log virt-install.log

Other instances: virt-install.log virt-install.log virt-install.log virt-install.log virt-install.log

rvykydal commented 2 months ago

2 occurences the last week (~1/750), 6 occurences the week ago (~1/250)

rvykydal commented 1 month ago

After porting the Wayland migration to Fedora (https://github.com/rhinstaller/anaconda/pull/5829) we are seeing the flake also on daily-iso.

jkonecny12 commented 1 month ago

@jexposit any idea what could be causing this?

jexposit commented 1 month ago

Same error messages as in https://github.com/rhinstaller/kickstart-tests/issues/1261:

23:49:31,397 INFO systemd:Starting udisks2.service - Disk Manager...
23:49:31,478 NOTICE udisksd:udisks daemon version 2.10.90 starting
23:49:31,507 WARNING udisksd:Error probing device: Error sending ATA command IDENTIFY PACKET DEVICE to '/dev/sr0': Unexpected sense data returned:#0120000: f0 00 05 00  58 00 01 0a  00 00 00 00  21 04 00 00    ....X.......!...#0120010: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00    ................#012 (g-io-error-quark, 0)
23:49:31,508 WARNING udisksd:Error probing device: Error sending ATA command IDENTIFY PACKET DEVICE to '/dev/sr1': Unexpected sense data returned:#0120000: f0 00 05 00  58 00 01 0a  00 00 00 00  21 04 00 00    ....X.......!...#0120010: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00    ................#012 (g-io-error-quark, 0)
23:49:31,515 INFO systemd:Started udisks2.service - Disk Manager.
23:49:31,518 NOTICE udisksd:Acquired the name org.freedesktop.UDisks2 on the system message bus
23:49:31,546 NOTICE udisksd:Loading module iscsi ...
23:49:31,551 WARNING udisksd:Error probing device: Error sending ATA command IDENTIFY PACKET DEVICE to '/dev/sr0': Unexpected sense data returned:#0120000: f0 00 05 00  58 00 01 0a  00 00 00 00  21 04 00 00    ....X.......!...#0120010: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00    ................#012 (g-io-error-quark, 0)
23:49:31,552 WARNING udisksd:Error probing device: Error sending ATA command IDENTIFY PACKET DEVICE to '/dev/sr1': Unexpected sense data returned:#0120000: f0 00 05 00  58 00 01 0a  00 00 00 00  21 04 00 00    ....X.......!...#0120010: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00    ................#012 (g-io-error-quark, 0)
23:49:31,560 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:blivet:failed to get initiator name from iscsi firmware: Failed to call GetFirmwareInitiatorName method on /org/freedesktop/UDisks2/Manager with None arguments: GDBus.Error:org.freedesktop.UDisks2.Error.ISCSI.NoFirmware: No firmware found

Are those expected?

In addition, I see:

23:49:49,234 ERR run-in-new-session:libEGL warning: egl: failed to create dri2 screen
23:49:49,239 ERR run-in-new-session:libEGL warning: egl: failed to create dri2 screen
23:49:49,240 ERR run-in-new-session:libEGL warning: NEEDS EXTENSION: falling back to kms_swrast
23:49:49,244 NOTICE gnome-kiosk:Added device '/dev/dri/card0' (virtio_gpu) using atomic mode setting.
23:49:49,248 NOTICE gnome-kiosk:Failed to initialize accelerated iGPU/dGPU framebuffer sharing: Not hardware accelerated
23:49:49,248 NOTICE gnome-kiosk:Created gbm renderer for '/dev/dri/card0'
23:49:49,248 NOTICE gnome-kiosk:Boot VGA GPU /dev/dri/card0 selected as primary
23:49:49,601 NOTICE gnome-kiosk:Using Wayland display name 'wl-sysinstall-0'
23:49:49,602 ERR run-in-new-session:Window manager warning: Failed to set environment variable WAYLAND_DISPLAY for gnome-session: GDBus.Error:org.freedesktop.DBus.Error.NameHasNoOwner: Destination does not exist

The first 3 ERR messages are fine, Mutter/GNOME Kioks fallbacks to software rendering (falling back to kms_swrast).

The Failed to set environment variable... error is printed by Mutter here: https://gitlab.gnome.org/GNOME/mutter/-/blob/main/src/wayland/meta-wayland.c?ref_type=heads#L642

It looks like it tries to call org.gnome.SessionManager (D-Bus). @M4rtinK as far as I know, GNOME Session Manager is not used in Anaconda? I tested it with:

$ XDG_RUNTIME_DIR=/tmp DBUS_SESSION_BUS_ADDRESS=unix:path=/tmp/bus dbus-send --dest=org.gnome.SessionManager --print-reply --type=method_call /org/gnome/SessionManager org.gnome.SessionManager.GetClients
Error org.freedesktop.DBus.Error.ServiceUnknown: The name is not activatable

I don't think that Mutter warning should be treated a fatal. Ideas?

jkonecny12 commented 1 month ago

@jexposit what I wonder more is this part:

23:51:14,601 WARNING org.fedoraproject.Anaconda.Modules.Payloads:INFO:anaconda.modules.payloads.payload.dnf.transaction_progress:Installed: libevent-2.1.12-15.el10.x86_64 1723712225 0f37d2cea587d7d18c87e3d8182c44163969d84593d860a92189021425f030a6
23:51:14,653 WARNING gnome-kiosk:Lost or failed to acquire name org.gnome.Mutter.ServiceChannel
23:51:14,654 INFO dbus-broker:Dispatched 433 messages @ 9(±47)μs / message.
23:51:14,655 NOTICE anaconda:Error reading events from display: Broken pipe
23:51:14,698 WARNING org.fedoraproject.Anaconda.Modules.Payloads:INFO:anaconda.modules.payloads.payload.dnf.transaction_progress:Installed: openldap-2.6.7-5.el10.x86_64 1723815428 8b4f97e9b447e0356945efbce68c5901ad2120a47653b32ae76e73d815d9462b
23:51:14,720 WARNING kernel:kauditd_printk_skb: 9 callbacks suppressed
23:51:14,720 NOTICE kernel:audit: type=1107 audit(1725407474.716:260): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:kernel_t:s0 msg='avc:  denied  { stop } for auid=n/a uid=0 gid=0 path="/run/systemd/transient/session-1.scope" cmdline="/usr/lib/systemd/systemd-logind" function="bus_scope_method_abandon" scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=system permissive=1 exe="/usr/lib/systemd/systemd" sauid=0 hostname=? addr=? terminal=?'
23:51:14,725 INFO systemd-logind:Session 1 logged out. Waiting for processes to exit.
23:51:14,782 DEBUG NetworkManager:<debug> [1725407474.7822] agent-manager: agent[18486805bba372ea,:1.43/anaconda/0]: agent unregistered or disappeared
23:51:14,803 INFO dbus-daemon:Reloaded configuration
23:51:15,107 INFO systemd:session-1.scope: Deactivated successfully.
23:51:15,115 INFO systemd-logind:Removed session 1.

Log: https://github.com/user-attachments/files/17011201/virt-install.log

Seems to me that in the middle of the installation (long time after Anaconda is running) Wayland is crashing and the installation just stuck on that place. There are few more logs after this but these are just finish tasks from payload.

jexposit commented 1 month ago

Thanks for pointing out to the relevant traces, I'm not familiar with Anaconda's logs and I was grepping for errors.

The only GNOME Kiosk/mutter log I see is:

23:51:14,653 WARNING gnome-kiosk:Lost or failed to acquire name org.gnome.Mutter.ServiceChannel

That warning is printed when mutter losses org.gnome.Mutter.ServiceChannel.

Any chance you can ssh into the failed installation and get a coredump?

Also, if you create a issue upstream (in mutter) you might get more help from mutter devs.