openbmc / phosphor-objmgr

Phosphor Object Manager
Apache License 2.0
6 stars 7 forks source link

intermittent "mapper[279]: Error starting event loop: Connection timed out" seen #18

Open geissonator opened 4 years ago

geissonator commented 4 years ago

Our Hardware CI runs about 20-30 jobs a day test openbmc master gerrit jobs. Approximately 5-10% of our runs fail with this issue. It used to be more intermittent but has started to ramp up. We've always had performance issues in the early startup of the BMC. The removal of python helped but then new functions keep getting added (ssl key generation, certificates, ldap, ...). My theory is that this intermittent issue is related to performance.

Mar 23 23:19:40 w54 systemd[1]: Starting Phosphor DBus Service Discovery Manager...
...
Mar 23 23:19:47 w54 systemd[1]: Started Phosphor DBus Service Discovery Manager.
Mar 23 23:19:47 w54 systemd[1]: Reached target Phosphor Object Mapper.
Mar 23 23:19:48 w54 systemd[1]: Starting Wait for /org/openbmc/control/power0...
Mar 23 23:19:48 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/control/host0/auto_reboot...
Mar 23 23:19:48 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/control/host0/boot/one_time...
Mar 23 23:19:48 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/control/host0/boot...
Mar 23 23:19:48 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/control/host0/power_restore_policy...
Mar 23 23:19:49 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/control/host0/restriction_mode...
Mar 23 23:19:49 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/inventory/system/chassis/motherboard/powersupply0...
Mar 23 23:19:49 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/inventory/system/chassis/motherboard/powersupply1...
Mar 23 23:19:49 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/led/groups/bmc_booted...
...
Mar 23 23:20:18 w54 kernel: UBIFS (ubi4:1): Mounting in unauthenticated mode
Mar 23 23:20:18 w54 kernel: UBIFS (ubi4:1): background thread "ubifs_bgt4_1" started, PID 418
Mar 23 23:20:18 w54 kernel: UBIFS (ubi4:1): UBIFS: mounted UBI device 4, volume 1, name "rwfs"
Mar 23 23:20:18 w54 kernel: UBIFS (ubi4:1): LEB size: 65408 bytes (63 KiB), min./max. I/O unit sizes: 8 bytes/256 bytes
Mar 23 23:20:18 w54 kernel: UBIFS (ubi4:1): FS size: 5690496 bytes (5 MiB, 87 LEBs), journal size 523265 bytes (0 MiB, 6 LEBs)
Mar 23 23:20:18 w54 kernel: UBIFS (ubi4:1): reserved for root: 268775 bytes (262 KiB)
Mar 23 23:20:18 w54 kernel: UBIFS (ubi4:1): media format: w5/r0 (latest is w5/r0), UUID F4E49844-9A57-4BD1-BBE1-1EB64CD9A050, small LPT model
Mar 23 23:20:18 w54 dropbear[365]: Exit before auth: Exited normally
Mar 23 23:20:18 w54 systemd[1]: dropbear@1-9.3.86.161:22-9.41.165.94:42252.service: Succeeded.
Mar 23 23:20:18 w54 systemd[1]: obmc-flash-bmc-ubiremount.service: Succeeded.
Mar 23 23:20:18 w54 systemd[1]: Started Remount the ubi volume after a reboot.
Mar 23 23:20:19 w54 systemd-networkd[376]: eth0: Configured
Mar 23 23:20:19 w54 avahi-daemon[265]: Service "obmc_redfish #14" (/services/obmc_redfish.service) successfully established.
Mar 23 23:20:20 w54 bmcweb[174]: pam_succeed_if(webserver:auth): requirement "user ingroup redfish" was met by user "root"
Mar 23 23:20:21 w54 mapper[279]: Error starting event loop: Connection timed out
Mar 23 23:20:21 w54 systemd[1]: mapper-wait@-xyz-openbmc_project-control-host0-boot-one_time.service: Control process exited, code=exited, status=1/FAILURE
Mar 23 23:20:21 w54 systemd[1]: mapper-wait@-xyz-openbmc_project-control-host0-boot-one_time.service: Failed with result 'exit-code'.
Mar 23 23:20:21 w54 systemd[1]: Failed to start Wait for /xyz/openbmc_project/control/host0/boot/one_time.
Mar 23 23:20:21 w54 mapper[274]: Error starting event loop: Connection timed out
Mar 23 23:20:21 w54 systemd[1]: mapper-wait@-xyz-openbmc_project-control-host0-auto_reboot.service: Control process exited, code=exited, status=1/FAILURE
Mar 23 23:20:21 w54 systemd[1]: mapper-wait@-xyz-openbmc_project-control-host0-auto_reboot.service: Failed with result 'exit-code'.
Mar 23 23:20:21 w54 systemd[1]: Failed to start Wait for /xyz/openbmc_project/control/host0/auto_reboot.
Mar 23 23:20:21 w54 mapper[272]: Error starting event loop: Connection timed out
Mar 23 23:20:21 w54 mapper[294]: Error starting event loop: Connection timed out
Mar 23 23:20:21 w54 systemd[1]: mapper-wait@-org-openbmc-control-power0.service: Control process exited, code=exited, status=1/FAILURE
Mar 23 23:20:21 w54 systemd[1]: mapper-wait@-org-openbmc-control-power0.service: Failed with result 'exit-code'.
Mar 23 23:20:21 w54 systemd[1]: Failed to start Wait for /org/openbmc/control/power0.
Mar 23 23:20:21 w54 mapper[305]: Error starting event loop: Connection timed out
Mar 23 23:20:21 w54 mapper[306]: Error starting event loop: Connection timed out
Mar 23 23:20:21 w54 mapper[309]: Error starting event loop: Connection timed out
Mar 23 23:20:21 w54 mapper[311]: Error starting event loop: Connection timed out
Mar 23 23:20:21 w54 systemd[1]: mapper-wait@-xyz-openbmc_project-network-eth0.service: Control process exited, code=exited, status=1/FAILURE
Mar 23 23:20:21 w54 mapper[290]: Error starting event loop: Connection timed out
Mar 23 23:20:21 w54 mapper[289]: Error starting event loop: Connection timed out
Mar 23 23:20:21 w54 mapper[285]: Error starting event loop: Connection timed out
Mar 23 23:20:21 w54 systemd[1]: mapper-wait@-xyz-openbmc_project-network-eth0.service: Failed with result 'exit-code'.
Mar 23 23:20:21 w54 systemd[1]: Failed to start Wait for /xyz/openbmc_project/network/eth0.
Mar 23 23:20:21 w54 mapper[283]: Error starting event loop: Connection timed out
Mar 23 23:20:21 w54 mapper[282]: Error starting event loop: Connection timed out
Mar 23 23:20:22 w54 mapper[281]: Error starting event loop: Connection timed out
Mar 23 23:20:22 w54 systemd[1]: mapper-wait@-xyz-openbmc_project-control-host0-boot.service: Control process exited, code=exited, status=1/FAILURE
Mar 23 23:20:22 w54 systemd[1]: mapper-wait@-xyz-openbmc_project-control-host0-boot.service: Failed with result 'exit-code'.
Mar 23 23:20:22 w54 systemd[1]: Failed to start Wait for /xyz/openbmc_project/control/host0/boot.
Mar 23 23:20:22 w54 systemd[1]: mapper-wait@-xyz-openbmc_project-control-host0-power_restore_policy.service: Control process exited, code=exited, status=1/FAILURE
Mar 23 23:20:22 w54 systemd[1]: mapper-wait@-xyz-openbmc_project-control-host0-power_restore_policy.service: Failed with result 'exit-code'.
...
Mar 23 23:20:24 w54 systemd[1]: Stopped Wait for /org/openbmc/control/power0.
Mar 23 23:20:24 w54 systemd[1]: Starting Wait for /org/openbmc/control/power0...
Mar 23 23:20:24 w54 systemd[1]: Stopped Wait for /xyz/openbmc_project/control/host0/auto_reboot.
Mar 23 23:20:24 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/control/host0/auto_reboot...
Mar 23 23:20:24 w54 systemd[1]: Stopped Wait for /xyz/openbmc_project/control/host0/boot/one_time.
Mar 23 23:20:24 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/control/host0/boot/one_time...
Mar 23 23:20:24 w54 systemd[1]: Stopped Wait for /xyz/openbmc_project/control/host0/boot.
Mar 23 23:20:25 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/control/host0/boot...
Mar 23 23:20:25 w54 systemd[1]: Stopped Wait for /xyz/openbmc_project/control/host0/power_restore_policy.
Mar 23 23:20:25 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/control/host0/power_restore_policy...
Mar 23 23:20:25 w54 systemd[1]: Stopped Wait for /xyz/openbmc_project/control/host0/restriction_mode.
Mar 23 23:20:25 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/control/host0/restriction_mode...
Mar 23 23:20:25 w54 systemd[1]: Stopped Wait for /xyz/openbmc_project/inventory/system/chassis/motherboard/powersupply0.
Mar 23 23:20:25 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/inventory/system/chassis/motherboard/powersupply0...
Mar 23 23:20:25 w54 systemd[1]: Stopped Wait for /xyz/openbmc_project/inventory/system/chassis/motherboard/powersupply1.
Mar 23 23:20:25 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/inventory/system/chassis/motherboard/powersupply1...
Mar 23 23:20:25 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/inventory...
Mar 23 23:20:25 w54 systemd[1]: Stopped Wait for /xyz/openbmc_project/led/groups/bmc_booted.
Mar 23 23:20:25 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/led/groups/bmc_booted...
Mar 23 23:20:25 w54 systemd[1]: Stopped Wait for /xyz/openbmc_project/network/eth0.
Mar 23 23:20:26 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/network/eth0...
Mar 23 23:20:26 w54 systemd[1]: Stopped Wait for /xyz/openbmc_project/state/chassis0.
Mar 23 23:20:26 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/state/chassis0...
Mar 23 23:20:26 w54 systemd[1]: Stopped Wait for /xyz/openbmc_project/state/host0.
Mar 23 23:20:26 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/state/host0...
Mar 23 23:20:26 w54 systemd[1]: Stopped Wait for /xyz/openbmc_project/time/owner.
Mar 23 23:20:26 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/time/owner...
Mar 23 23:20:27 w54 systemd[1]: mapper-wait@-xyz-openbmc_project-time-sync_method.service: Service RestartSec=1s expired, scheduling restart.
Mar 23 23:20:27 w54 systemd[1]: mapper-wait@-xyz-openbmc_project-time-sync_method.service: Scheduled restart job, restart counter is at 1.
Mar 23 23:20:27 w54 systemd[1]: Stopped Wait for /xyz/openbmc_project/time/sync_method.
Mar 23 23:20:27 w54 systemd[1]: Starting Wait for /xyz/openbmc_project/time/sync_method...
Mar 23 23:20:31 w54 button-handler[368]: Starting power button handler
Mar 23 23:20:32 w54 phosphor-network-manager[337]: Refreshing the objects.
Mar 23 23:20:32 w54 phosphor-network-manager[337]: Refreshing complete.
Mar 23 23:20:36 w54 systemd[1]: Started Wait for /xyz/openbmc_project/control/host0/auto_reboot.
Mar 23 23:20:36 w54 systemd[1]: Started Wait for /xyz/openbmc_project/control/host0/boot/one_time.
Mar 23 23:20:36 w54 systemd[1]: Started Wait for /xyz/openbmc_project/control/host0/boot.
Mar 23 23:20:36 w54 systemd[1]: Started Wait for /xyz/openbmc_project/control/host0/power_restore_policy.
Mar 23 23:20:36 w54 systemd[1]: Started Wait for /xyz/openbmc_project/control/host0/restriction_mode.
Mar 23 23:20:36 w54 systemd[1]: Started Wait for /xyz/openbmc_project/network/eth0.
Mar 23 23:20:37 w54 systemd[1]: Started Wait for /xyz/openbmc_project/time/owner.
Mar 23 23:20:37 w54 systemd[1]: Started Wait for /xyz/openbmc_project/time/sync_method.
Mar 23 23:20:37 w54 systemd[1]: Starting Phosphor Inband IPMI...
Mar 23 23:20:39 w54 systemd[1]: Started Wait for /xyz/openbmc_project/state/chassis0.
Mar 23 23:20:39 w54 systemd[1]: Started Wait for /xyz/openbmc_project/inventory.
Mar 23 23:20:40 w54 systemd[1]: Started Wait for /xyz/openbmc_project/led/groups.
Mar 23 23:20:40 w54 systemd[1]: Started Wait for /org/openbmc/control/power0.

The data shows you get 33 seconds for "r = sd_event_loop(loop);" call to return in mapper wait. That times out for all the initial mapper waits but then on the subsequent retry it works within 16 seconds.

I'm thinking about adding a retry to the sd_event_loop(loop); call on a timeout ERRNO.

bradbishop commented 4 years ago

Another way to help here would be to get rid of all the mapper-wait processes (move them to the application that needs those objects).