openbmc / phosphor-host-ipmid

dbus-based ipmid for host-endpoint IPMI commands
Apache License 2.0

phosphor-ipmi-host.service intermittently failing in openbmc CI #188

Open geissonator opened 1 year ago

geissonator commented 1 year ago

If you look at https://jenkins.openbmc.org/job/CI-MISC/job/run-ci-in-qemu/, you'll see a smattering of red failures. These intermittent failures popped up about a month ago. I've continued to just resubmit the failed jobs to get them through, hoping this issue would magically go away at some point, but it hasn't.

The symptom is the following in the journal as the BMC is booting up in QEMU:

Jan 25 16:55:07 romulus ipmid[283]: Error getting dbus names
Jan 25 16:55:07 romulus systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Jan 25 16:55:07 romulus systemd[1]: systemd-hostnamed.service: Consumed 1.278s CPU time.
Jan 25 16:55:07 romulus ipmid[283]: Registering NetFn:[0x3A], Cmd:[0xF0]
Jan 25 16:55:07 romulus ipmid[283]: Registering NetFn:[0x32], Cmd:[0x10]
Jan 25 16:55:07 romulus ipmid[283]: Registering NetFn:[0xA], Cmd:[0x12]
Jan 25 16:55:08 romulus mapperx[224]: Introspect call failed with error: generic:113, No route to host on process: xyz.openbmc_project.Ipmi.Host path: /
Jan 25 16:55:08 romulus systemd[1]: phosphor-ipmi-host.service: Main process exited, code=exited, status=1/FAILURE
Jan 25 16:55:08 romulus systemd[1]: phosphor-ipmi-host.service: Failed with result 'exit-code'.
Jan 25 16:55:08 romulus systemd[1]: phosphor-ipmi-host.service: Consumed 1.986s CPU time.
Jan 25 16:55:09 romulus systemd[1]: phosphor-ipmi-host.service: Scheduled restart job, restart counter is at 1.

The restart after this works fine, but CI logs a failure if it sees something like this in the journal.
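
For anyone triaging a saved journal offline, here is a rough sketch of that kind of check. It is not the actual CI code; the function name `find_service_failures` and the regular expression are assumptions made for illustration, based only on the journal lines quoted above.

```python
import re

# Hypothetical pattern (an assumption, not taken from the CI scripts): matches
# systemd journal lines reporting that a unit's main process exited with a failure.
FAILURE_RE = re.compile(
    r"systemd\[\d+\]: (?P<unit>\S+\.service): "
    r"Main process exited, code=exited, status=(?P<status>\d+)/FAILURE"
)


def find_service_failures(journal_text):
    """Return a list of (unit, exit_status) tuples found in the journal text."""
    failures = []
    for line in journal_text.splitlines():
        match = FAILURE_RE.search(line)
        if match:
            failures.append((match.group("unit"), int(match.group("status"))))
    return failures


# Sample lines copied from the journal excerpt in this issue.
SAMPLE = """\
Jan 25 16:55:07 romulus ipmid[283]: Error getting dbus names
Jan 25 16:55:08 romulus systemd[1]: phosphor-ipmi-host.service: Main process exited, code=exited, status=1/FAILURE
Jan 25 16:55:09 romulus systemd[1]: phosphor-ipmi-host.service: Scheduled restart job, restart counter is at 1.
"""

if __name__ == "__main__":
    for unit, status in find_service_failures(SAMPLE):
        print(f"{unit} exited with status {status}")
```

A check like this flags the first failed start even though the scheduled restart succeeds, which is why the job goes red.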

The full FFDC for a failure can be found up here: https://jenkins.openbmc.org/job/CI-MISC/job/run-ci-in-qemu/10869/artifact/logs/20230125165602376838_RedfishIpmiRedfishExtendedTestBasicCi/20230125165602376838_CheckForApplicationFailures/

geissonator commented 1 year ago

Unfortunately, we're still seeing this intermittently. Another recreate is up at https://jenkins.openbmc.org/job/CI-MISC/job/run-ci-in-qemu/11748/artifact/logs/20230411091246842759_RedfishIpmiRedfishExtendedTestBasicCi/20230411091246842759_CheckForApplicationFailures/20230411091246842759_BMC_journalctl_nopager.txt

Apr 11 09:12:40 romulus ipmid[287]: Error getting dbus names
Apr 11 09:12:40 romulus ipmid[287]: Registering NetFn:[0x3A], Cmd:[0xF0]
Apr 11 09:12:40 romulus ipmid[287]: Registering NetFn:[0x32], Cmd:[0x10]
Apr 11 09:12:40 romulus ipmid[287]: Registering NetFn:[0xA], Cmd:[0x12]
Apr 11 09:12:40 romulus mapperx[234]: Introspect call failed with error: generic:113, No route to host on process: xyz.openbmc_project.Ipmi.Host path: /
Apr 11 09:12:40 romulus systemd[1]: phosphor-ipmi-host.service: Main process exited, code=exited, status=1/FAILURE
Apr 11 09:12:40 romulus systemd[1]: phosphor-ipmi-host.service: Failed with result 'exit-code'.
Apr 11 09:12:40 romulus systemd[1]: phosphor-ipmi-host.service: Consumed 1.015s CPU time.

qiansunn commented 8 months ago

Is this still happening? I'm seeing a similar issue as well.