ibm-openbmc / dev

Product Development Project Mgmt and Tracking

FVT1060:Everest:SEL clear command not clearing the entries via IPMI #3638

Open yadlapati opened 4 months ago

yadlapati commented 4 months ago

Steps to recreate

======================

Log in to the Rainier system and execute the IPMI commands below to clear the SEL entries:

bash-4.2$ ipmitool -I lanplus -C 17 -N 3 -p 623 -U ipmi_admin -H rain57bmc.aus.stglabs.ibm.com sel list 2>&1

Password:

   1 | 02/20/2024 | 05:22:19 | System Event #0x4a | Undetermined system hardware failure | Asserted

bash-4.2$ ipmitool -I lanplus -C 17 -N 3 -p 623 -U ipmi_admin -H rain57bmc.aus.stglabs.ibm.com sel clear 2>&1

Password:

Clearing SEL.  Please allow a few seconds to erase.

bash-4.2$ ipmitool -I lanplus -C 17 -N 3 -p 623 -U ipmi_admin -H rain57bmc.aus.stglabs.ibm.com sel list 2>&1

Password:

   1 | 02/20/2024 | 06:02:42 | System Event #0x4a | Undetermined system hardware failure | Asserted

Actual behaviour: The IPMI sel clear command does not clear the entries.

Expected behaviour: The IPMI sel clear command should clear the entries, and a subsequent sel list should report that the SEL has no entries.
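The expected behaviour can be turned into a scripted check. The following is a minimal sketch, assuming the `sel list 2>&1` output is piped in on stdin and that record lines have the `<hex id> | <date> | <time> | ...` shape shown above. The function name `sel_entry_count` is hypothetical and not part of ipmitool.

```shell
# Count SEL records in `ipmitool ... sel list 2>&1` output read from stdin.
# Lines such as "Password:" or an empty-log message are not counted.
sel_entry_count() {
    awk -F'|' '
        # First field must be a hex record ID, second a MM/DD/YYYY date.
        $1 ~ /^[ \t]*[0-9a-fA-F]+[ \t]*$/ &&
        $2 ~ /[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][0-9][0-9]/ { n++ }
        END { print n + 0 }
    '
}
```

Piping the output of the `sel list` command above into `sel_entry_count` before and after `sel clear` gives two numbers to compare. A count of 0 after the clear would match the expected behaviour.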

lxwinspur commented 4 months ago

@yadlapati I have a couple of questions:

  1. Why are the times before and after executing sel clear different?

    ipmitool sel elist
       1 | 02/20/2024 | 05:22:19 | System Event #0x4a | Undetermined system hardware failure | Asserted
    ipmitool sel clear
    ipmitool sel elist
       1 | 02/20/2024 | 06:02:42 | System Event #0x4a | Undetermined system hardware failure | Asserted

I wonder if the second sel is newly generated?

  2. When you execute the following command, can the following log be displayed?

    bash-4.2$ ipmitool -I lanplus -C 17 -N 3 -p 623 -U ipmi_admin -H rain57bmc.aus.stglabs.ibm.com sel clear 2>&1
    Password:
    Clearing SEL. Please allow a few seconds to erase.

**show journalctl log**

`journalctl -b | grep ipmi`

mzipse commented 4 months ago

@lxwinspur, passing this info along to you from our tester:

Below is newly captured data, taken with little time between the commands:

    bash-4.2$ ipmitool -I lanplus -C 17 -N 3 -p 623 -U ipmi_admin -H rain57bmc.aus.stglabs.ibm.com sel list 2>&1
    Password:
       1 | 03/06/2024 | 06:26:25 | System Event #0x4a | Undetermined system hardware failure | Asserted
       2 | 03/06/2024 | 06:28:04 | System Event #0x4a | Undetermined system hardware failure | Asserted
       3 | 03/06/2024 | 06:28:22 | System Event #0x4a | Undetermined system hardware failure | Asserted
       4 | 03/06/2024 | 06:29:00 | System Event #0x4a | Undetermined system hardware failure | Asserted
       5 | 03/06/2024 | 06:29:00 | System Event #0x4a | Undetermined system hardware failure | Asserted
       6 | 03/06/2024 | 06:29:01 | System Event #0x4a | Undetermined system hardware failure | Asserted
       7 | 03/06/2024 | 06:29:01 | System Event #0x4a | Undetermined system hardware failure | Asserted
       8 | 03/06/2024 | 06:29:08 | Unknown #0x49 | State Deasserted | Asserted
       9 | 03/06/2024 | 06:29:08 | Unknown #0x49 | State Deasserted | Asserted
       a | 03/06/2024 | 06:29:08 | System Event #0x4a | Undetermined system hardware failure | Asserted
       b | 03/06/2024 | 06:29:08 | Unknown #0x49 | State Deasserted | Asserted
       c | 03/06/2024 | 06:29:08 | System Event #0x4a | Undetermined system hardware failure | Asserted
       d | 03/06/2024 | 06:29:09 | Unknown #0x49 | State Deasserted | Asserted
       e | 03/06/2024 | 06:29:09 | System Event #0x4a | Undetermined system hardware failure | Asserted
       f | 03/06/2024 | 06:29:09 | Unknown #0x49 | State Deasserted | Asserted
      10 | 03/06/2024 | 06:29:09 | Unknown #0x49 | State Deasserted | Asserted
      11 | 03/06/2024 | 06:29:09 | System Event #0x4a | Undetermined system hardware failure | Asserted
      12 | 03/06/2024 | 06:29:09 | Unknown #0x49 | State Deasserted | Asserted
      13 | 03/06/2024 | 06:29:10 | System Event #0x4a | Undetermined system hardware failure | Asserted
      14 | 03/06/2024 | 06:29:10 | Unknown #0x49 | State Deasserted | Asserted
      15 | 03/06/2024 | 06:29:10 | Unknown #0x49 | State Deasserted | Asserted
      16 | 03/06/2024 | 06:29:10 | System Event #0x4a | Undetermined system hardware failure | Asserted
      17 | 03/06/2024 | 06:29:10 | Unknown #0x49 | State Deasserted | Asserted
      18 | 03/06/2024 | 06:29:10 | System Event #0x4a | Undetermined system hardware failure | Asserted
      19 | 03/06/2024 | 06:29:10 | Unknown #0x49 | State Deasserted | Asserted
      1a | 03/06/2024 | 06:29:11 | System Event #0x4a | Undetermined system hardware failure | Asserted
      1b | 03/06/2024 | 06:29:11 | Unknown #0x49 | State Deasserted | Asserted
      1c | 03/06/2024 | 06:29:11 | System Event #0x4a | Undetermined system hardware failure | Asserted
      1d | 03/06/2024 | 06:29:11 | System Event #0x4a | Undetermined system hardware failure | Asserted
      1e | 03/06/2024 | 06:29:12 | System Event #0x4a | Undetermined system hardware failure | Asserted
      1f | 03/06/2024 | 06:29:12 | System Event #0x4a | Undetermined system hardware failure | Asserted
      20 | 03/06/2024 | 06:29:13 | System Event #0x4a | Undetermined system hardware failure | Asserted
      21 | 03/06/2024 | 06:29:14 | System Event #0x4a | Undetermined system hardware failure | Asserted
      22 | 03/06/2024 | 06:29:14 | System Event #0x4a | Undetermined system hardware failure | Asserted
      23 | 03/06/2024 | 06:29:14 | System Event #0x4a | Undetermined system hardware failure | Asserted
      24 | 03/06/2024 | 06:30:12 | System Event #0x4a | Undetermined system hardware failure | Asserted
      25 | 03/06/2024 | 06:30:15 | System Event #0x4a | Undetermined system hardware failure | Asserted
      26 | 03/06/2024 | 06:31:08 | Unknown #0x49 | State Deasserted | Asserted
      27 | 03/06/2024 | 06:36:41 | System Event #0x4a | Undetermined system hardware failure | Asserted
      28 | 03/06/2024 | 06:36:59 | System Event #0x4a | Undetermined system hardware failure | Asserted
      29 | 03/06/2024 | 06:37:45 | Unknown #0x49 | State Deasserted | Asserted
      2a | 03/06/2024 | 06:37:45 | Unknown #0x49 | State Deasserted | Asserted
      2b | 03/06/2024 | 06:37:45 | Unknown #0x49 | State Deasserted | Asserted
      2c | 03/06/2024 | 06:37:45 | Unknown #0x49 | State Deasserted | Asserted
      2d | 03/06/2024 | 06:37:46 | Unknown #0x49 | State Deasserted | Asserted
      2e | 03/06/2024 | 06:37:46 | Unknown #0x49 | State Deasserted | Asserted
      2f | 03/06/2024 | 06:37:46 | Unknown #0x49 | State Deasserted | Asserted
      30 | 03/06/2024 | 06:37:46 | Unknown #0x49 | State Deasserted | Asserted
      31 | 03/06/2024 | 06:37:46 | Unknown #0x49 | State Deasserted | Asserted
      32 | 03/06/2024 | 06:37:47 | Unknown #0x49 | State Deasserted | Asserted
      33 | 03/06/2024 | 06:37:47 | Unknown #0x49 | State Deasserted | Asserted
      34 | 03/06/2024 | 06:37:47 | Unknown #0x49 | State Deasserted | Asserted
      35 | 03/06/2024 | 06:39:45 | Unknown #0x49 | State Deasserted | Asserted
      36 | 03/06/2024 | 06:43:27 | System Event #0x4a | Undetermined system hardware failure | Asserted
      37 | 03/06/2024 | 06:43:44 | System Event #0x4a | Undetermined system hardware failure | Asserted
      38 | 03/06/2024 | 06:44:30 | Unknown #0x49 | State Deasserted | Asserted
      39 | 03/06/2024 | 06:44:30 | Unknown #0x49 | State Deasserted | Asserted
      3a | 03/06/2024 | 06:44:31 | Unknown #0x49 | State Deasserted | Asserted
      3b | 03/06/2024 | 06:44:31 | Unknown #0x49 | State Deasserted | Asserted
      3c | 03/06/2024 | 06:44:31 | Unknown #0x49 | State Deasserted | Asserted
      3d | 03/06/2024 | 06:44:31 | Unknown #0x49 | State Deasserted | Asserted
      3e | 03/06/2024 | 06:44:32 | Unknown #0x49 | State Deasserted | Asserted
      3f | 03/06/2024 | 06:44:32 | Unknown #0x49 | State Deasserted | Asserted
      40 | 03/06/2024 | 06:44:32 | Unknown #0x49 | State Deasserted | Asserted
      41 | 03/06/2024 | 06:44:32 | Unknown #0x49 | State Deasserted | Asserted
      42 | 03/06/2024 | 06:44:32 | Unknown #0x49 | State Deasserted | Asserted
      43 | 03/06/2024 | 06:44:33 | Unknown #0x49 | State Deasserted | Asserted
      44 | 03/06/2024 | 06:46:30 | Unknown #0x49 | State Deasserted | Asserted
      45 | 03/06/2024 | 06:51:46 | System Event #0x4a | Undetermined system hardware failure | Asserted
      46 | 03/06/2024 | 06:52:03 | System Event #0x4a | Undetermined system hardware failure | Asserted
      47 | 03/06/2024 | 06:52:49 | Unknown #0x49 | State Deasserted | Asserted
      48 | 03/06/2024 | 06:52:49 | Unknown #0x49 | State Deasserted | Asserted
      49 | 03/06/2024 | 06:52:49 | Unknown #0x49 | State Deasserted | Asserted
      4a | 03/06/2024 | 06:52:50 | Unknown #0x49 | State Deasserted | Asserted
      4b | 03/06/2024 | 06:52:50 | Unknown #0x49 | State Deasserted | Asserted
      4c | 03/06/2024 | 06:52:50 | Unknown #0x49 | State Deasserted | Asserted
      4d | 03/06/2024 | 06:52:50 | Unknown #0x49 | State Deasserted | Asserted
      4e | 03/06/2024 | 06:52:50 | Unknown #0x49 | State Deasserted | Asserted
      4f | 03/06/2024 | 06:52:51 | Unknown #0x49 | State Deasserted | Asserted
      50 | 03/06/2024 | 06:52:51 | Unknown #0x49 | State Deasserted | Asserted
      51 | 03/06/2024 | 06:52:51 | Unknown #0x49 | State Deasserted | Asserted
      52 | 03/06/2024 | 06:52:51 | Unknown #0x49 | State Deasserted | Asserted
      53 | 03/06/2024 | 06:54:49 | Unknown #0x49 | State Deasserted | Asserted
      54 | 03/06/2024 | 06:55:11 | System Event #0x4a | Undetermined system hardware failure | Asserted
      55 | 03/06/2024 | 06:55:34 | System Event #0x4a | Undetermined system hardware failure | Asserted
      56 | 03/06/2024 | 06:56:20 | Unknown #0x49 | State Deasserted | Asserted
      57 | 03/06/2024 | 06:56:21 | Unknown #0x49 | State Deasserted | Asserted
      58 | 03/06/2024 | 06:56:21 | Unknown #0x49 | State Deasserted | Asserted
      59 | 03/06/2024 | 06:56:21 | Unknown #0x49 | State Deasserted | Asserted
      5a | 03/06/2024 | 06:56:21 | Unknown #0x49 | State Deasserted | Asserted
      5b | 03/06/2024 | 06:56:22 | Unknown #0x49 | State Deasserted | Asserted
      5c | 03/06/2024 | 06:56:22 | Unknown #0x49 | State Deasserted | Asserted
      5d | 03/06/2024 | 06:56:22 | Unknown #0x49 | State Deasserted | Asserted
      5e | 03/06/2024 | 06:56:22 | Unknown #0x49 | State Deasserted | Asserted
      5f | 03/06/2024 | 06:56:22 | Unknown #0x49 | State Deasserted | Asserted
      60 | 03/06/2024 | 06:56:23 | Unknown #0x49 | State Deasserted | Asserted
      61 | 03/06/2024 | 06:56:23 | Unknown #0x49 | State Deasserted | Asserted
      62 | 03/06/2024 | 06:58:21 | Unknown #0x49 | State Deasserted | Asserted
      63 | 03/06/2024 | 07:17:29 | System Event #0x4a | Undetermined system hardware failure | Asserted
      64 | 03/06/2024 | 07:17:47 | System Event #0x4a | Undetermined system hardware failure | Asserted
      65 | 03/06/2024 | 07:18:33 | Unknown #0x49 | State Deasserted | Asserted
      66 | 03/06/2024 | 07:18:33 | Unknown #0x49 | State Deasserted | Asserted
      67 | 03/06/2024 | 07:18:33 | Unknown #0x49 | State Deasserted | Asserted
      68 | 03/06/2024 | 07:18:34 | Unknown #0x49 | State Deasserted | Asserted
      69 | 03/06/2024 | 07:18:34 | Unknown #0x49 | State Deasserted | Asserted
      6a | 03/06/2024 | 07:18:34 | Unknown #0x49 | State Deasserted | Asserted
      6b | 03/06/2024 | 07:18:35 | Unknown #0x49 | State Deasserted | Asserted
      6c | 03/06/2024 | 07:18:35 | Unknown #0x49 | State Deasserted | Asserted
      6d | 03/06/2024 | 07:18:35 | Unknown #0x49 | State Deasserted | Asserted
      6e | 03/06/2024 | 07:18:35 | Unknown #0x49 | State Deasserted | Asserted
      6f | 03/06/2024 | 07:18:36 | Unknown #0x49 | State Deasserted | Asserted
      70 | 03/06/2024 | 07:18:36 | Unknown #0x49 | State Deasserted | Asserted

    bash-4.2$ ipmitool -I lanplus -C 17 -N 3 -p 623 -U ipmi_admin -H rain57bmc.aus.stglabs.ibm.com sel clear 2>&1
    Password:
    Clearing SEL. Please allow a few seconds to erase.

    bash-4.2$ ipmitool -I lanplus -C 17 -N 3 -p 623 -U ipmi_admin -H rain57bmc.aus.stglabs.ibm.com sel list 2>&1
    Password:
       1 | 03/06/2024 | 07:20:18 | System Event #0x4a | Undetermined system hardware failure | Asserted
       2 | 03/06/2024 | 07:20:33 | Unknown #0x49 | State Deasserted | Asserted

For reference, the journal logs are below.

    root@p10bmc:~# journalctl -b | grep ipmi
    Mar 06 07:00:28 p10bmc kernel: ipmi-bt-host 1e789140.ibt: Found bt bmc device
    Mar 06 07:00:28 p10bmc kernel: ipmi-bt-host 1e789140.ibt: Using IRQ 33
    Mar 06 07:00:29 p10bmc systemd[1]: /usr/lib/systemd/system/phosphor-ipmi-net@.socket:6: Invalid interface name, ignoring: sys-subsystem-net-devices-%i.device
    Mar 06 07:00:29 p10bmc systemd[1]: /usr/lib/systemd/system/phosphor-ipmi-net@.socket:6: Invalid interface name, ignoring: sys-subsystem-net-devices-%i.device
    Mar 06 07:00:29 p10bmc systemd[1]: Created slice Slice /system/phosphor-ipmi-net.
    Mar 06 07:00:38 p10bmc systemd[1]: Listening on phosphor-ipmi-net@eth0.socket.
    Mar 06 07:00:38 p10bmc systemd[1]: Listening on phosphor-ipmi-net@eth1.socket.
    Mar 06 07:00:53 p10bmc ipmid[573]: JSON file not found
    Mar 06 07:00:54 p10bmc ipmid[573]: Loading whitelist filter
    Mar 06 07:00:55 p10bmc ipmid[573]: Set restrictedMode = false
    Mar 06 07:00:56 p10bmc systemd[1]: First Boot Disable IPMI Network was skipped because of an unmet condition check (ConditionPathExists=!/etc/ipmi-net-disable-one-time).
    Mar 06 07:00:57 p10bmc netipmid[805]: Failed to get bus name, path: /org/openbmc/control/chassis0, error: Input/output error
    Mar 06 07:00:57 p10bmc netipmid[805]: Bind to interface: eth0
    Mar 06 07:00:57 p10bmc ipmid[573]: New interface mapping
    Mar 06 07:00:57 p10bmc netipmid[807]: Failed to get bus name, path: /org/openbmc/control/chassis0, error: Input/output error
    Mar 06 07:00:57 p10bmc netipmid[807]: Bind to interface: eth1
    Mar 06 07:00:57 p10bmc ipmid[573]: New interface mapping
    Mar 06 07:01:10 p10bmc ipmid[573]: Host control timeout hit!
    Mar 06 07:01:10 p10bmc ipmid[573]: Failed to deliver host command
    Mar 06 07:02:46 p10bmc useradd[2517]: new user: name=ipmi_admin, UID=1002, GID=100, home=/home/ipmi_admin, shell=/sbin/nologin, from=none
    Mar 06 07:02:46 p10bmc useradd[2517]: add 'ipmi_admin' to group 'priv-admin'
    Mar 06 07:02:46 p10bmc useradd[2517]: add 'ipmi_admin' to group 'web'
    Mar 06 07:02:46 p10bmc useradd[2517]: add 'ipmi_admin' to group 'redfish'
    Mar 06 07:02:46 p10bmc useradd[2517]: add 'ipmi_admin' to shadow group 'priv-admin'
    Mar 06 07:02:46 p10bmc useradd[2517]: add 'ipmi_admin' to shadow group 'web'
    Mar 06 07:02:46 p10bmc useradd[2517]: add 'ipmi_admin' to shadow group 'redfish'
    Mar 06 07:02:46 p10bmc phosphor-user-manager[2517]: useradd: warning: the home directory /home/ipmi_admin already exists.
    Mar 06 07:02:46 p10bmc bmcweb[1808]: pam_unix(webserver:chauthtok): password changed for ipmi_admin
    Mar 06 07:04:08 p10bmc usermod[2569]: delete 'ipmi_admin' from group 'web'
    Mar 06 07:04:08 p10bmc usermod[2569]: delete 'ipmi_admin' from group 'redfish'
    Mar 06 07:04:08 p10bmc usermod[2569]: add 'ipmi_admin' to group 'ipmi'
    Mar 06 07:04:08 p10bmc usermod[2569]: delete 'ipmi_admin' from shadow group 'web'
    Mar 06 07:04:08 p10bmc usermod[2569]: delete 'ipmi_admin' from shadow group 'redfish'
    Mar 06 07:04:08 p10bmc usermod[2569]: add 'ipmi_admin' to shadow group 'ipmi'
    Mar 06 07:05:16 p10bmc ipmid[573]: Failed to fetch service for D-Bus object
    Mar 06 07:05:16 p10bmc netipmid[805]: Removing idle IPMI LAN session, id: 857582077, handler: 1
    Mar 06 07:17:32 p10bmc ipmid[573]: Command in process, no attention
    Mar 06 07:17:35 p10bmc ipmid[573]: Host control timeout hit!
    Mar 06 07:17:35 p10bmc ipmid[573]: Failed to deliver host command
    Mar 06 07:17:35 p10bmc ipmid[573]: Failed to deliver host command
    Mar 06 07:20:00 p10bmc netipmid[805]: Removing idle IPMI LAN session, id: 866293172, handler: 1
    Mar 06 07:20:18 p10bmc netipmid[805]: Removing idle IPMI LAN session, id: 1041173208, handler: 1
    Mar 06 07:21:10 p10bmc netipmid[805]: Removing idle IPMI LAN session, id: 914219419, handler: 1
    Mar 06 07:21:15 p10bmc netipmid[805]: Removing idle IPMI LAN session, id: 908241199, handler: 1
    Mar 06 07:23:37 p10bmc netipmid[805]: Removing idle IPMI LAN session, id: 171392366, handler: 1

lxwinspur commented 4 months ago

    bash-4.2$ ipmitool -I lanplus -C 17 -N 3 -p 623 -U ipmi_admin -H rain57bmc.aus.stglabs.ibm.com sel list 2>&1
    Password:
       1 | 03/06/2024 | 07:20:18 | System Event #0x4a | Undetermined system hardware failure | Asserted
       2 | 03/06/2024 | 07:20:33 | Unknown #0x49 | State Deasserted | Asserted

I think this is correct. After sel clear is executed, all existing SEL data has been deleted, and these two entries are newly generated (please note their generation times).
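The timestamp comparison described here can be done mechanically. The following is a minimal sketch, assuming `sel list` output on stdin with zero-padded `MM/DD/YYYY` dates and `HH:MM:SS` times as in the captures above. The function name `sel_entries_since` and the choice of cutoff are assumptions; the cutoff should be the BMC time at which `sel clear` was issued.

```shell
# Print SEL records whose timestamp is at or after a cutoff.
# Usage: sel_entries_since "MM/DD/YYYY HH:MM:SS" < sel_list_output
sel_entries_since() {
    awk -F'|' -v cutoff="$1" '
        # Turn "MM/DD/YYYY" and "HH:MM:SS" into a sortable "YYYYMMDDHHMMSS" string.
        function key(d, t,    p) {
            gsub(/[ \t]/, "", d); gsub(/[ \t:]/, "", t)
            split(d, p, "/")
            return p[3] p[1] p[2] t
        }
        BEGIN { split(cutoff, c, " "); limit = key(c[1], c[2]) }
        # Only lines with a date in the second field are SEL records.
        $2 ~ /[0-9]+\/[0-9]+\/[0-9]+/ && key($2, $3) >= limit { print }
    '
}
```

If every record in a post-clear listing is printed for a cutoff equal to the clear time, the listing contains only entries generated after the clear.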

yadlapati commented 3 months ago

@lxwinspur I think that is correct. Those two are the newer SELs.

lxwinspur commented 3 months ago

So can we close this issue?

mzipse commented 2 months ago

@lxwinspur, our test team is still concerned about this and, quite honestly, it seems odd to me as well. Why would we surface an error when deleting the error log? "Undetermined system hardware failure" doesn't seem like the correct message to show our customers. I also got this feedback from our test team:

"I attempted to clear all SEL entries and then generate an unrecoverable error log in eBMC. Then I looked up the details in the IPMI SEL (see below). I noticed that the SEL entry is the same for both error logs. This could confuse the customer. Therefore, we should correct the description of the SEL entry that is logged when the SEL is cleared.

    [rahulmah@gfwa122:~]$ ipmitool -I lanplus -C 17 -N 3 -p 623 -U ipmi_admin -H rain204bmc.in.ibm.com sel list
    Password:
       1 | 09/26/2023 | 07:39:25 | System Event #0x4a | Undetermined system hardware failure | Asserted. <--- SEL for clearing all the logs
       2 | 09/26/2023 | 10:44:37 | System Event #0x4a | Undetermined system hardware failure | Asserted. <--- SEL for unrecoverable error logs "

Shouldn't there be some differentiation between the SEL for deleting the error log vs a SEL for a real hardware failure?

I realize this is lower priority for the IPS team right now. As time permits, please investigate and let me know what you think.

lxwinspur commented 2 months ago

sel clear only clears the SEL logs that were generated before it runs. It does not look at the log type or log content, so the two newly generated logs are outside what sel clear is concerned with. We should find out under what circumstances these two logs are generated, but I think that should be tracked as a separate issue, right?
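One way to start looking at the circumstances is to read the BMC journal around the timestamp of a new SEL record. The following is a minimal sketch, assuming the SEL timestamps and the journal use the same BMC clock. The helper name `sel_time_to_journal` is hypothetical; it only converts a SEL date and time into the `YYYY-MM-DD HH:MM:SS` form that `journalctl --since` and `--until` accept.

```shell
# Convert a SEL "MM/DD/YYYY" date ($1) and "HH:MM:SS" time ($2) into
# "YYYY-MM-DD HH:MM:SS". The body runs in a subshell so the helper
# variables do not leak into the caller.
sel_time_to_journal() (
    month=${1%%/*}      # text before the first "/"
    rest=${1#*/}        # "DD/YYYY"
    day=${rest%%/*}
    year=${rest#*/}
    printf '%s-%s-%s %s\n' "$year" "$month" "$day" "$2"
)

# Example, to be run on the BMC (the one-minute window is an arbitrary choice):
# journalctl --since "$(sel_time_to_journal 03/06/2024 07:20:00)" \
#            --until "$(sel_time_to_journal 03/06/2024 07:21:00)"
```

Reading the unfiltered journal for that window, rather than only the lines matching `ipmi`, may show which service created the error log behind the new SEL record.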