ibm-openbmc / dev

Product Development Project Mgmt and Tracking
16 stars 2 forks source link

Rainier:DateTime Manual setting error out #3588

Open gtmills opened 2 years ago

gtmills commented 2 years ago

Downstream defect is https://w3.rchland.ibm.com/projects/SLIC/bestquest/index.php?defect=SW552970

Reported by @gkeishin Test:

  1. Load the latest driver 5.12 1020.2224.20220613a
  2. Try to set to MANUAL mode DateTime

Expectation: Sets to manual and DateTime is updated

Observation: Errors out while trying to set

Error saving date and time settings

1st recreate

Jun 14 16:34:12 rainjmt04bmc systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Jun 14 16:34:18 rainjmt04bmc phosphor-time-manager[584]: Error in setting system time
Jun 14 16:34:18 rainjmt04bmc bmcweb[934]: (2022-06-14 16:34:18) [CRITICAL "error_messages.cpp":233] Internal Error ../git/redfish-core/include/../lib/managers.hpp(1963:44) `redfish::setDateTime(std::shared_ptr<bmcweb::AsyncResp>, std::string)::<lambda(boost::system::error_code)>`:
Jun 14 16:34:18 rainjmt04bmc phosphor-time-manager[584]: The operation failed

2nd recreate


Jun 14 16:26:52 rainjmt04bmc systemd-timedated[2137]: Set NTP to enabled (systemd-timesyncd.service).
Jun 14 16:26:52 rainjmt04bmc phosphor-time-manager[584]: Updated NTP setting
Jun 14 16:26:52 rainjmt04bmc systemd[1]: Started SSH Per-Connection Server (9.3.62.100:49820).
Jun 14 16:26:52 rainjmt04bmc dropbear[2154]: Child connection from ::ffff:9.3.62.100:49820
Jun 14 16:26:52 rainjmt04bmc bmcweb[934]: (2022-06-14 16:26:52) [ERROR "redfish_util.hpp":162] generic:3
Jun 14 16:26:52 rainjmt04bmc systemd[1]: Starting Network Time Synchronization...
Jun 14 16:26:53 rainjmt04bmc systemd[1]: Started Network Time Synchronization.
Jun 14 16:26:55 rainjmt04bmc dropbear[2154]: pam_ibmacf(dropbear:auth): Failed to open file /srv/ibm-acf/ibmacf-prod-backup2.key
Jun 14 16:26:55 rainjmt04bmc dropbear[2154]: PAM password auth succeeded for 'service' from ::ffff:9.3.62.100:49820
Jun 14 16:26:55 rainjmt04bmc sudo[2163]:  service : TTY=pts/0 ; PWD=/home/root ; USER=root ; COMMAND=/bin/sh
Jun 14 16:27:07 rainjmt04bmc avahi-daemon[462]: Registering new address record for 2001:501:ffff:101:a94:efff:fe82:3e17 on eth1.*.
Jun 14 16:27:23 rainjmt04bmc systemd[1]: systemd-timedated.service: Deactivated successfully.
Jun 14 16:27:35 rainjmt04bmc bmcweb[934]: (2022-06-14 16:27:35) [ERROR "websocket.hpp":216] doRead error boost.beast:1
Jun 14 16:27:55 rainjmt04bmc phosphor-time-manager[584]: Time mode is changed
Jun 14 16:27:55 rainjmt04bmc bmcweb[934]: (2022-06-14 16:27:55) [CRITICAL "http_connection.hpp":491] 0x15908b8 Response content provided but code was no-content
Jun 14 16:27:55 rainjmt04bmc systemd[1]: Starting Time & Date Service...
Jun 14 16:27:56 rainjmt04bmc systemd[1]: Started Time & Date Service.
Jun 14 16:27:56 rainjmt04bmc systemd-timedated[2174]: systemd-timesyncd.service: Disabling unit.
Jun 14 16:27:56 rainjmt04bmc systemd[1]: Reloading.
Jun 14 16:28:03 rainjmt04bmc systemd[1]: Stopping Network Time Synchronization...
Jun 14 16:28:03 rainjmt04bmc systemd-timedated[2174]: Set NTP to disabled.
Jun 14 16:28:03 rainjmt04bmc phosphor-time-manager[584]: Updated NTP setting
Jun 14 16:28:16 rainjmt04bmc phosphor-time-manager[584]: Error in setting system time
Jun 14 16:28:16 rainjmt04bmc phosphor-time-manager[584]: The operation failed <---- here is the error
Jun 14 16:28:16 rainjmt04bmc bmcweb[934]: (2022-06-14 16:28:16) [CRITICAL "error_messages.cpp":233] Internal Error ../git/redfish-core/include/../lib/managers.hpp(1963:44) `redfish::setDateTime(std::shared_ptr<bmcweb::AsyncResp>, std::string)::<lambda(boost::system::error_code)>`:
Jun 14 16:28:18 rainjmt04bmc systemd[1]: systemd-timesyncd.service: Deactivated successfully.
Jun 14 16:28:18 rainjmt04bmc systemd[1]: Stopped Network Time Synchronization.
Jun 14 16:28:48 rainjmt04bmc systemd[1]: systemd-timedated.service: Deactivated successfully.
Jun 14 16:28:53 rainjmt04bmc systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Jun 14 16:29:05 rainjmt04bmc systemd[1]: Starting Hostname Service...
Jun 14 16:29:05 rainjmt04bmc systemd[1]: Started Hostname Service.
Jun 14 16:29:45 rainjmt04bmc peltool[2195]: peltool deleting all event logs

3rd recreate

Jun 14 16:31:26 rainjmt04bmc phosphor-time-manager[584]: Error in setting system time
Jun 14 16:31:26 rainjmt04bmc phosphor-time-manager[584]: The operation failed

Jun 14 16:31:26 rainjmt04bmc bmcweb[934]: (2022-06-14 16:31:26) [CRITICAL "error_messages.cpp":233] Internal Error ../git/redfish-core/include/../lib/managers.hpp(1963:44) `redfish::setDateTime(std::shared_ptr<bmcweb::AsyncResp>, std::string)::<lambda(boost::system::error_code)>`:

The set on D-Bus failed https://github.com/ibm-openbmc/bmcweb/blob/1020/redfish-core/lib/managers.hpp#L1963

@lxwinspur Could this be something your team looks at ? Could you set the datetime back and forth from manual to ntp and back ?

lxwinspur commented 2 years ago

https://w3.rchland.ibm.com/projects/SLIC/bestquest/index.php?defect=SW552970

@gtmills we cant access this URL and I tested it with the remote rainier system(10.250.3.20), it works fine

Also, could you give us an NTP service IP address? and lemme test again?

lxwinspur commented 2 years ago

@mzipse FYI

mzipse commented 2 years ago

@lxwinspur , all the information we have in the link that you can't access has already been placed in this github issue. Unfortunately, that's all the information we have to work with.

mzipse commented 2 years ago

There was one thought that perhaps the date/time was set too quickly? ie. did the user try to set it before the BMC was really fully operational. That might be worth experimenting with.

mzipse commented 2 years ago

@lxwinspur , another thing to try to recreate this is to have the GUI up and running. Then have the BMC GUI do a BMC reboot. It takes approximately a minute for the BMC to come up. So it's possible to then set the date/time as the bmc is coming up. That may be where this issue is happening at.

lxwinspur commented 2 years ago

@gtmills @mzipse

When the BMC is not fully started and we manually set the time through the GUI, it is very likely that the setting will fail: There are two situations here:

  1. The time set is incorrect, systemd's timedate1 will return a failure

    ex = sd_bus_call noreply: org.freedesktop.DBus.Error.InvalidArgs: Failed to set local time: Invalid argument

    https://github.com/systemd/systemd/blob/main/src/timedate/timedated.c#L899

  2. Because P-T-M will initialize and update NTP at startup (https://github.com/openbmc/phosphor-time-manager/blob/master/manager.cpp#L75-L94), then the SetNTP method of timedate1 will be called, because this method will not be executed immediately, so when we manually set the time, the SetTime method of timedate1 will be executed. At this time, there will be a judgment in the source code of systemd: when setNTP is not executed, it will return an error:

    ex = sd_bus_call noreply: org.freedesktop.timedate1.AutomaticTimeSyncEnabled: Previous request is not finished, refusing.

    https://github.com/systemd/systemd/blob/main/src/timedate/timedated.c#L839

So, I don't think this is a problem right?

gtmills commented 2 years ago

When the BMC is not fully started

The BMC was fully started.

Could you set the datetime back and forth from manual to ntp and back ?

Did you do this and weren't able to hit?

gtmills commented 2 years ago

@gkeishin Anything we should add here?

gtmills commented 2 years ago

@lxwinspur Could you try toggling back and forth on the GUI on the yellow zone system ?

lxwinspur commented 2 years ago

@gtmills @mzipse Regarding phosphor-time-manager, I pushed a new patch to print exception logs, Could you test it once after this patch is merged? Let's see what the specific error log is? https://gerrit.openbmc.org/c/openbmc/phosphor-time-manager/+/55053/5/epoch_base.cpp#51

lxwinspur commented 2 years ago

Could you try toggling back and forth on the GUI on the yellow zone system ?

Sorry, Maybe I am not following what is you mean?

lxwinspur commented 2 years ago

@gkeishin @gtmills Regarding phosphor-time-manager, I pushed some patches upstream, which are now merged, mostly adding exception printing https://github.com/openbmc/phosphor-time-manager/blob/master/epoch_base.cpp#L51

so, Could you pull the latest code and test it again?

mzipse commented 2 years ago

@lxwinspur , I took your patches and was not able to recreate the issue. Was there any fixes in your patches? Or was it just collecting additional error data in case we do see a failure?

In either case, I think we should go ahead and submit a Pull Request to merge these changes, unless you know of a reason why we shouldn't do that.

lxwinspur commented 2 years ago

@lxwinspur , I took your patches and was not able to recreate the issue. Was there any fixes in your patches? Or was it just collecting additional error data in case we do see a failure?

@mzipse, I just collected additional error data, and didnt do any fixes

In either case, I think we should go ahead and submit a Pull Request to merge these changes, unless you know of a reason why we shouldn't do that.

I didnt see that IBM created the phosphor-time-manager repo on any branch, so I just update SRCREV in openbmc/meta-phosphor/recipes-phosphor/datetime/phosphor-time-manager_git.bb

Also, I see that IBM has two rainier related branches (1020 and 1020.10), which branch should I submit a PR to?

lxwinspur commented 2 years ago

@mzipse @gtmills Sent a PR to 1020.10 branch, https://github.com/ibm-openbmc/openbmc/pull/232