openbmc / phosphor-time-manager

Local time policy and emulated host RTC manager
Apache License 2.0
6 stars 9 forks source link

systemd-timesyncd.service: Failed at step STATE_DIRECTORY spawning /lib/systemd/systemd-timesyncd: Not a directory #5

Open gkeishin opened 4 years ago

gkeishin commented 4 years ago

Notice on CI run https://gerrit.openbmc-project.xyz/#/c/openbmc/meta-phosphor/+/26016/

Oct 11 00:04:44 witherspoon-Y230UF71K03T systemd[1]: Starting Wait for /xyz/openbmc_project/control/host0/boot/one_time...
Oct 11 00:04:46 witherspoon-Y230UF71K03T systemd[1]: Starting Wait for /xyz/openbmc_project/time/owner...
Oct 11 00:04:47 witherspoon-Y230UF71K03T systemd[1]: Starting Wait for /xyz/openbmc_project/time/sync_method...
Oct 11 00:05:00 witherspoon-Y230UF71K03T systemd[1]: Started Wait for /xyz/openbmc_project/control/host0/boot/one_time.
Oct 11 00:05:01 witherspoon-Y230UF71K03T systemd-udevd[71]: hwmon9: Worker [88] processing SEQNUM=820 is taking a long time
Oct 11 00:05:01 witherspoon-Y230UF71K03T systemd[1]: Started Wait for /xyz/openbmc_project/time/owner.
Oct 11 00:05:02 witherspoon-Y230UF71K03T systemd[1]: Started Wait for /xyz/openbmc_project/time/sync_method.
Oct 11 00:05:30 witherspoon-Y230UF71K03T systemd[468]: systemd-timedated.service: ProtectHostname=yes is configured, but the kernel does not support UTS namespaces, ignoring namespace setup.
Oct 11 00:05:30 witherspoon-Y230UF71K03T systemd-timedated[468]: Failed to get target of /etc/localtime: No such file or directory
Oct 11 00:05:31 witherspoon-Y230UF71K03T systemd-timedated[468]: Set NTP to enabled (systemd-timesyncd.service).
Oct 11 00:05:31 witherspoon-Y230UF71K03T phosphor-timemanager[463]: Updated NTP setting
Oct 11 00:05:31 witherspoon-Y230UF71K03T nslcd[184]: [b0dc51] <group/member="systemd-timesync"> failed to bind to LDAP server ldap://127.0.0.1/: Can't contact LDAP server: Transport endpoint is not connected
Oct 11 00:05:31 witherspoon-Y230UF71K03T nslcd[184]: [b0dc51] <group/member="systemd-timesync"> no available LDAP server found: Can't contact LDAP server: Transport endpoint is not connected
Oct 11 00:05:31 witherspoon-Y230UF71K03T nslcd[184]: [b0dc51] <group/member="systemd-timesync"> no available LDAP server found: Server is unavailable: Transport endpoint is not connected
Oct 11 00:05:31 witherspoon-Y230UF71K03T systemd[470]: systemd-timesyncd.service: Failed to set up special execution directory in /var/lib: Not a directory
Oct 11 00:05:31 witherspoon-Y230UF71K03T systemd[470]: systemd-timesyncd.service: Failed at step STATE_DIRECTORY spawning /lib/systemd/systemd-timesyncd: Not a directory
Oct 11 00:05:31 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Main process exited, code=exited, status=238/STATE_DIRECTORY
Oct 11 00:05:31 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Failed with result 'exit-code'.
Oct 11 00:05:32 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Service has no hold-off time (RestartSec=0), scheduling restart.
Oct 11 00:05:32 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Scheduled restart job, restart counter is at 1.
Oct 11 00:05:32 witherspoon-Y230UF71K03T systemd[476]: systemd-timesyncd.service: Failed to set up special execution directory in /var/lib: Not a directory
Oct 11 00:05:32 witherspoon-Y230UF71K03T systemd[476]: systemd-timesyncd.service: Failed at step STATE_DIRECTORY spawning /lib/systemd/systemd-timesyncd: Not a directory
Oct 11 00:05:32 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Main process exited, code=exited, status=238/STATE_DIRECTORY
Oct 11 00:05:32 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Failed with result 'exit-code'.
Oct 11 00:05:32 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Service has no hold-off time (RestartSec=0), scheduling restart.
Oct 11 00:05:32 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Scheduled restart job, restart counter is at 2.
Oct 11 00:05:32 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Start request repeated too quickly.
Oct 11 00:05:32 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Failed with result 'exit-code'.
Oct 11 00:06:02 witherspoon-Y230UF71K03T systemd[1]: systemd-timedated.service: Succeeded.
Oct 11 00:12:16 witherspoon-Y230UF71K03T systemd[1]: logrotate.timer: Succeeded.
Oct 11 00:12:16 witherspoon-Y230UF71K03T systemd[1]: systemd-tmpfiles-clean.timer: Succeeded.
Oct 11 00:12:25 witherspoon-Y230UF71K03T systemd[1]: mapper-wait@-xyz-openbmc_project-time-owner.service: Succeeded.
Oct 11 00:12:25 witherspoon-Y230UF71K03T systemd[1]: Stopped Wait for /xyz/openbmc_project/time/owner.
Oct 11 00:12:25 witherspoon-Y230UF71K03T systemd[1]: mapper-wait@-xyz-openbmc_project-time-sync_method.service: Succeeded.
Oct 11 00:12:25 witherspoon-Y230UF71K03T systemd[1]: Stopped Wait for /xyz/openbmc_project/time/sync_method.
Oct 11 00:12:26 witherspoon-Y230UF71K03T systemd[1]: Stopped Clear one time boot overrides.
Oct 11 00:12:26 witherspoon-Y230UF71K03T systemd[1]: mapper-wait@-xyz-openbmc_project-control-host0-boot-one_time.service: Succeeded.
Oct 11 00:12:26 witherspoon-Y230UF71K03T systemd[1]: Stopped Wait for /xyz/openbmc_project/control/host0/boot/one_time.
Oct 11 00:13:06 witherspoon-Y230UF71K03T kernel: Switching to timer-based delay loop, resolution 40ns
Oct 11 00:13:06 witherspoon-Y230UF71K03T kernel: Calibrating delay loop (skipped), value calculated using timer frequency.. 49.50 BogoMIPS (lpj=247500)
Oct 11 00:13:06 witherspoon-Y230UF71K03T kernel: workingset: timestamp_bits=30 max_order=17 bucket_order=0
Oct 11 00:13:06 witherspoon-Y230UF71K03T kernel: timeriomem_rng 1e6e2078.hwrng: 32bits from 0x(ptrval) @ 1us
Oct 11 00:13:06 witherspoon-Y230UF71K03T kernel: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
Oct 11 00:13:06 witherspoon-Y230UF71K03T kernel: [drm] No driver support for vblank timestamp query.
Oct 11 00:13:07 witherspoon-Y230UF71K03T systemd-journald[80]: Runtime Journal (/run/log/journal/ca02981f2d6f4d1c9b65ecb70993e466) is 8.0M, max 64.0M, 56.0M free.
Oct 11 00:13:17 witherspoon-Y230UF71K03T systemd[127]: systemd-timesyncd.service: Failed to set up special execution directory in /var/lib: Not a directory
Oct 11 00:13:17 witherspoon-Y230UF71K03T systemd[127]: systemd-timesyncd.service: Failed at step STATE_DIRECTORY spawning /lib/systemd/systemd-timesyncd: Not a directory
Oct 11 00:13:18 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Main process exited, code=exited, status=238/STATE_DIRECTORY
Oct 11 00:13:18 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Failed with result 'exit-code'.
Oct 11 00:13:18 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Service has no hold-off time (RestartSec=0), scheduling restart.
Oct 11 00:13:18 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Scheduled restart job, restart counter is at 1.
Oct 11 00:13:18 witherspoon-Y230UF71K03T systemd[130]: systemd-timesyncd.service: Failed to set up special execution directory in /var/lib: Not a directory
Oct 11 00:13:19 witherspoon-Y230UF71K03T systemd[130]: systemd-timesyncd.service: Failed at step STATE_DIRECTORY spawning /lib/systemd/systemd-timesyncd: Not a directory
Oct 11 00:13:19 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Main process exited, code=exited, status=238/STATE_DIRECTORY
Oct 11 00:13:19 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Failed with result 'exit-code'.
Oct 11 00:13:19 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Service has no hold-off time (RestartSec=0), scheduling restart.
Oct 11 00:13:19 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Scheduled restart job, restart counter is at 2.
Oct 11 00:13:19 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Start request repeated too quickly.
Oct 11 00:13:19 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Failed with result 'exit-code'.
Oct 11 00:13:24 witherspoon-Y230UF71K03T systemd[1]: Starting Clear one time boot overrides...
Oct 11 00:13:48 witherspoon-Y230UF71K03T systemd[1]: Started Clear one time boot overrides.
Oct 11 00:13:53 witherspoon-Y230UF71K03T systemd[1]: Starting Wait for /xyz/openbmc_project/control/host0/boot/one_time...
Oct 11 00:13:55 witherspoon-Y230UF71K03T systemd[1]: Starting Wait for /xyz/openbmc_project/time/owner...
Oct 11 00:13:55 witherspoon-Y230UF71K03T systemd[1]: Starting Wait for /xyz/openbmc_project/time/sync_method...
Oct 11 00:14:10 witherspoon-Y230UF71K03T systemd-udevd[72]: hwmon9: Worker [83] processing SEQNUM=820 is taking a long time
Oct 11 00:14:10 witherspoon-Y230UF71K03T systemd[1]: Started Wait for /xyz/openbmc_project/control/host0/boot/one_time.
Oct 11 00:14:12 witherspoon-Y230UF71K03T systemd[1]: Started Wait for /xyz/openbmc_project/time/owner.
Oct 11 00:14:12 witherspoon-Y230UF71K03T systemd[1]: Started Wait for /xyz/openbmc_project/time/sync_method.
Oct 11 00:14:44 witherspoon-Y230UF71K03T systemd[476]: systemd-timedated.service: ProtectHostname=yes is configured, but the kernel does not support UTS namespaces, ignoring namespace setup.
Oct 11 00:14:44 witherspoon-Y230UF71K03T systemd-timedated[476]: Failed to get target of /etc/localtime: No such file or directory
Oct 11 00:14:45 witherspoon-Y230UF71K03T systemd-timedated[476]: Set NTP to enabled (systemd-timesyncd.service).
Oct 11 00:14:45 witherspoon-Y230UF71K03T phosphor-timemanager[472]: Updated NTP setting
Oct 11 00:14:45 witherspoon-Y230UF71K03T nslcd[185]: [b0dc51] <group/member="systemd-timesync"> failed to bind to LDAP server ldap://127.0.0.1/: Can't contact LDAP server: Transport endpoint is not connected
Oct 11 00:14:45 witherspoon-Y230UF71K03T nslcd[185]: [b0dc51] <group/member="systemd-timesync"> no available LDAP server found: Can't contact LDAP server: Transport endpoint is not connected
Oct 11 00:14:45 witherspoon-Y230UF71K03T nslcd[185]: [b0dc51] <group/member="systemd-timesync"> no available LDAP server found: Server is unavailable: Transport endpoint is not connected
Oct 11 00:14:45 witherspoon-Y230UF71K03T systemd[479]: systemd-timesyncd.service: Failed to set up special execution directory in /var/lib: Not a directory
Oct 11 00:14:45 witherspoon-Y230UF71K03T systemd[479]: systemd-timesyncd.service: Failed at step STATE_DIRECTORY spawning /lib/systemd/systemd-timesyncd: Not a directory
Oct 11 00:14:45 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Main process exited, code=exited, status=238/STATE_DIRECTORY
Oct 11 00:14:45 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Failed with result 'exit-code'.
Oct 11 00:14:45 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Service has no hold-off time (RestartSec=0), scheduling restart.
Oct 11 00:14:45 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Scheduled restart job, restart counter is at 1.
Oct 11 00:14:46 witherspoon-Y230UF71K03T systemd[494]: systemd-timesyncd.service: Failed to set up special execution directory in /var/lib: Not a directory
Oct 11 00:14:46 witherspoon-Y230UF71K03T systemd[494]: systemd-timesyncd.service: Failed at step STATE_DIRECTORY spawning /lib/systemd/systemd-timesyncd: Not a directory
Oct 11 00:14:46 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Main process exited, code=exited, status=238/STATE_DIRECTORY
Oct 11 00:14:47 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Failed with result 'exit-code'.
Oct 11 00:14:47 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Service has no hold-off time (RestartSec=0), scheduling restart.
Oct 11 00:14:47 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Scheduled restart job, restart counter is at 2.
Oct 11 00:14:47 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Start request repeated too quickly.
Oct 11 00:14:47 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Failed with result 'exit-code'.
Oct 11 00:15:15 witherspoon-Y230UF71K03T systemd[1]: systemd-timedated.service: Succeeded.
Oct 11 00:16:00 witherspoon-Y230UF71K03T systemd[698]: systemd-timedated.service: ProtectHostname=yes is configured, but the kernel does not support UTS namespaces, ignoring namespace setup.
Oct 11 00:16:00 witherspoon-Y230UF71K03T systemd-timedated[698]: Failed to get target of /etc/localtime: No such file or directory
mine260309 commented 4 years ago

The log

Failed to set up special execution directory in /var/lib: Not a directory
Failed at step STATE_DIRECTORY spawning /lib/systemd/systemd-timesyncd: Not a directory

indicates that the filesystem is corrupted, or at least /var/lib is not there, maybe check if there is any HW issue?

gkeishin commented 4 years ago

@geissonator ^^^

geissonator commented 4 years ago

I saw it when I was doing upgrade/downgrade testing. Let me see if I can get more info.

geissonator commented 4 years ago

HW CI is hitting this intermittently. I don't see any indication in the logs of other applications having issues, but the time application does run the earliest:


Oct 13 23:12:05 witherspoon-Y230UF71K03T systemd[1]: Starting Flush Journal to Persistent Storage...
Oct 13 23:12:06 witherspoon-Y230UF71K03T systemd-journald[80]: Time spent on flushing to /var is 875.947ms for 209 entries.
Oct 13 23:12:06 witherspoon-Y230UF71K03T systemd-journald[80]: System Journal (/var/log/journal/f5a255d7642740388dccdc9dfebd0c5a) is 2.0M, max 2.5M, 496.0K free.
Oct 13 23:12:06 witherspoon-Y230UF71K03T systemd-networkd[84]: Enumeration completed
Oct 13 23:12:06 witherspoon-Y230UF71K03T systemd[1]: Started Network Service.
Oct 13 23:12:07 witherspoon-Y230UF71K03T systemd-udevd[83]: Using default interface naming scheme 'v243'.
Oct 13 23:12:07 witherspoon-Y230UF71K03T systemd-udevd[83]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Oct 13 23:12:07 witherspoon-Y230UF71K03T systemd[1]: Started Flush Journal to Persistent Storage.
Oct 13 23:12:07 witherspoon-Y230UF71K03T systemd[1]: Started udev Coldplug all Devices.
Oct 13 23:12:07 witherspoon-Y230UF71K03T systemd-networkd[84]: eth0: IPv6 successfully enabled
Oct 13 23:12:07 witherspoon-Y230UF71K03T kernel: 8021q: adding VLAN 0 to HW filter on device eth0
Oct 13 23:12:07 witherspoon-Y230UF71K03T kernel: ftgmac100 1e660000.ethernet eth0: NCSI: Handler for packet type 0x82 returned -19
Oct 13 23:12:08 witherspoon-Y230UF71K03T systemd-networkd[84]: eth0: Gained carrier
Oct 13 23:12:09 witherspoon-Y230UF71K03T systemd-networkd[84]: eth0: Gained IPv6LL
Oct 13 23:12:11 witherspoon-Y230UF71K03T systemd[1]: Found device /dev/ttyVUART0.
Oct 13 23:12:12 witherspoon-Y230UF71K03T systemd[1]: Found device /dev/aspeed-lpc-ctrl.
Oct 13 23:12:13 witherspoon-Y230UF71K03T systemd[1]: Created slice system-xyz.openbmc_project.Hwmon.slice.
Oct 13 23:12:14 witherspoon-Y230UF71K03T systemd[1]: Created slice system-xyz.openbmc_project.led.controller.slice.
Oct 13 23:12:14 witherspoon-Y230UF71K03T systemd[1]: Condition check resulted in Huge Pages File System being skipped.
Oct 13 23:12:14 witherspoon-Y230UF71K03T systemd[1]: Condition check resulted in POSIX Message Queue File System being skipped.
Oct 13 23:12:14 witherspoon-Y230UF71K03T systemd[1]: Condition check resulted in Create list of static device nodes for the current kernel being skipped.
Oct 13 23:12:14 witherspoon-Y230UF71K03T systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
Oct 13 23:12:14 witherspoon-Y230UF71K03T systemd[1]: Condition check resulted in Commit a transient machine-id on disk being skipped.
Oct 13 23:12:14 witherspoon-Y230UF71K03T systemd-networkd[84]: eth0: Configured
Oct 13 23:12:14 witherspoon-Y230UF71K03T systemd[1]: Condition check resulted in Load Kernel Modules being skipped.
Oct 13 23:12:14 witherspoon-Y230UF71K03T systemd[1]: Condition check resulted in FUSE Control File System being skipped.
Oct 13 23:12:14 witherspoon-Y230UF71K03T systemd[1]: Condition check resulted in Create System Users being skipped.
Oct 13 23:12:14 witherspoon-Y230UF71K03T systemd[1]: Starting Create Volatile Files and Directories...
Oct 13 23:12:15 witherspoon-Y230UF71K03T systemd[1]: Started Create Volatile Files and Directories.
Oct 13 23:12:15 witherspoon-Y230UF71K03T systemd[1]: Condition check resulted in Rebuild Journal Catalog being skipped.
Oct 13 23:12:15 witherspoon-Y230UF71K03T systemd[1]: Starting Network Name Resolution...
Oct 13 23:12:15 witherspoon-Y230UF71K03T systemd[127]: systemd-timesyncd.service: Failed to set up special execution directory in /var/lib: Not a directory
Oct 13 23:12:15 witherspoon-Y230UF71K03T systemd[127]: systemd-timesyncd.service: Failed at step STATE_DIRECTORY spawning /lib/systemd/systemd-timesyncd: Not a directory
Oct 13 23:12:15 witherspoon-Y230UF71K03T systemd[1]: Starting Network Time Synchronization...
Oct 13 23:12:15 witherspoon-Y230UF71K03T systemd[1]: Condition check resulted in Update is Completed being skipped.
Oct 13 23:12:16 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Main process exited, code=exited, status=238/STATE_DIRECTORY
Oct 13 23:12:16 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Failed with result 'exit-code'.
Oct 13 23:12:16 witherspoon-Y230UF71K03T systemd[1]: Failed to start Network Time Synchronization.
Oct 13 23:12:16 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Service has no hold-off time (RestartSec=0), scheduling restart.
Oct 13 23:12:16 witherspoon-Y230UF71K03T systemd[1]: systemd-timesyncd.service: Scheduled restart job, restart counter is at 1.```
geissonator commented 4 years ago

My CI system was still in fail state, I restarted the systemd-timesyncd servier after verifying /var/lib/ was there and still hit this issue so something fishy going on here.

mine260309 commented 4 years ago

The same issue could be found at:

And all indicate that it's related to permission issue with /var/lib/systemd/timesync

On existing witherspoon that has the issue, the directories look as below

# ls -al /var/lib/systemd/timesync
lrwxrwxrwx    1 root     root            27 Oct 11 20:08 /var/lib/systemd/timesync -> ../private/systemd/timesync

# ls -al /var/lib/private/
drwx------    3 root     root           224 Oct 11 20:08 .
drwxr-xr-x   18 root     root          1448 Oct 14 00:59 ..
drwxr-xr-x    3 root     root           232 Oct 11 20:08 systemd

# ls -al /var/lib/private/systemd/timesync/
drwxr-xr-x    2 systemd- systemd-       224 Oct 11 19:45 .
drwxr-xr-x    3 root     root           232 Oct 11 20:08 ..
-rw-r--r--    1 systemd- systemd-         0 Oct 11 20:56 clock

We can see that the /var/lib/systemd/timesync is a symbol link to /var/lib/private/systemd/timesync, and it causes the issue.

After removing such directories, the service could be started successfully, and the directory becomes

# ls -al /var/lib/systemd/timesync/
drwxr-xr-x    2 systemd- systemd-       224 Oct 14 03:12 .
drwxr-xr-x    6 root     root           504 Oct 14 03:12 ..
-rw-r--r--    1 systemd- systemd-         0 Oct 14 03:15 clock

So in OpenBMC, the solution could be to do a factory reset. Or if we need to handle it well, it's needed to find out which OpenBMC releases (with older systemd) uses the symbol link for /var/lib/systemd/timesync, and add specific scripts to remove it during code update.

geissonator commented 4 years ago

Looks like we'd want something like this service to do it - https://github.com/ricardosalveti/meta-lmp/commit/74513770589475f83173dde62a3ffb22cb73f8e0

geissonator commented 4 years ago

How to fix if you hit this:

/bin/rm -fv /var/lib/systemd/timesync && /bin/mv /var/lib/private/systemd/timesync /var/lib/systemd/timesync