Closed phillxnet closed 2 months ago
Linking to our website repo PR, now merged re pending updates to our downloadable installers:
To examine the idea that we have a time-out error here re password generation we have the following on real Pi4 hardware:
Pi4:~ # time /usr/bin/pass generate --no-symbols --force python-keyring/rockstor/SECRET_KEY 100
The generated password for python-keyring/rockstor/SECRET_KEY is:
XQV4EBg8TlyrRxgagVfScKGqKIqVqcq04i5GsAcr71xGkWjtgopy6pnd1BYqsc2SbXw95ufaERqjw48cKvCvYPhWxFBgOGpBw3IM
real 0m0.139s
user 0m0.072s
sys 0m0.074s
Pi4:~ # time /usr/bin/pass generate --no-symbols --force python-keyring/rockstor/SECRET_KEY 1000
The generated password for python-keyring/rockstor/SECRET_KEY is:
Ks5Gh9YOdnt3LEzqJVb5ieGkrz2GdJTtMz1oKbx2a7v0xDhqsvRfDApyhermkZ5z3n0KlKvoER7ErTiWwdC5SkFTMgF80R1os5vO6rdordIbF3XXiQtYrVwUogCIwGYQyiggFCOq9nUML3l269ad467N0zNbrqAJFmIm3aLf9xaztqNOSzSszQLu1ov7JRhxwINCdWPQ69mlqcnk1JUdNfi3MbpvZjVjMJxU668xtePBPJdMdkBw0B6k58GY69cW1hkhn0kXrdkrTOKVuPH2GahwEDJbqS8oEwLhDJjmrRrngJMQtIeH2PU11CaCjs1zsDYq9HGb1tSy8yBlDyY7uflom328jWUfooQ74hSTtJgesp5MD6z7zCHkNqWr6HkJYDi7VSGGe7FgWb4Gkj7MtKfbCBTDl36cPLK8Mf9J9MrB6vAGJjeiGMuNVa26gVQWicPOa2zTr7KcpgpNB5DMTE1HEyNTLOXFQL6onF0ivIfM4QanMZ6TF7uB244HzQ98P2Xfb4lGQULEp5OxSRffAFKdHl3gN0Gdq3xVVbBHGPUnngYpbV70JFay2DsFYY12QoBoxN00HuVyFDxLEONZc5UAja4JmKfCqgVgNZMNzklK2UpCoKbfACfHpYM0JWhoUWNNn0yNPoHWd1c3jhP7iS94cu1yxApZeqsJclTGgVOVLUa840NswYkC7LsNcVu4MuxkHUtHjciFCFbd3AvVjf7vPhNqLb6WuiYvbRxh6GxapSzFdeBr1tLcnpzysrhCEboZFvhAYZbUQ9c9XpdVlTpxbhvL0nTac9XYxFXd0ACzBd3RNPfDno81irqm5QfUhWv1GumXC0MsoJdB6vz7yie6UHJ1Rs5CyqY796GoojKEMrrIhqDPuNd2CmiSD7UBeweG1QEyxB0RGTyZzxhI7TJUASI101aVFmj2HWpDNYWfyudv7hTlmJ9rugD9aGjkui8twNM3GRNqkNGmH5kdqrMOxCf1blSSwvc7HTRZ
real 0m0.143s
user 0m0.090s
sys 0m0.064s
so negligibly different with 10 times the password length.
Also half length generation (50 chars):
Pi4:~ # time /usr/bin/pass generate --no-symbols --force python-keyring/rockstor/SECRET_KEY 50
The generated password for python-keyring/rockstor/SECRET_KEY is:
WO28pT61YFMKcdEoJnnaNnAGXZS4oJ94t4lMBAgEhDLEpbcimk
real 0m0.140s
user 0m0.076s
sys 0m0.069s
So this does not seem to be the issue: plus multiple concurrent virtual aarch64 VM instances on the Pi4 (non bare metal) did not show this boot-up failure in rockstor-pre.
Above tests used the published TW installer for Pi4.
But we still have CPU: 827ms
indicated in this service!
Another potential cause here could be Pi4 (bare metal) has not real-time-clock! During boot-up, on a TW rockstor installer resulting instance, we see:
May 16 11:02:42 Pi4 smbd[1000]: [2024/05/16 11:02:42.237706, 0] ../../source3/smbd/server.c:1746(main)
May 16 11:02:42 Pi4 smbd[1000]: smbd version 4.19.5-git.342.57620c4f7e1.1SUSE-oS16.9-aarch64 started.
May 16 11:02:42 Pi4 smbd[1000]: Copyright Andrew Tridgell and the Samba Team 1992-2023
May 16 11:02:42 Pi4 systemd[1]: Started Samba SMB Daemon.
May 16 11:02:42 Pi4 systemd[1]: Reached target Multi-User System.
May 16 11:02:42 Pi4 systemd[1]: Startup finished in 2.762s (kernel) + 7.034s (initrd) + 1min 15.330s (userspace) = 1min 25.127s.
May 16 11:02:42 Pi4 agetty[969]: failed to open credentials directory
May 16 11:02:42 Pi4 agetty[972]: failed to open credentials directory
May 16 11:02:45 Pi4 chronyd[811]: Selected source 91.209.16.78 (2.opensuse.pool.ntp.org)
May 16 11:02:45 Pi4 chronyd[811]: System clock wrong by 5447576.763763 seconds
Jul 18 12:15:44 Pi4 kernel: BTRFS info (device sda3): scrub: started on devid 1
Jul 18 12:15:44 Pi4 kernel: net_ratelimit: 36 callbacks suppressed
Jul 18 12:15:44 Pi4 kernel: brcmfmac: brcmf_set_channel: set chanspec 0x100c fail, reason -52
Jul 18 12:15:44 Pi4 kernel: brcmfmac: brcmf_set_channel: set chanspec 0x100d fail, reason -52
Jul 18 12:15:44 Pi4 kernel: brcmfmac: brcmf_set_channel: set chanspec 0x100e fail, reason -52
Jul 18 12:15:44 Pi4 kernel: brcmfmac: brcmf_set_channel: set chanspec 0xd022 fail, reason -52
Jul 18 12:15:42 Pi4 chronyd[811]: System clock was stepped by 5447576.763763 seconds
Jul 18 12:15:42 Pi4 systemd[1]: Started Discard unused blocks on a mounted filesystem.
I.e. we are jumping from May 16th to today July 18th via chronyd on each reboot. Subsesquent runs of the same rockstor-pre service then run OK, and all aarch64 testing was done on VM's which also have, by host inheritance, working rtc.
Pi4:~ # systemctl status chronyd
● chronyd.service - NTP client/server
Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; preset: enabled)
Active: active (running) since Thu 2024-05-16 11:02:38 WEST; 2 months 2 days ago
Docs: man:chronyd(8)
man:chrony.conf(5)
Process: 790 ExecStart=/usr/sbin/chronyd $OPTIONS (code=exited, status=0/SUCCESS)
Process: 813 ExecStartPost=/usr/libexec/chrony/helper update-daemon (code=exited, status=0/SUCCESS)
Main PID: 811 (chronyd)
Tasks: 1
CPU: 687ms
CGroup: /system.slice/chronyd.service
└─811 /usr/sbin/chronyd
May 16 11:02:37 Pi4 systemd[1]: Starting NTP client/server...
May 16 11:02:37 Pi4 chronyd[811]: chronyd version 4.5 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +NTS +SECHASH +IPV6 -DEBUG)
May 16 11:02:37 Pi4 chronyd[811]: Frequency 8.326 +/- 8.772 ppm read from /var/lib/chrony/drift
May 16 11:02:38 Pi4 systemd[1]: Started NTP client/server.
May 16 11:02:45 Pi4 chronyd[811]: Selected source 91.209.16.78 (2.opensuse.pool.ntp.org)
May 16 11:02:45 Pi4 chronyd[811]: System clock wrong by 5447576.763763 seconds
Jul 18 12:15:42 Pi4 chronyd[811]: System clock was stepped by 5447576.763763 seconds
Jul 18 12:38:24 Pi4 chronyd[811]: Selected source 194.117.47.42 (2.opensuse.pool.ntp.org)
Pi4:~ # systemctl status chrony-wait.service ○ chrony-wait.service - Wait for chrony to synchronize system clock Loaded: loaded (/usr/lib/systemd/system/chrony-wait.service; disabled; preset: disabled) Active: inactive (dead) Docs: man:chronyc(1)
N.B. likely an akin to NetworkManager-wait-online.service that we use in rockstor-build.service as we need internet acccess to Pypi to build our .venv.
Once booted already so chronyd.service is done with syncing rtc/ntp already.
Pi4:~ # systemctl start chrony-wait.service
Pi4:~ # systemctl status chrony-wait.service
● chrony-wait.service - Wait for chrony to synchronize system clock
Loaded: loaded (/usr/lib/systemd/system/chrony-wait.service; disabled; preset: disabled)
Active: active (exited) since Thu 2024-07-18 13:00:35 WEST; 9s ago
Docs: man:chronyc(1)
Process: 1619 ExecStart=/usr/bin/chronyc -h 127.0.0.1,::1 waitsync 0 0.1 0.0 1 (code=exited, status=0/SUCCESS)
Main PID: 1619 (code=exited, status=0/SUCCESS)
CPU: 134ms
Jul 18 13:00:34 Pi4 systemd[1]: Starting Wait for chrony to synchronize system clock...
Jul 18 13:00:35 Pi4 systemd[1]: Finished Wait for chrony to synchronize system clock.
Adding:
After=chrony-wait.service
&
> Requires=chrony-wait.service
Wants=chrony-wait.service
to rockstor-pre.service.
Fix tested on a Rockstor 15.6 based install from our downloadable installer. Using a bare metal Pi4: --- 10 sequential boot/reboots were accomplished without rockstor* service start issues. --- All upstream updates, as of this comment, applied. Rockstor package version 5.0.9-0, as per current installers.
nano /opt/rockstor/conf/rockstor-pre.service
[Unit]
section then reads:
[Unit]
Description=Tasks required prior to starting Rockstor
After=rockstor-build.service
After=postgresql.service
After=NetworkManager.service
After=chrony-wait.service
Requires=rockstor-build.service
Requires=postgresql.service
Requires=NetworkManager.service
Wants=chrony-wait.service
Apply the change via:
systemctl start rockstor-bootstrap.service
Web-UI then available and can be used to test reboot capability.
The last comment indicates our chosen fix, which makes some sense: however it may pertain more to some-other interdependency. However it makes sense that we ensure NTP updates the clock, especially on non persistent RTC systems such as the Pi4, as we then have better time reporting on our rockstor services:
Pi4:~ # systemctl status rockstor-pre.service
● rockstor-pre.service - Tasks required prior to starting Rockstor
Loaded: loaded (/usr/lib/systemd/system/rockstor-pre.service; enabled; preset: disabled)
Active: active (exited) since Thu 2024-07-18 16:49:15 WEST; 10min ago
Process: 1860 ExecStart=/usr/local/bin/poetry run initrock (code=exited, status=0/SUCCESS)
Main PID: 1860 (code=exited, status=0/SUCCESS)
Tasks: 5 (limit: 4915)
CPU: 1min 12.097s
CGroup: /system.slice/rockstor-pre.service
├─2846 gpg-agent --homedir /root/.gnupg --use-standard-socket --daemon
└─2850 scdaemon --multi-server
I.e. no more references back 1 to 2 months to installer build date:
Active: failed (Result: exit-code) since Mon 2024-04-22 13:00:14 WEST; 1 month 1 day ago
N.B. an edit of the above fix to Wanting rather than Requiring the chrony-wait.service accounted for the below note re one time-out seen on chrony-wait.service.
Further testing indicates that on occasions we see failures in the chrony-wait.service (timeout in the following case):
Pi4:~ # systemctl status chrony-wait.service
× chrony-wait.service - Wait for chrony to synchronize system clock
Loaded: loaded (/usr/lib/systemd/system/chrony-wait.service; disabled; preset: disabled)
Active: failed (Result: timeout) since Mon 2024-06-24 13:03:16 WEST; 3 weeks 3 days ago
Docs: man:chronyc(1)
Process: 855 ExecStart=/usr/bin/chronyc -h 127.0.0.1,::1 waitsync 0 0.1 0.0 1 (code=exited, status=1/FAILURE)
Main PID: 855 (code=exited, status=1/FAILURE)
CPU: 148ms
Jun 24 13:00:15 Pi4 systemd[1]: Starting Wait for chrony to synchronize system clock...
Jun 24 13:03:16 Pi4 systemd[1]: chrony-wait.service: start operation timed out. Terminating.
Jun 24 13:03:16 Pi4 systemd[1]: chrony-wait.service: Main process exited, code=exited, status=1/FAILURE
Jun 24 13:03:16 Pi4 systemd[1]: chrony-wait.service: Failed with result 'timeout'.
Jun 24 13:03:16 Pi4 systemd[1]: Failed to start Wait for chrony to synchronize system clock.
Which in turn fails our services. This may be more an artefact of this service's inadequate timeout, or an intermittent failure otherwise. Looking to alternative approaches.
https://www.freedesktop.org/software/systemd/man/latest/systemd.special.html#time-sync.target In the last comments noted fail, due to chrony-wait.service's timeout this target was still indicated as having been established, even though the chrony-wait.service failed.
Services where accurate time is essential should be ordered after this unit, but not pull it in.
Wants=time-sync.target
This time-sync.target did not seem to work for us! Reverting to now edited prior fix.
Closing as: Fixed by #2879
We have, post initially successful installer boot, and initial setup - a rockstor-pre.service fail on subsequent reboots.
Observed on 15.5, 15.6, and Tumbleweed based Pi4 installers.
A subsequent start of our top service will then succeed in starting all dependencies:
N.B. the above was observed after already removing rockstor-build.service entries associated with NetworkManager-wait-online; as this was though to be related previously.
Testing/debugging assistance is required for this issue, which pertains to current DIY installer builds, and our
pendingcurrent 5.0.9-0 (RC4) downloadable installers.Real-hardware testing is currently compromised for myself.