Kronsed0 opened 8 months ago
[ To save everyone else the bother of checking, the uptime and system time are consistent, all pointing to a start time of ~08:35:15 ]
On a Pi4 (64-bit bookworm), I've added temp_limit=30 and I observe throttling:
pi@pi4:~ $ vcgencmd get_throttled
throttled=0x60006
pi@pi4:~ $ vcgencmd measure_clock core arm
frequency(1)=219994624
frequency(48)=360123040
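For reference, the get_throttled value can be decoded bit by bit. Below is a minimal Python sketch, assuming the bit assignments from Raspberry Pi's vcgencmd documentation; the helper names are hypothetical. 0x60006 sets bits 1, 2, 17 and 18: the arm frequency is capped and the board is currently throttled, and both have occurred since boot. The two measure_clock lines are the core clock (about 220 MHz) and the arm clock (about 360 MHz).

```python
import re

# Bit meanings of `vcgencmd get_throttled` (from the vcgencmd documentation).
THROTTLED_BITS = {
    0: "under-voltage detected",
    1: "arm frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "arm frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(line: str) -> list[str]:
    """Turn 'throttled=0x60006' into the list of flags that are set."""
    value = int(line.split("=", 1)[1], 16)
    return [name for bit, name in sorted(THROTTLED_BITS.items()) if value & (1 << bit)]


def clock_mhz(line: str) -> float:
    """Turn a measure_clock line such as 'frequency(48)=360123040' into MHz."""
    match = re.fullmatch(r"frequency\(\d+\)=(\d+)", line.strip())
    if match is None:
        raise ValueError(f"unexpected measure_clock output: {line!r}")
    return int(match.group(1)) / 1e6


print(decode_throttled("throttled=0x60006"))
print(clock_mhz("frequency(1)=219994624"), clock_mhz("frequency(48)=360123040"))
```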
I'm disabling the NTP service to ensure the clock is not being "corrected".
pi@pi4:~ $ systemctl stop systemd-timesyncd
pi@pi4:~ $ systemctl status systemd-timesyncd
○ systemd-timesyncd.service - Network Time Synchronization
Loaded: loaded (/lib/systemd/system/systemd-timesyncd.service; enabled; preset: enabled)
Active: inactive (dead) since Thu 2024-03-07 15:27:02 GMT; 4min 3s ago
Duration: 10min 47.786s
Docs: man:systemd-timesyncd.service(8)
Process: 243 ExecStart=/lib/systemd/systemd-timesyncd (code=exited, status=0/SUCCESS)
Main PID: 243 (code=exited, status=0/SUCCESS)
Status: "Idle."
CPU: 593ms
I'm then timing sleep 60 externally. The time I get is within a second.
Do you see the clock drift when using sleep 60?
I also tried running date, waiting 60 s (using an external stopwatch), then running date again, and again I get the expected times.
I wouldn't expect a Pi4 and CM4 to behave differently. Do you have any other Pi devices you can test on?
FYI I just tested on a CM4 from my drawer, still running a 5.10 kernel, and it showed a slowdown of 9 seconds per minute (pressing Return at 1-minute intervals using a stopwatch):
pi@raspberrypi:~$ date
Fri 7 May 16:05:27 BST 2021
pi@raspberrypi:~$ date
Fri 7 May 16:06:18 BST 2021
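The arithmetic behind the 9 s/min figure, as a small Python sketch (the helper names are hypothetical, and it assumes exactly 60 s of stopwatch time between the two date calls): 16:05:27 to 16:06:18 is 51 s of system time, so the clock runs at 51/60 = 85% of real time, i.e. about 15% slow.

```python
from datetime import datetime

TIME_FMT = "%H:%M:%S"  # only the HH:MM:SS part of the `date` output is used


def system_seconds(first: str, second: str) -> float:
    """System-clock seconds elapsed between two HH:MM:SS readings on the same day."""
    delta = datetime.strptime(second, TIME_FMT) - datetime.strptime(first, TIME_FMT)
    return delta.total_seconds()


def clock_rate(first: str, second: str, wall_seconds: float = 60.0) -> float:
    """Fraction of real time that the system clock actually advanced."""
    return system_seconds(first, second) / wall_seconds


def loss_per_minute(first: str, second: str, wall_seconds: float = 60.0) -> float:
    """Seconds the system clock falls behind per minute of stopwatch time."""
    return (wall_seconds - system_seconds(first, second)) * 60.0 / wall_seconds


print(system_seconds("16:05:27", "16:06:18"), loss_per_minute("16:05:27", "16:06:18"))
```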
I've handed the board over to @popcornmix.
I also noticed that this issue only occurs when throttling down to 300 MHz (the lowest frequency?). I didn't observe any deviation at the intermediate stages of 600 MHz or higher. This could explain what @popcornmix observed: there, it was throttled 'only' to 360 MHz.
I can remove temp_limit=30 and replace it with arm_freq=300, and the CM4 shows the 9 s/min drift.
With arm_freq=360 there is no drift.
On my Pi4 there is no drift with arm_freq=300 (this is with systemd-timesyncd.service stopped).
I do have the network connected on the Pi4, though (my current image is NFS-booted), whereas the CM4 had no network.
The CM4 doesn't show the issue with the network connected, but does after sudo systemctl stop systemd-timesyncd.service (so timesyncd presumably corrects the clock drift).
Switching to the 64-bit kernel doesn't change the behaviour on the CM4. That kernel is somewhat dated (5.10.17-v8+).
A Pi4 with a fresh bookworm Lite 64-bit install is not showing the issue with arm_freq=300 and no network.
Running rpi-update on the (32-bit buster) CM4 has not stopped the drift.
before:
Linux raspberrypi 5.10.17-v8+ #1414 SMP PREEMPT Fri Apr 30 13:23:25 BST 2021 aarch64 GNU/Linux
Oct 2 2020 08:44:17
version bfe1c7ddc094b735134e96f2338c0ab2c9b10a5b (release)
timestamp 1601624657
update-time 0
capabilities 0x00000000
Apr 30 2021 13:45:52
Copyright (c) 2012 Broadcom
version d7f29d96450abfc77cd6cf011af1faf1e03e5e56 (clean) (release) (start)
after:
Linux raspberrypi 6.6.20-v8+ #1739 SMP PREEMPT Thu Mar 7 11:46:23 GMT 2024 aarch64 GNU/Linux
Oct 2 2020 08:44:17
version bfe1c7ddc094b735134e96f2338c0ab2c9b10a5b (release)
timestamp 1601624657
update-time 0
capabilities 0x00000000
Feb 29 2024 12:24:53
Copyright (c) 2012 Broadcom
version f4e2138c2adc8f3a92a3a65939e458f11d7298ba (clean) (release) (start)
I've now reflashed the eMMC with bullseye 64-bit, and I'm not seeing the drift:
Linux cm4 6.1.0-rpi7-rpi-v8 #1 SMP PREEMPT Debian 1:6.1.63-1+rpt1 (2023-11-24) aarch64 GNU/Linux
Oct 2 2020 08:44:17
version bfe1c7ddc094b735134e96f2338c0ab2c9b10a5b (release)
timestamp 1601624657
update-time 0
capabilities 0x00000000
Oct 17 2023 15:39:16
Copyright (c) 2012 Broadcom
version 30f0c5e4d076da3ab4f341d88e7d505760b93ad7 (clean) (release) (start)
After a subsequent rpi-update, it is still not drifting:
Linux cm4 6.6.20-v8+ #1739 SMP PREEMPT Thu Mar 7 11:46:23 GMT 2024 aarch64 GNU/Linux
Oct 2 2020 08:44:17
version bfe1c7ddc094b735134e96f2338c0ab2c9b10a5b (release)
timestamp 1601624657
update-time 0
capabilities 0x00000000
Feb 29 2024 12:24:53
Copyright (c) 2012 Broadcom
version f4e2138c2adc8f3a92a3a65939e458f11d7298ba (clean) (release) (start)
So it doesn't look bootloader-, firmware- or kernel-related.
I've seen the drift on the CM4 with 32-bit buster. I'll try 32-bit buster on the Pi4.
Pi4 with 32-bit buster seems to have accurate time with arm_freq=300 and no network.
Searching suggests that ntpd stores the previously measured drift in a driftfile, which is one mechanism by which state could persist in an installation that isn't network-connected (e.g. it could be hiding the 15% drift it has previously seen).
However, Raspberry Pi OS uses systemd-timesyncd rather than ntpd, and I haven't found a documented equivalent of the driftfile.
But even with systemd-timesyncd disabled and the board rebooted, I'm still not seeing drift on a Pi4 with arm_freq=300.
I reflashed the CM4 with this image.
I booted with no network and ran sudo systemctl disable systemd-timesyncd, then added arm_freq=300 to config.txt and rebooted. I get the drift.
I've done exactly the same sequence on Pi4 and I don't get the drift. This is surprising.
The kernel/firmware are the same on CM4/Pi4. The bootloader is newer on Pi4.
On the CM4, with 60 s of wall-clock time elapsing, we measure (system-clock seconds):
arm_freq=350 61s
arm_freq=300 52s
arm_freq=250 33s
arm_freq=200 14s
arm_freq=150 4s
(Note: I'm running ssh pi@pi "date"; sleep 60; ssh pi@pi "date", which has about 0.5 s of overhead in the ssh calls, increasing the numbers slightly.)
My current thinking is that the arm generic timer has a resynchroniser between the timer's clock (the Pi's 54 MHz oscillator) and the arm core clock.
If the ratio of the two is too small, updates get missed and so the arm "wall clock" slows down.
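To put numbers on this hypothesis, here is a Python sketch that sets the arm-to-timer clock ratio (assuming the 54 MHz timer clock stated above) against the CM4 measurements listed earlier. The names are illustrative. The measured values include the roughly 0.5 s ssh overhead, which is why 350 MHz shows slightly more than 100%.

```python
TIMER_CLOCK_MHZ = 54.0  # generic timer clock: the Pi's 54 MHz oscillator (assumed)

# CM4 measurements from this thread: arm_freq (MHz) -> system seconds per 60 s of wall time
CM4_MEASURED = {350: 61, 300: 52, 250: 33, 200: 14, 150: 4}


def clock_ratio(arm_mhz: float, timer_mhz: float = TIMER_CLOCK_MHZ) -> float:
    """Ratio of the arm core clock to the generic timer clock."""
    return arm_mhz / timer_mhz


for arm_mhz, seconds in CM4_MEASURED.items():
    print(f"arm_freq={arm_mhz}: ratio {clock_ratio(arm_mhz):.2f}, "
          f"timer runs at {seconds / 60:.0%} of nominal")
```

The drift grows steadily as the ratio falls from about 6.5 towards 2.8, which is what a missed-update synchroniser would be expected to produce.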
The behaviour of the Pi4 is confusing, though, as it should suffer the same problem. There are very few differences between a Pi4 and CM4(*).
Possibly somewhere the drift is being detected and compensated for (we know systemd-timesyncd can do this, but possibly there is another route).
(*) The CM4 supports PTP hardware timestamping, which the Pi4 doesn't, but it's hard to imagine that causing this issue, especially with no Ethernet cable connected.
I've updated the CM4 to the same bootloader as the Pi4:
2024/01/22 10:41:21
version 51ed67b03b3dde4e76b345370f312d07aabf45b8 (release)
timestamp 1705920081
update-time 1709918044
capabilities 0x0000007f
but there is no change (the CM4 still drifts).
I've got the CM4 booting from my normal NFS rootfs. This is bookworm 64-bit (with systemd-timesyncd disabled) with the latest rpi-update kernel. The drift is still present.
I've nobbled bcm_ptp_probe to always return NULL, which removes one difference between the CM4 and Pi4.
But the CM4 still drifts.
I've written some assembly as an armstub that reads the generic timer (CNTPCT_EL0) and outputs a character on the UART each time it crosses another 54M ticks. This should happen once a second. So, counting characters over an externally timed 60 s gives:
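A Python model of the counting technique, not the armstub itself (which is assembly): the loop polls the counter and emits one marker each time it passes another 54,000,000 ticks. The simulated counter, which advances at a chosen fraction of its nominal rate to mimic missed updates, and all names here are hypothetical.

```python
TICKS_PER_SECOND = 54_000_000  # nominal generic timer rate on the Pi 4 (54 MHz)


def count_markers(counter_samples) -> int:
    """Count how many times the sampled counter passes the next multiple of 54M,
    i.e. how many characters the armstub would print."""
    markers = 0
    next_threshold = TICKS_PER_SECOND
    for value in counter_samples:
        # `while` also handles several thresholds being passed between two polls
        while value >= next_threshold:
            markers += 1
            next_threshold += TICKS_PER_SECOND
    return markers


def simulated_counter(wall_seconds: float, rate: float, poll_hz: int = 1000):
    """Hypothetical counter polled poll_hz times per real second that only
    advances at `rate` times its nominal speed (rate < 1 models missed updates)."""
    for i in range(int(wall_seconds * poll_hz) + 1):
        yield int(i / poll_hz * TICKS_PER_SECOND * rate)


print(count_markers(simulated_counter(60, 1.0)), count_markers(simulated_counter(60, 0.5)))
```

A healthy timer yields 60 markers per externally timed minute; a timer that loses half its updates yields 30.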
Characters counted over 60 s:
arm_freq   CM4   Pi4
350        60    60
300        52    60
250        33    60
200        15    52
150        5     23
100        14    5
So they both suffer from this issue, but my particular Pi4 only starts failing at a lower frequency. I believe this is just silicon-related, rather than any actual difference between the Pi4 and CM4.
I still think this is the synchroniser between the generic timer clock and the arm clock. I suspect there is a documented requirement that the ratio exceed some value (at least 6, possibly 8), although I couldn't find it when searching.
I'll check, but I don't think we have control over the generic timer clock (I think it can only come from the oscillator). If there is no way of reducing that clock, then the only way to avoid this issue is to enforce a higher minimum arm clock.
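If the ratio hypothesis holds, the minimum safe arm clock is simply the ratio threshold multiplied by the timer clock. A Python sketch, assuming a 54 MHz timer clock and the unconfirmed threshold of 6 to 8 suggested above (the function name is hypothetical): 6 × 54 = 324 MHz and 8 × 54 = 432 MHz. With the Pi3's 1 MHz timer, the same threshold is trivially satisfied. Note that the CM4 above already fails at 300 MHz (ratio about 5.6) while this Pi4 still works at 250 MHz (about 4.6), so any single threshold is only a silicon-dependent approximation.

```python
import math

TIMER_CLOCK_MHZ = 54.0  # Pi 4 generic timer clock (assumed fixed at the oscillator rate)


def min_arm_mhz(min_ratio: float, timer_mhz: float = TIMER_CLOCK_MHZ) -> int:
    """Lowest whole-MHz arm clock that keeps arm_clock / timer_clock >= min_ratio."""
    return math.ceil(min_ratio * timer_mhz)


for ratio in (6, 8):
    print(f"ratio >= {ratio}: arm clock >= {min_arm_mhz(ratio)} MHz")
```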
The armstub test on a Pi5 doesn't observe any clock drift when arm_freq=125MHz.
It looks like on Pi3 the generic timer is clocked at 1MHz, so the issue doesn't occur.
There may be an intermediate clock domain running at neither 54MHz nor arm_freq - does the ARM AXI and/or L2 frequency change with arm_freq?
Are there any updates here, or are there still areas where I can provide support?
I think we have all the input we need. The current feeling is that it's a fact of life caused by a possibly-undocumented minimum ratio between clocks, the likely fix being to raise the minimum core clock, but we'd rather have a definitive answer from Broadcom to that effect.
Okay, thank you very much for the quick response. If I understand correctly, we are currently waiting for feedback from Broadcom, right?
Is there a workaround to set a minimum CPU frequency, for example similar to arm_freq=300? So far, I haven't found anything concrete about such a setting.
If I understand correctly, we are currently waiting for feedback from Broadcom, right?
Probably - @popcornmix has been looking after this one.
Is there a workaround here to set the minimum CPU frequency
Try core_freq_min=350.
I inserted the parameters core_freq_min=350 and temp_limit=30 into /boot/firmware/config.txt under the [all] section, but I didn't get the desired result:
pi@raspberrypi:~$ vcgencmd measure_temp
temp=39.9'C
pi@raspberrypi:~$ vcgencmd measure_clock arm
frequency(48)=300111328
Can you confirm that your value "stuck"?
$ vcgencmd get_config core_freq_min
Yes, I can confirm that the value has stuck:
pi@raspberrypi:~$ vcgencmd get_config core_freq_min
core_freq_min=350
pi@raspberrypi:~$ vcgencmd measure_temp
temp=35.0'C
pi@raspberrypi:~$ vcgencmd measure_clock arm
frequency(48)=300111328
This needs a firmware update. The throttling code can go below arm_freq_min.
If a user set, say, arm_freq_min=1800, you wouldn't want that to prevent throttling altogether.
We need to decide on a safe value for this (e.g. 350 or 400 MHz).
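A minimal sketch of the policy being discussed, not the firmware's actual code: throttling may still take the clock below arm_freq_min, but never below a fixed safe floor. The function name and the 400 MHz default are hypothetical; the thread only proposes 350 or 400 MHz.

```python
SAFE_THROTTLE_FLOOR_MHZ = 400  # hypothetical value; 350 or 400 MHz are the candidates


def throttled_arm_freq(requested_mhz: int, floor_mhz: int = SAFE_THROTTLE_FLOOR_MHZ) -> int:
    """Clamp a throttling target to a fixed safe floor.

    The floor is deliberately independent of arm_freq_min: clamping to a
    user's arm_freq_min=1800 would disable thermal throttling entirely.
    """
    return max(requested_mhz, floor_mhz)


print(throttled_arm_freq(300), throttled_arm_freq(1200), throttled_arm_freq(180, 350))
```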
Any news on this thread? I noticed that when the arm frequency drops to 180 MHz due to heavy throttling, the system time advances only one second for every 9 real seconds, which makes the system unusable.
Describe the bug
When the clock is reduced due to high CPU temperature, the CM4's system time runs significantly slower than a reference clock. The difference is approximately 9 seconds per minute.
This issue occurs when the CM4 is exposed to high temperatures, causing the CPU to throttle down to approximately 300 MHz. The problem can also be reproduced without high temperatures by lowering temp_limit in config.txt to, for example, 30°C, with no cooling fitted.
Steps to reproduce the behaviour
A Raspberry Pi CM4 without Wi-Fi/BT, with 4 GB RAM and 16 GB eMMC, is used on the official I/O board. The current Raspberry Pi OS Lite (64-bit) is flashed onto the CM4 and put into operation.
The entry temp_limit=30 is added to config.txt, the current time is set, and the system is restarted. After a while, the system time deviates significantly from the actual time; this does not happen without the temp_limit setting.
Device(s)
Raspberry Pi CM4
System
OS and Version: Raspberry Pi reference 2023-12-11 Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 2acf7afcba7d11500313a7b93bb55a2aae20b2d6, stage2
Firmware: Oct 17 2023 15:39:16 Copyright (c) 2012 Broadcom version 30f0c5e4d076da3ab4f341d88e7d505760b93ad7 (clean) (release) (start)
Kernel: Linux raspberrypi 6.1.0-rpi7-rpi-v8 #1 SMP PREEMPT Debian 1:6.1.63-1+rpt1 (2023-11-24) aarch64 GNU/Linux
Logs
The following log output was accessed on the CM4 via the serial interface from a second host system. The following code was executed on the host system: