Open lategoodbye opened 5 years ago
Just another data-point: I built https://github.com/raspberrypi/linux/tree/rpi-5.2.y and I'm getting "bcm2835-power: Timeout waiting for grafx power OK" on my rpi2-b.
model:
RPI 3B
firmware version:
2019-07-15 17:34
kernel version:
5.2.2-1-ARCH #1 SMP Sun Jul 21 19:53:44 UTC 2019 aarch64 GNU/Linux
kernel logs:
[ 6.514813] bcm2835-power bcm2835-power: Timeout waiting for grafx power OK
[ 6.524622] bcm2835-power bcm2835-power: Timeout waiting for grafx power OK
The VC4 driver was loaded but no GPU hardware was detected.
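For anyone collecting data points across several boards, a minimal sketch for spotting the message in a saved kernel log could look like this. It assumes the log lines have the same shape as the two quoted above; the function name `find_grafx_timeouts` is made up for illustration.

```python
import re

# Matches kernel log lines of the form quoted in this report, e.g.
# [ 6.514813] bcm2835-power bcm2835-power: Timeout waiting for grafx power OK
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+bcm2835-power\b.*Timeout waiting for grafx power OK"
)


def find_grafx_timeouts(log_text):
    """Return the kernel timestamps (in seconds) of every grafx power timeout."""
    return [float(m.group("ts")) for m in TIMEOUT_RE.finditer(log_text)]
```

Feeding it the output of dmesg (saved to a file or captured as a string) returns an empty list on a boot where the GPU was detected normally.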
I'm getting the same issue with RPi 3B+, Arch Linux aarch64, Kernel 5.2.10-1-ARCH.
No GPU hardware detected and dmesg shows "bcm2835-power: Timeout waiting for grafx power OK".
However, I have several Pi 3B+ and it is NOT happening on all of them (using the same SD card with the same image). Some of them detect the VC4 GPU during boot just fine.
And with the other boards, it appears to be temperature related. When the board is at room temperature (having been unpowered for some time) the GPU is detected normally, and it stays that way over a couple of reboots. But after some minutes, when the temperature rises above about 50 °C, the GPU is no longer detected on reboot and the bcm2835-power log message appears.
Maybe that additional piece of information helps tracking down the issue.
Thanks for your report. I built the Mainline kernel 5.3-rc6 with multi_v7_defconfig (Raspbian rootfs) for my RPI 3B+. Then I caused enough load to reach ~ 54 °C (no cpufreq enabled) and triggered a reboot. "Unfortunately" I wasn't able to reproduce the timeout.
Thanks for looking into it. I have seven Pi3B+ boards and I am currently testing them all under the same conditions to see how many of them are affected (so far 2 out of 4 fail when warm, fully reproducible; the others never fail). Maybe some chips are more 'sensitive' to the power-up ramp than others. Could changing the current ramp (lower initial, lower step size, more time between steps) help? I'd try playing around with bcm2835-power.c but I have no experience integrating a custom kernel for the RPi and don't know if it is as simple as 'replace the ARCH kernel with the selfmade one'. Let me know if I can do any useful tests with the affected boards.
One update: I started building the (mainline) kernel using your defconfig (arm64/configs/defconfig). I interrupted when I realized that it is going to take some time... I'll do it at home over night ;-). But: I started compilation on one of the boards which were not affected. Then, during compile, the temperature rose to 65°C, and I rebooted -> no VC4 and the bcm2835-power timeout occurred. After cooling down back to 50°C the GPU was again recognized normally during several reboots.
So it is definitely a matter of temperature, but the cut between good and bad varies from device to device. Maybe you can stress your board to higher temperatures and see if the timeout appears as well.
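To record the temperature right before each test reboot, something like the following sketch may help. It assumes the SoC temperature is exposed at /sys/class/thermal/thermal_zone0/temp in millidegrees Celsius (the usual Linux thermal sysfs layout, but not confirmed in this thread for every kernel and DTB combination). The 50 °C default is only the rough figure mentioned above; as noted, the cut-off varies from board to board.

```python
from pathlib import Path

# Assumption: thermal zone 0 is the SoC sensor and reports millidegrees Celsius.
THERMAL_PATH = Path("/sys/class/thermal/thermal_zone0/temp")


def parse_millidegrees(raw):
    """Convert a sysfs reading such as '54000\\n' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_soc_temp_c(path=THERMAL_PATH):
    """Read the current SoC temperature in degrees Celsius."""
    return parse_millidegrees(path.read_text())


def is_warm(temp_c, threshold_c=50.0):
    """True if the board is above the (board-dependent) threshold."""
    return temp_c > threshold_c
```

Logging `read_soc_temp_c()` together with whether the timeout appeared on the next boot would give a per-board picture of where the good/bad boundary sits.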
FYI I'm seeing the timeouts on my RPI3b+ with 5.3.0-rc4. Can't really say whether it's temperature related as it always fails. I can run some debugging if needed.
After enabling the Mainline cpufreq driver I'm seeing the timeouts, too.
IIRC The main functional difference between the downstream cpufreq driver and upstream is that we're disabling turbo mode when changing the clocks.
What about no cpufreq and setting arm's clock @ 1.2GHz in config.txt?
I don't think there is an issue with the cpufreq driver. Since my default governor is ondemand, this causes much more CPU stress during boot.
I will try to test your suggestion.
My test results:
arm_freq=1200, no cpufreq => no timeout
force_turbo=1, no cpufreq => timeout
@popcornmix Any idea how to analyze this further? Without documentation I don't have a clue what's going on in the new bcm2835 pm driver.
I made a register dump of the PM addresses for the following cases:
1) Linux 5.3 without e1dc2b2e1bef7237fd8fc055fe1ec2a6ff001f91 (this should be similar to pre Linux 5.1)
2) Linux 5.3 with e1dc2b2e1bef7237fd8fc055fe1ec2a6ff001f91 (this should be similar to Linux 5.1 or newer), with no timeout occurring
Comparing both dumps showed only 1 difference:
1) PM_RSTS (Addr 0x3F100020) = 0x00001000
2) PM_RSTS (Addr 0x3F100020) = 0x00000000
Note: without e1dc2b2e1bef7237fd8fc055fe1ec2a6ff001f91 and with force_turbo enabled I'm not able to reproduce the timeout
@anholt Is this expected?
@lategoodbye the difference in PM_RSTS registers is just:
12 | | HADPOR | Had a power-on reset
so I guess first was captured after a power cycle, and second after a sudo reboot
Okay, thanks. So the difference is unrelated.
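The PM_RSTS comparison above can be checked with a couple of lines. This is only a sketch based on the one bit described in this thread (bit 12, HADPOR); the helper name is made up and no other bits of the register are decoded.

```python
# Bit 12 of PM_RSTS is HADPOR ("Had a power-on reset"), as described above.
HADPOR_BIT = 12


def had_power_on_reset(pm_rsts):
    """Return True if the HADPOR bit is set in a PM_RSTS register value."""
    return bool(pm_rsts & (1 << HADPOR_BIT))
```

With the two dumped values, 0x00001000 has the bit set (power cycle) and 0x00000000 does not (warm reboot), which matches the explanation given.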
I will wait for suggestions to narrow down this issue until the release of Linux 5.4-rc1. After that I will revert e1dc2b2e1bef7237fd8fc055fe1ec2a6ff001f91 according to the no regression policy.
For what it is worth I am seeing this error pop up multiple times with 5.3.0 on a 3b+ running arm64/ubuntu using a mainline kernel from here: https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.3/
with config.txt using this dtb: device_tree=bcm2837-rpi-3-b-plus.dtb
dmesg:
https://paste.ubuntu.com/p/sKT7KyJdSc/
I'm noticing that a warm reboot using "sudo reboot" fails (or is very very very delayed), but power cycling allows the device to come up just fine.
(My setup is currently headless, so I'm not seeing what comes up on the screen when this situation arises.)
It seems this might be connected? (Or I can open another issue if it seems unconnected.)
The error message is the same, and the fact that upstream code shows the same issue is a useful data point.
By the way, you should be able to replace device_tree=bcm2837-rpi-3-b-plus.dtb with the more general upstream_kernel=1.
Yesterday, I tested the revert against current Mainline Linux 5.4 + Raspbian Buster with a Raspberry Pi 3 B+. Unfortunately X hangs completely during boot, so I asked Florian to drop this patch :-(
Add these lines to the dts file, compile it, replace the dtb with the newly compiled one, then the gpu will start working.
&v3d {
power-domains = <&power RPI_POWER_DOMAIN_V3D>;
};
Add these lines to the dts file, compile it, replace the dtb with the newly compiled one, then the gpu will start working.
&v3d { power-domains = <&power RPI_POWER_DOMAIN_V3D>; };
This was the reason behind the revert. But the revert causes a hang during boot of Raspbian, so I decided to drop the revert.
It seems that without these reverts, the GPU will also work, so maybe these reverts cause the X hang?
diff --git a/arch/arm/boot/dts/bcm283x.dtsi b/arch/arm/boot/dts/bcm283x.dtsi
index 2d191fc..b238567 100644
--- a/arch/arm/boot/dts/bcm283x.dtsi
+++ b/arch/arm/boot/dts/bcm283x.dtsi
@@ -3,7 +3,6 @@
#include <dt-bindings/clock/bcm2835-aux.h>
#include <dt-bindings/gpio/gpio.h>
#include <dt-bindings/interrupt-controller/irq.h>
-#include <dt-bindings/soc/bcm2835-pm.h>
/* firmware-provided startup stubs live here, where the secondary CPUs are
* spinning.
@@ -121,7 +120,7 @@
#interrupt-cells = <2>;
};
- pm: watchdog@7e100000 {
+ watchdog@7e100000 {
compatible = "brcm,bcm2835-pm", "brcm,bcm2835-pm-wdt";
#power-domain-cells = <1>;
#reset-cells = <1>;
@@ -641,7 +640,6 @@
compatible = "brcm,bcm2835-v3d";
reg = <0x7ec00000 0x1000>;
interrupts = <1 10>;
- power-domains = <&pm BCM2835_POWER_DOMAIN_GRAFX_V3D>;
};
vc4: gpu {
It seems that without these reverts, the GPU will also work, so maybe these reverts cause the X hang?
Devicetree changes usually don't cause hangs; it's more likely a driver issue. With your change you combine the "best" of both power drivers. Unfortunately it's unsafe to handle the same register ranges with two Linux drivers. Currently I only see two options:
Add these lines to the dts file, compile it, replace the dtb with the newly compiled one, then the gpu will start working.
&v3d { power-domains = <&power RPI_POWER_DOMAIN_V3D>; };
This solves it for me. I just replaced the old /boot/dtbs/broadcom/bcm2837-rpi-b.dtb with yours and it worked. Thanks @redchenjs (cfg: raspberry pi 3b + manjaro)
An RPi3B+ of mine has not been used for a while. Yesterday I started a new project and set up the RPi.
I used a new SD card and prepared it with Arch Linux ARM AArch64. I ran into the problem reported here.
I turned the RPi off yesterday evening and turned it on this morning. Same problem. As the RPi has been turned off for hours I don't think mine has been too hot on its first power up this morning.
So, if you need another board to get some diagnostic information, I can try to provide.
The consensus above is that this is caused by an incompatibility in the upstream/mainline 3B+ DTB. Edit the source file as described by @sankayop above and rebuild it (or download the prebuilt version they link to) and try with that.
Thank you for your reply, will give it a try later...
Will the specific change that seems to be applied to all the fedora kernel versions make it upstream? I did not find it here https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/bcm2835-rpi.dtsi
Currently for upstream I only see two "options":
1) revert Eric's complete bcm2835-power series
2) merge the working parts from raspberrypi-power into bcm2835-power
I'm not happy with either of them. @sankayop's patch will enable both power drivers for the same power domain. I consider this a path to hell ...
@lategoodbye Do you have a preference between 1 and 2? Is there something we can do to help?
Number 1 isn't a real option, because we need this driver for Raspberry Pi 4. Number 2 should be doable for downstream, but for upstream it would more likely result in a merge of both drivers.
The best option would be to ask someone with deeper understanding of BCM2835 why the rampup causes these random timeouts (timing issue, missing requirements, wrong order of power domain handling) and fix the bcm2835 power driver.
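For readers unfamiliar with what the log message means, here is a rough model of a poll-with-timeout loop of the kind being discussed. It is not the driver's code: the mask value `POWOK_MASK` is a hypothetical placeholder (the real PM_POWOK bit position is not stated in this thread), and `read_reg` stands in for a register read.

```python
import time

# Hypothetical mask standing in for PM_POWOK; the real bit position lives in
# the kernel driver and is not taken from this thread.
POWOK_MASK = 0x1


def wait_for_power_ok(read_reg, timeout_ns, mask=POWOK_MASK):
    """Poll read_reg() until (value & mask) is set or timeout_ns has elapsed."""
    start = time.monotonic_ns()
    while not (read_reg() & mask):
        if time.monotonic_ns() - start >= timeout_ns:
            # This is the situation reported as "Timeout waiting for ... power OK".
            return False
    return True
```

If the hardware needs longer than the allowed window on some boards or at some temperatures, a loop of this shape gives up even though the flag might still have come up slightly later, which is one of the possibilities (timing issue) raised above.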
Any updates on this?
In the upstream kernel the suggested patch to revert has been applied. The hanging X issue was unrelated. https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20200402&id=e7b7daeb48e0bf5d8412d77f11069750ee7032bb
When booting up my Raspberry Pi 2 Model B with Arch Linux ARM, it seems one of two things happens:
config.txt. When I run startx to start LXDE, the DE launches, then almost immediately hangs.
startx, LXDE functions as expected.
In my testing, it does seem that the first occurrence is more likely when the Pi is cooled down, rather than right after rebooting. Most of these are things that have already been pointed out, but I wanted to provide a test case for anyone else having the issue.
Is the issue with X hanging being tracked anywhere?
Is the issue with X hanging being tracked anywhere?
Here is the accepted fix: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20200504&id=b1e7396a1d0e6af6806337fdaaa44098d6b3343c
Seems the Pi 3 A+ has the same problem :-/
Describe the bug
Starting with Linux 5.1 there is a new power driver for BCM2835. The idea behind this is to have better control over the V3D power domain. After the rollout I was informed that some RPI boards (currently a handful) have issues enabling the V3D power domain. The ramp-up runs into a timeout (20 us), because we never get a PM_POWOK. I don't have a clue what causes this issue (timing, hardware tolerance, ...). Currently I don't have an affected board.
To reproduce
Start the RPI with Mainline Kernel 5.1
Expected behaviour
bcm2835-power succeeds in enabling the V3D power domain
Actual behaviour
bcm2835-power fails to enable the V3D power domain because PM_POWOK stays off
System
vcgencmd version: 2019-02-12, 2019-03-27
uname -a: Mainline Kernel / DTB 5.1
Logs
More info: https://github.com/anholt/linux/issues/153