raspberrypi / firmware

This repository contains pre-compiled binaries of the current Raspberry Pi kernel and modules, userspace libraries, and bootloader/GPU firmware.

Rpi PoE hat goes brrrrr on Rpi 4b #1531

Closed majkrzak closed 3 years ago

majkrzak commented 3 years ago

Reposting https://github.com/home-assistant/operating-system/issues/1208, as it might be a base-system issue.

The fan of a Raspberry Pi PoE HAT connected to a Raspberry Pi 4 B runs constantly at full speed, no matter what options are set in config.txt. On my last attempt the line was something like:

dtoverlay=rpi-poe,poe_fan_temp0=65000,poe_fan_temp0_hyst=1000,poe_fan_temp1=70000,poe_fan_temp1_hyst=2500,poe_fan_temp2=80000,poe_fan_temp2_hyst=5000,poe_fan_temp3=90000,poe_fan_temp3_hyst=5000

dmesg shows the following errors:

[...] rpi-poe-fan rpi-poe-fan@0: Failed to get default PWM value: -5
[...] rpi-poe-fan: probe of rpi-poe-fan@0 failed with error -5
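The -5 in both messages is a negative Linux errno, and errno 5 is EIO (I/O error), so the driver could not read the default PWM value. A minimal bash sketch that pulls the code out of dmesg text and names it; the function name poe_fan_probe_errors and its small errno table are illustrative assumptions, not part of any Raspberry Pi tool.

```shell
# Hypothetical helper (bash): read dmesg text on stdin, find rpi-poe-fan probe
# failures, and print each error code with its errno name.
# Only a handful of Linux errno values are mapped; the rest print "unknown".
poe_fan_probe_errors() {
  local line code name
  grep -o 'rpi-poe-fan: probe of [^ ]* failed with error -[0-9]*' |
    while read -r line; do
      code=${line##*-}      # keep only the digits after the last '-'
      case "$code" in
        5)  name=EIO ;;     # I/O error
        6)  name=ENXIO ;;   # no such device or address
        19) name=ENODEV ;;  # no such device
        22) name=EINVAL ;;  # invalid argument
        *)  name=unknown ;;
      esac
      printf '%s %s\n' "-$code" "$name"
    done
}
```

On an affected Pi, `dmesg | poe_fan_probe_errors` should print one "-5 EIO" line per failed probe.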
majkrzak commented 3 years ago

It is failing on: https://github.com/raspberrypi/linux/blob/rpi-5.4.y/drivers/hwmon/rpi-poe-fan.c#L329

majkrzak commented 3 years ago

It can be reproduced with the latest Raspbian image: https://downloads.raspberrypi.org/raspios_full_armhf/images/raspios_full_armhf-2021-01-12/2021-01-11-raspios-buster-armhf-full.zip

majkrzak commented 3 years ago

Updating /boot/ with the files from the /dev/ branch does not help.

shantanugoel commented 3 years ago

I ran into the same issue. I use Arch Linux, which ships the raspberrypi-bootloader and raspberrypi-bootloader-x packages. Downgrading them back to the 20210111 version fixed this for me.

ndanyluk commented 3 years ago

I'm seeing this too, on three Pi 3B+ and one Pi 4B, all at the same time, after upgrading the kernel to 5.10.y with apt update && apt upgrade -y. However, in my case I get no fan control at all and the Pis are thermally throttling.

config.txt is unchanged from stock on a stock Raspbian Lite install, and dmesg shows the same errors.

shantanugoel commented 3 years ago

My hunch is that this issue is related to commit c78f3ef45229ab722ec6b858f39b078535d88bee which has some changes for poe_hat/i2c

pelwell commented 3 years ago

I've independently arrived at the same conclusion, and reverting that commit makes the PoE HAT fan work again, but with other consequences. The author of that commit is aware.

pelwell commented 3 years ago

A fix is in the works, but in the meantime you can revert the firmware to before the problem was introduced:

$ sudo SKIP_KERNEL=1 rpi-update 32f92809
majkrzak commented 3 years ago

I'm wondering how I could hit this issue with the Home Assistant Rpi4 5.10 image released on 1 January, when the problematic commit is from the 15th? 🤔

EdBoraas commented 3 years ago

@majkrzak, it's possible that your original issue was actually something different. One thing that bit me in the past is that the dtoverlay= lines in config.txt have a line-length limit, and anything past that limit is silently ignored. I had a very similar problem: poe_fan_temp3 wasn't being set because it fell past the limit, so I was almost continually above the default temp3 threshold, and the fan ran far more than it should have.

You can verify that your temperature and hysteresis values are set correctly by running this command (od is needed here to decode the raw values exposed through the /proc interface):

$ od -An --endian=big -td4 /proc/device-tree/thermal-zones/cpu-thermal/trips/trip?/temperature /proc/device-tree/thermal-zones/cpu-thermal/trips/trip?/hysteresis
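That od invocation can be wrapped so each trip point is printed next to its own hysteresis instead of as one flat list. A minimal bash sketch, assuming GNU coreutils od (the --endian option is a GNU extension) and that every tripN directory holds both a temperature and a hysteresis property; the function name show_poe_trips and its optional directory argument are hypothetical, the argument being there so the function can be tried against a copy of the tree.

```shell
# Hypothetical wrapper (bash) around the od command above.
# Prints "tripN <temperature> <hysteresis>", both in millicelsius.
show_poe_trips() {
  local dir=${1:-/proc/device-tree/thermal-zones/cpu-thermal/trips}
  local trip temp hyst
  for trip in "$dir"/trip?; do
    [ -e "$trip/temperature" ] || continue   # skip if the glob matched nothing
    # Device-tree cells are 32-bit big-endian; -td4 prints them as signed ints,
    # -An drops the address column, and tr removes od's padding spaces.
    temp=$(od -An --endian=big -td4 "$trip/temperature" | tr -d ' ')
    hyst=$(od -An --endian=big -td4 "$trip/hysteresis" | tr -d ' ')
    printf '%s %s %s\n' "${trip##*/}" "$temp" "$hyst"
  done
}
```

Run with no argument on the Pi itself to read the live device tree.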

The solution was to break out the dtparam= values onto separate lines, like so:

dtoverlay=rpi-poe   
dtparam=poe_fan_temp0=65000   
dtparam=poe_fan_temp0_hyst=1000   
dtparam=poe_fan_temp1=70000
dtparam=poe_fan_temp1_hyst=2500
dtparam=poe_fan_temp2=80000
dtparam=poe_fan_temp2_hyst=5000
dtparam=poe_fan_temp3=90000
dtparam=poe_fan_temp3_hyst=5000
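The same rewrite can be done mechanically. A bash sketch that takes one combined dtoverlay= line and emits the dtoverlay= line plus one dtparam= line per parameter; the function name split_dtoverlay is hypothetical, and it does not check the length limit itself because the thread does not state the exact limit.

```shell
# Hypothetical helper (bash): "dtoverlay=name,p1=v1,p2=v2" becomes
#   dtoverlay=name
#   dtparam=p1=v1
#   dtparam=p2=v2
split_dtoverlay() {
  local body=${1#dtoverlay=}       # drop the "dtoverlay=" prefix
  local IFS=,                      # split only on commas
  local -a parts
  local p
  read -r -a parts <<< "$body"
  printf 'dtoverlay=%s\n' "${parts[0]}"
  for p in "${parts[@]:1}"; do     # every remaining field is a parameter
    printf 'dtparam=%s\n' "$p"
  done
}
```

Redirect the output into a scratch file and review it before pasting the lines into config.txt.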

Hopefully that helps. I suspect the two issues may have gotten conflated here; in my experience, the recently introduced firmware issue seems to force the fan off, not on.

jordipalet commented 3 years ago

I tried the rpi-update downgrade command, but it turned into a 35-minute download (for about 114M), and I don't think it completed correctly, because I get:

gzip: stdin: unexpected end of file
tar: unexpected end of file
tar: Error is not recoverable: exiting now

Any suggestions?

pelwell commented 3 years ago

That sounds like one of the following:

pelwell commented 3 years ago

For those of you reporting no fan activity, this has been diagnosed and a fix pushed to the internal firmware repo. Expect a release in a few days.

jordipalet commented 3 years ago

Thanks! After a couple of hours and a few retries, it worked.

My Internet connection is 300 Gbps symmetrical, and I'm not using an SD card but an external USB 3.1 SSD with free space.

So I guess it was something on the other side (GitHub's servers) or just bad luck.

Normally a firmware update takes a couple of seconds, so I don't understand why this "downgrade" took more than half an hour.

jordipalet commented 3 years ago

I will try next Saturday; it should be released by then. I avoid risky changes during the week.

popcornmix commented 3 years ago

rpi-update contains the potential fix. Please test and report.

majkrzak commented 3 years ago

As I suspected that the problem with my device had a different cause (maybe hardware), I decided to get a replacement, so I'm no longer able to check the fix.

FrankMaute commented 3 years ago

That worked, thanks.

Mark-Spitz commented 3 years ago

My two RPis are working fine again (5.10.14): from temp=70 back to 45. Pffffff, thanks a lot!

cubecell commented 3 years ago

The fix works for my RPi 3B+, but the PoE fan on my new RPi 4B still runs at full speed all the time. On the RPi 3 I found a cooling_device0 in /sys/class/thermal; on the RPi 4 there is only thermal_zone0.

EdBoraas commented 3 years ago

@popcornmix Working well with the fix via rpi-update. Thanks.

jordipalet commented 3 years ago

I'm confused: some people say it is working, others say it isn't. It would help to be more specific, for example: a) Is it working on the Rpi 4B? b) Is the fan speed still controllable as before, or does the fan keep running all the time?

pelwell commented 3 years ago

I specifically tested it on Pi 4 - can you post your non-standard config.txt settings?

cubecell commented 3 years ago

I only added:

dtoverlay=rpi-poe  
dtparam=poe_fan_temp0=45000
dtparam=poe_fan_temp1=50000  
dtparam=poe_fan_temp2=55000  
dtparam=poe_fan_temp3=60000

dmesg shows the same error:

[...] rpi-poe-fan rpi-poe-fan@0: Failed to get default PWM value: -5
[...] rpi-poe-fan: probe of rpi-poe-fan@0 failed with error -5
pelwell commented 3 years ago

That suggests you are still running the old firmware. What does vcgencmd version report?

ricktendo commented 3 years ago

Works now

cubecell commented 3 years ago

vcgencmd version reports:

Feb  8 2021 14:32:22 
Copyright (c) 2012 Broadcom
version 73b3cad64181954e67f6e9fe6d275378d3eefa10 (clean) (release) (start)

and rpi-update says

*** Your firmware is already up to date
pelwell commented 3 years ago

That configuration works for me - I don't know what else you might have changed.

Run this and post the generated URL:

$ sudo apt install pastebinit
$ dmesg | pastebinit
cubecell commented 3 years ago

Here is my dmesg report

http://paste.debian.net/1184744/

I only edited config.txt. Yesterday I also tested with a new SD card (2021-01-11-raspios-buster-armhf-lite): only apt update && apt upgrade, rpi-update, and the above lines in config.txt. Same result: the fan runs at full speed all the time, even after sudo shutdown -h now.

pelwell commented 3 years ago

I get the same kernel behaviour as you with no fan fitted, which makes me wonder if the fan controller is being detected by the firmware. What do the following commands return?

$ sudo dtparam i2c_vc=on
$ sudo i2cdetect -y 0
cubecell commented 3 years ago

First try:

$ sudo dtparam i2c_vc=on
$ sudo i2cdetect -y 0

Output:

Error: Could not open file /dev/i2c-0 or /dev/i2c/0: No such file or directory

After enabling the I2C bus with raspi-config, the following output is displayed:

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
70: -- -- -- -- -- -- -- --                        
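In i2cdetect output, -- means the address was probed and nothing answered, UU means the address is already claimed by a kernel driver, and a hex number means a device replied there; the grid above is all --, so nothing responded on that bus. A bash/awk sketch that lists only the responders; the function name i2c_responders is hypothetical, and it assumes the standard layout shown above (a 4-character row label, then 3 characters per cell).

```shell
# Hypothetical helper (bash + awk): read "i2cdetect -y N" output on stdin and
# print "<address> <cell>" for every cell that is a hex number or UU.
i2c_responders() {
  awk '
    /^[0-7]0:/ {
      row = substr($0, 1, 1)                # high hex digit of the address
      for (col = 0; col < 16; col++) {
        cell = substr($0, 5 + col * 3, 2)   # cells start at column 5
        if (cell ~ /^[0-9a-f][0-9a-f]$/ || cell == "UU")
          printf "0x%s%x %s\n", row, col, cell
      }
    }'
}
```

Empty output from `sudo i2cdetect -y 0 | i2c_responders` means no device answered on bus 0.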
FrankMaute commented 3 years ago

Update: after the firmware update, these params are no longer working/respected:

dtparam=poe_fan_temp0=70000,poe_fan_temp0_hyst=5000
dtparam=poe_fan_temp1=80000,poe_fan_temp1_hyst=5000

Yes, a bit high, but I wanted to test whether the HAT fan shuts off or not.

pelwell commented 3 years ago

You need to provide four values now - 0 to 3 - otherwise there's a danger (as you've found) that they will all be ignored because they don't monotonically increase.
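The rule above can be checked before rebooting. A bash sketch, assuming the values must be strictly increasing (rejecting equal neighbours is my assumption); check_poe_trips is a hypothetical name. FrankMaute's setting above, combined with the temp2/temp3 defaults of 50000 and 55000 listed by dtoverlay -h rpi-poe, gives 70000 80000 50000 55000, which fails the check.

```shell
# Hypothetical checker (bash): takes poe_fan_temp0..3 in millicelsius and
# returns non-zero unless exactly four values are given and each one is
# larger than the previous.
check_poe_trips() {
  if [ "$#" -ne 4 ]; then
    echo "need exactly 4 temperatures, got $#" >&2
    return 1
  fi
  local prev=-1 t
  for t in "$@"; do
    if [ "$t" -le "$prev" ]; then
      echo "not increasing: $t after $prev" >&2
      return 1
    fi
    prev=$t
  done
}
```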

pelwell commented 3 years ago
pi@raspberrypi:~$ dtoverlay -h rpi-poe
Name:   rpi-poe

Info:   Raspberry Pi PoE HAT fan

Usage:  dtoverlay=rpi-poe,<param>[=<val>]

Params: poe_fan_temp0           Temperature (in millicelcius) at which the fan
                                turns on (default 40000)
        poe_fan_temp0_hyst      Temperature delta (in millicelcius) at which
                                the fan turns off (default 2000)
        poe_fan_temp1           Temperature (in millicelcius) at which the fan
                                speeds up (default 45000)
        poe_fan_temp1_hyst      Temperature delta (in millicelcius) at which
                                the fan slows down (default 2000)
        poe_fan_temp2           Temperature (in millicelcius) at which the fan
                                speeds up (default 50000)
        poe_fan_temp2_hyst      Temperature delta (in millicelcius) at which
                                the fan slows down (default 2000)
        poe_fan_temp3           Temperature (in millicelcius) at which the fan
                                speeds up (default 55000)
        poe_fan_temp3_hyst      Temperature delta (in millicelcius) at which
                                the fan slows down (default 5000)
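Per the help text, each _hyst value is a delta: a step engages at poe_fan_tempN and releases once the temperature falls below poe_fan_tempN minus the hysteresis. A bash sketch that prints those switch points, defaulting to the values listed above; poe_fan_steps is a hypothetical name, and the "temp minus hyst" reading is my interpretation of the help text.

```shell
# Hypothetical helper (bash): arg 1 = four temps, arg 2 = four hysteresis
# deltas, all in millicelsius, each as a space-separated string.
# With no arguments it uses the defaults from "dtoverlay -h rpi-poe".
poe_fan_steps() {
  local -a temps=(${1:-40000 45000 50000 55000})   # unquoted: split on spaces
  local -a hysts=(${2:-2000 2000 2000 5000})
  local i
  for i in 0 1 2 3; do
    printf 'step %d: engages at %d, releases at %d\n' \
      "$i" "${temps[i]}" "$(( temps[i] - hysts[i] ))"
  done
}
```

For example, `poe_fan_steps "65000 70000 80000 90000" "1000 2500 5000 5000"` shows the switch points for the settings in the original report.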
FrankMaute commented 3 years ago

@pelwell That worked like a charm. Thanks!

tdolder commented 3 years ago

My Octoprint raspi 3B+ fan spins at full speed.

Raspberry Pi firmware updater by Hexxeh, enhanced by AndrewS and Dom
*** Your firmware is already up to date

dmesg output:

[ 6.906543] rpi-poe-fan rpi-poe-fan@0: Failed to get default PWM value: -5
[ 6.906599] rpi-poe-fan: probe of rpi-poe-fan@0 failed with error -5

config.txt:

dtoverlay=rpi-poe
dtparam=poe_fan_temp0=45000,poe_fan_temp0_hyst=1000
dtparam=poe_fan_temp1=55000,poe_fan_temp1_hyst=5000
dtparam=poe_fan_temp2=60000,poe_fan_temp2_hyst=5000
dtparam=poe_fan_temp3=65000,poe_fan_temp3_hyst=5000

Test hysteresis with:

$ od -An --endian=big -td4 /proc/device-tree/thermal-zones/cpu-thermal/trips/trip?/temperature /proc/device-tree/thermal-zones/cpu-thermal/trips/trip?/hysteresis

Test output:

45000 55000 60000 65000
1000 5000 5000 5000

pelwell commented 3 years ago

Is this new behaviour? If so, what were you running before the update?

jordipalet commented 3 years ago

I just ran rpi-update on an Rpi 4B 8 GB. It updated the firmware and rebooted, and running rpi-update again now shows:

*** Your firmware is already up to date

I also ran apt update, apt upgrade, etc., as I do once a week to keep the whole system up to date, then rebooted.

$ od -An --endian=big -td4 /proc/device-tree/thermal-zones/cpu-thermal/trips/trip?/temperature /proc/device-tree/thermal-zones/cpu-thermal/trips/trip?/hysteresis
50000 55000 60000 65000
5000 5000 5000 5000

This seems to be reading my config.txt correctly:

dtoverlay=rpi-poe
dtparam=poe_fan_temp0=50000,poe_fan_temp0_hyst=5000
dtparam=poe_fan_temp1=55000,poe_fan_temp1_hyst=5000
dtparam=poe_fan_temp2=60000,poe_fan_temp2_hyst=5000
dtparam=poe_fan_temp3=65000,poe_fan_temp3_hyst=5000

However, vcgencmd version shows:

Jan 27 2021 22:19:57
Copyright (c) 2012 Broadcom
version 99d9a48302e4553cff3688692bb7e9ac760a03fa (clean) (release) (start)

Isn't this an older version?
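Comparing build dates settles this: the Jan 27 2021 build is older than the Feb 8 2021 build cubecell reported earlier in the thread. A bash/awk sketch that turns the first line of vcgencmd version output into a sortable YYYY-MM-DD string; fw_build_date is a hypothetical name, and it assumes the "Mon DD YYYY HH:MM:SS" first line shown in this thread.

```shell
# Hypothetical helper (bash + awk): read "vcgencmd version" output on stdin
# and print the build date of its first line as YYYY-MM-DD.
fw_build_date() {
  awk 'NR == 1 {
    m = index("JanFebMarAprMayJunJulAugSepOctNovDec", $1)
    # Reject anything that is not one of the 3-letter month names
    if (length($1) != 3 || m == 0 || (m - 1) % 3 != 0) exit 1
    printf "%04d-%02d-%02d\n", $3, (m + 2) / 3, $2
  }'
}
```

Because the result is zero-padded, two dates from `vcgencmd version | fw_build_date` can be compared as plain strings.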

And dmesg shows:

[ 30.206439] rpi-poe-fan rpi-poe-fan@0: Failed to get default PWM value: -5
[ 30.206482] rpi-poe-fan: probe of rpi-poe-fan@0 failed with error -5

The temperature is about 79C ...

Any hints, or should I downgrade again?

pelwell commented 3 years ago

apt upgrades and rpi-update are unaware of each other. rpi-update says you are up-to-date because it remembers what it last installed, but the subsequent apt upgrade has overwritten the firmware with an older version.

To get the latest firmware release (not something you should be in the habit of doing unless advised to), delete /boot/.firmware_revision and run rpi-update again.

tdolder commented 3 years ago

No, not new; I just ran rpi-update. Before the update, dmesg showed the same PWM and probe errors. I will try deleting /boot/.firmware_revision and running rpi-update again, but first I have to wait for the 3D print to finish.

jordipalet commented 3 years ago

I know. I didn't do both at the same time; I did them across different reboots, to try different things. After deleting .firmware_revision, it seems to be working correctly (even after a new apt upgrade). Anyway, all fine now, at least for the time being. Thanks!

tdolder commented 3 years ago

rpi-update

Removing .firmware_revision and performing the firmware update again did not solve the issue.

[ 7.660334] rpi-poe-fan rpi-poe-fan@0: Failed to get default PWM value: -5
[ 7.660387] rpi-poe-fan: probe of rpi-poe-fan@0 failed with error -5

pelwell commented 3 years ago

When did it last work? You still haven't identified the point at which it went wrong.

jordipalet commented 3 years ago

Yes, deleting .firmware_revision and running rpi-update again worked! The only difference from my previous attempts was deleting .firmware_revision.

pelwell commented 3 years ago

That's great - I was specifically addressing @tdolder.

tdolder commented 3 years ago

It never worked. It is a new Raspi 3B+ with the official Raspi PoE HAT, running OctoPrint. I first tried the raspi-config option for fan control, which didn't work, then tried the dtoverlay in config.txt. I am now going to try a clean install.

pelwell commented 3 years ago

If you have a spare card, install a fresh Raspberry Pi OS, run sudo rpi-update and add just dtoverlay=rpi-poe to config.txt. If that doesn't work then you have a hardware problem.

tdolder commented 3 years ago

Test done with a spare card running Pi OS Lite: after sudo rpi-update and a reboot, there are no "rpi-poe-fan" errors in dmesg. After adding dtoverlay=rpi-poe to config.txt, the errors appear:

[ 6.879401] rpi-poe-fan rpi-poe-fan@0: Failed to get default PWM value: -5
[ 6.879446] rpi-poe-fan: probe of rpi-poe-fan@0 failed with error -5

Hardware issue with PoE HAT or Raspi?

cubecell commented 3 years ago

In my case it was a hardware failure of the PoE HAT. New HAT --> no more dmesg errors! @pelwell, many thanks for everything!

pelwell commented 3 years ago

I wouldn't necessarily get to hear about failures in the field, but in my opinion it is more likely to be the HAT just because there is more to go wrong. The header pins are wired directly to the pins/balls on the SoC, so unless there's a broken/shorted track or the Pi has been subjected to high voltages (5V is fatal to most pins), there isn't much that can fail.