openwrt / mt76

mac80211 driver for MediaTek MT76x0e, MT76x2e, MT7603, MT7615, MT7628 and MT7688
Wrong temp reading on MT7915_phy0 #729

Open Sandokan71 opened 1 year ago

Sandokan71 commented 1 year ago

I ran some tests on the temperature readings. In standby (no devices connected over WiFi) at a room temperature of 20C, the internal sensors report:

phy0 2.4GHz -> 68C
phy1 5GHz -> 43C
SoC -> 45C

The temperatures measured with a thermal scanner (my bet is that it reads 3-4C low) are:

2.4GHz -> 37.1C
5GHz -> 40.5C
SoC -> 45.5C

It seems to me that the reading from the sensor on the 2.4GHz chip is not correct.
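
To put numbers on the discrepancy, here is a minimal sketch that subtracts the thermal-scanner values from the sensor values quoted in this comment. The dictionary and function names are made up for illustration; only the figures come from the measurements above.

```python
# Sensor vs. thermal-scanner readings quoted in this comment, in degrees Celsius.
SENSOR = {"phy0 (2.4GHz)": 68.0, "phy1 (5GHz)": 43.0, "SoC": 45.0}
SCANNER = {"phy0 (2.4GHz)": 37.1, "phy1 (5GHz)": 40.5, "SoC": 45.5}


def offsets(sensor, scanner):
    """Return the sensor-minus-scanner difference for each measurement point."""
    return {name: round(sensor[name] - scanner[name], 1) for name in sensor}


if __name__ == "__main__":
    for name, delta in offsets(SENSOR, SCANNER).items():
        print(f"{name}: {delta:+.1f} C")
```

The 5GHz chip and the SoC agree with the scanner to within a few degrees, while phy0 is roughly 31C higher. If the scanner really does read 3-4C low, that still leaves about 27C unexplained on phy0.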

dangowrt commented 1 year ago

As this issue was reported first in BananaPi forum to occur on BPi-R3, let me add some details: This is MT7986A with MT7975PN and MT7975N front-ends. The wrong temperature readings correspond to the MT7975N chip in charge of 2.4 GHz.

Sandokan71 commented 1 year ago

Adding some information about a test I made. The attached graph shows the first 2 hours from a cold start (measurements were made with a single heatsink covering both chips). On the 2.4GHz front-end MT7975N (phy0) there is an offset of about 27C above what is expected, assuming that the 2.4GHz and 5GHz chips behave similarly.

[Attached graph: 2023-01-17]

frank-w commented 1 year ago

can confirm the difference

root@bpi-r3:~# cat /sys/class/ieee80211/phy*/hwmon*/temp1_input
66000
45000

measured the chips with infrared thermometer

2g4: 47°C 5G: 44°C
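
For anyone who wants to log these values over time, here is a minimal sketch that reads the same sysfs files and converts them to degrees Celsius. It assumes the standard hwmon convention that `temp1_input` is reported in millidegrees Celsius; the function names are hypothetical and not part of mt76.

```python
import glob


def millideg_to_celsius(raw):
    """Convert the text content of a hwmon temp*_input file to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_phy_temps(pattern="/sys/class/ieee80211/phy*/hwmon*/temp1_input"):
    """Return {path: degrees Celsius} for every file matching the glob pattern."""
    temps = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            temps[path] = millideg_to_celsius(f.read())
    return temps


if __name__ == "__main__":
    for path, celsius in read_phy_temps().items():
        print(f"{path}: {celsius:.1f} C")
```

On a machine without matching sysfs entries the function simply returns an empty dictionary.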

Sandokan71 commented 1 year ago

Yes, and looking at the graph, it is impossible that a few seconds after the device starts the 2.4GHz chip is at 55C while the 5GHz chip is at 27C.

ryderlee1110 commented 1 year ago

Here is the output of my MT7986 reference board. Looks normal.

root@OpenWrt:/# cat /sys/class/ieee80211/phy*/hwmon*/temp1_input
44000
48000

Sandokan71 commented 1 year ago

After many tests and measurements I can confirm bad temperature readings from the sensor of the 2.4GHz chip on my board. Maybe there is a problem with the chip that only affects the temperature reading? The chip's real temperature seems normal and it works properly.

ryderlee1110 commented 1 year ago

BPI R3?

frank-w commented 1 year ago

@ryderlee1110 does your ref-board use MT7975N too for 2g4?

ryderlee1110 commented 1 year ago

MT7976 for 2/6g

frank-w commented 1 year ago

So maybe we need a different offset or calculation for this chip

https://github.com/openwrt/mt76/blob/master/mt7915/init.c#L55

https://github.com/openwrt/mt76/blob/master/mt7915/mcu.c#L3108

Looking at the graph above, the offset/command seems right, but the value itself seems not to be in millicelsius, or it needs some other calibration data?

Sandokan71 commented 1 year ago

Here is a graph comparing MT7975N and MT7975PN over three days, with and without fan cooling, to explore a wider temperature range. The calculation seems correct to me. It is probably only an offset issue.

[Attached graph: 2023-02-03]

frank-w commented 1 year ago

I guess it is rather the EEPROM (which maybe sets the temperature offset) that is wrong...

I see the function mt7915_eeprom_name in mt7915/eeprom.c which selects the EEPROM, but it does not seem to be called on my R3, as I do not see the printks I added there...

I will try to debug further, but this function seems to be called only if there is no EEPROM... stop, wait... we have added an EEPROM in the DTS, both in my repo and in OpenWrt... maybe this is the wrong one for our front-end chips

frank-w commented 1 year ago

same output with the EEPROM data disabled in the DTS

root@bpi-r3:~# cat /sys/class/ieee80211/phy*/hwmon*/temp1_input
43000
23000

My debug output now shows that the MT7975_DUAL_ADIE (MT7986_EEPROM_MT7975_DUAL_DEFAULT) option is used: in mt7915_eeprom_init the first EEPROM load (mt7915_eeprom_load) now fails with ret=-22, and the second one (mt7915_eeprom_load_default) returns 0.

https://elixir.bootlin.com/linux/v6.2-rc6/source/drivers/net/wireless/mediatek/mt76/mt7915/eeprom.c#L60

dangowrt commented 1 year ago

Maybe this is a bug in the EEPROM data supplied by SinoVoip and we should actually just fix that...

frank-w commented 1 year ago

I loaded the EEPROM that is available in the linux-firmware git

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/mediatek

But yes, it can be wrong

frank-w commented 1 year ago

@ryderlee1110 any idea how to get further here?

Sandokan71 commented 1 year ago

The issue is still there. Any idea how to solve it?

codingtony commented 1 year ago

I can confirm the issue on a Banana Pi R3 with OpenWrt r22537-32f134fbdf. Using a thermometer gun I get a maximum reading of 40C while the sensor reports 63C.

Sandokan71 commented 1 year ago

I purchased a second BPI-R3, and on this one the detected temperature is correct. Something differs between the two boards.

frank-w commented 1 year ago

What is the hardware revision, and can you check whether the front-end chip is still an MT7975?

Sandokan71 commented 1 year ago

Both have the same revision v1.1 and the same IC.

dangowrt commented 1 year ago

board assembly process...

I purchased a second BPI-R3, and on this one the detected temperature is correct. Something differs between the two boards.

It could be that the efuse inside the MT7975 ICs doesn't come with valid thermal calibration, which should have been done by the board vendor...

Sandokan71 commented 1 year ago

I agree with you. It would be useful to know whether it is possible to set the efuse properly.
After long-term monitoring I can confirm that on my original board the 2.4GHz chip has a +27C offset. It is not nice to see temperatures of 60-75C at 20C ambient, but since they are actually 33-48C I am not too worried about it. However, I hope this will not cause strange behavior, such as thermal protection engaging, if temperatures rise further when the ambient temperature reaches 30C and above.
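
As a userspace workaround for logging, the offset can simply be subtracted from the raw value. This is only a sketch: the 27C figure is what I observed on one board and is not a general constant, the function name is made up, and it does nothing about what the driver or firmware sees internally.

```python
OFFSET_C = 27.0  # offset observed on one affected board; board-specific assumption


def corrected_celsius(raw_millideg, offset_c=OFFSET_C):
    """Convert a hwmon reading (millidegrees Celsius) and subtract the board offset."""
    return raw_millideg / 1000.0 - offset_c


if __name__ == "__main__":
    for raw in (60000, 75000):
        print(f"{raw} -> {corrected_celsius(raw):.1f} C")
```

Applied to the 60-75C range reported by the sensor, this gives the 33-48C range mentioned above.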

frank-w commented 1 year ago

Or at least detect the problematic firmware (or invalid calibration data) in the driver and handle it there (maybe out-of-tree for affected boards, to keep the mainline driver clean)?
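
One possible heuristic, sketched in userspace rather than in the driver: shortly after a cold start both front-ends should be close to each other, so a large gap between them hints at bad calibration. Both the function and the 20C threshold are assumptions for illustration, nothing like this exists in mt76, and the check would be meaningless once the radios are under different load.

```python
def looks_miscalibrated(temp_a_c, temp_b_c, max_delta_c=20.0):
    """Flag a pair of cold-start readings that differ by more than max_delta_c.

    max_delta_c is a hypothetical threshold, not a value taken from the driver.
    """
    return abs(temp_a_c - temp_b_c) > max_delta_c


if __name__ == "__main__":
    # 55C vs 27C: values read from the graph a few seconds after start.
    print(looks_miscalibrated(55.0, 27.0))
    # 44C vs 48C: values reported from the MT7986 reference board.
    print(looks_miscalibrated(44.0, 48.0))
```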

skramstad commented 4 months ago

Sorry to bump this issue again, but I have the opposite of what's posted earlier. Rev 1.1

root@bpi:~# cat /sys/class/ieee80211/phy*/hwmon*/temp1_input
49000 <-- 2g
66000 <-- 5g