simonsystem closed this issue 1 month ago.
Hi @simonsystem, you wrote "system freeze": do you mean that the entire system freezes? Or is it just thinkfan that freezes (i.e. stops doing anything)?
If the entire system locks up, then there's probably not much thinkfan can do about it, because that would be an issue with your kernel and/or drivers. You might try disabling individual sensors to find out which sensor (or fan) is triggering the freeze.
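To pick candidates for disabling one at a time, it can help to list which temperature inputs the kernel exposes. A minimal sketch, assuming the standard hwmon sysfs layout (`/sys/class/hwmon/hwmonN/name` plus `tempK_input` files in millidegrees Celsius); `list_temp_sensors` is a hypothetical helper, not part of thinkfan:

```python
import glob
import os


def list_temp_sensors(root="/sys/class/hwmon"):
    """Return (chip name, input path, temperature in °C) for each readable tempN_input."""
    sensors = []
    for chip_dir in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = os.path.basename(chip_dir)  # fall back to "hwmonN"
        for input_path in sorted(glob.glob(os.path.join(chip_dir, "temp*_input"))):
            try:
                with open(input_path) as f:
                    millideg = int(f.read().strip())
            except (OSError, ValueError):
                continue  # skip inputs that cannot be read or parsed
            sensors.append((chip, input_path, millideg / 1000.0))
    return sensors


if __name__ == "__main__":
    for chip, path, temp in list_temp_sensors():
        print(f"{chip:12} {temp:6.1f} °C  {path}")
```

Each printed path is a candidate to remove from the sensor list in the config while testing whether the freeze still occurs.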
If it's just thinkfan that freezes, you could get more information with strace:
sudo strace -p `pgrep thinkfan`
Then post the output here.
@vmatare, I appear to have the same problem as @simonsystem. In my case, the whole system freezes. Are there any useful diagnostics to pull in this case?
No, it's the whole system that freezes, without any logging to dmesg or similar. I think it's a thinkpad_acpi-related issue. I will create an issue there and link it to this issue.
@top-on Could you post your system specs here as well? Is it also a ThinkPad P14s Gen3 machine?
This is my system, which also freezes after a random amount of time when running thinkfan:
Maybe noteworthy: I am observing the same freezing behavior when running fancontrol.service or CoolerControl.
@simonsystem, thank you for creating and linking that issue!
Added a link to a freshly created Kernel.org Bugzilla issue at: https://bugzilla.kernel.org/show_bug.cgi?id=217548
@top-on Thanks for your system specs. I hope we can help fix that issue.
That sounds very inconvenient. Have any of you tried to find out how badly the system is frozen? Sometimes (though mostly with display-related problems) only the graphical UI (X, Wayland etc.) freezes, while the Linux text consoles continue to work. In that case you can still use Ctrl-Alt-F1 through Ctrl-Alt-F6 to pull up one of the text consoles, log in there and check the kernel log with dmesg.
Another important test is whether the NumLock LED will still switch on and off. If it doesn't, your entire kernel is frozen and there's truly nothing left to do except a hard reset.
@vmatare, I can confirm that the system fully freezes in these cases: switching to a text console with Ctrl-Alt-F6 is not possible when frozen. Because I do not have a numpad on my keyboard, I currently cannot check the LED.
I have also tested thinkfan with the new BIOS version for this laptop model: 0.1.28. The other system parameters remained as above. Unfortunately, the system also freezes with this new BIOS version.
Just as a cross-reference that might be useful: I currently see greater system stability with the coolero flatpak and the latest BIOS, although coolero also froze at some point with the previous BIOS version.
I will now run coolero for a few weeks with the latest BIOS to see if it is more stable than before.
I have to report that coolero also (fully) freezes my system with the above-mentioned parameters. It freezes somewhat later than with thinkfan, though :thinking:
I will re-run the tests whenever a new kernel ships for Pop!_OS or a new BIOS is released.
Thinkfan was causing freezes, so I was searching for another way to replace the dumb stock fan control (pulsing, delayed reaction to rising temperatures).
I would like to report that using pwmconfig from lm-sensors also causes freezes.
After the freeze, changing the keyboard backlight still works (don't know if that's helpful).
- Laptop: Thinkpad T14 Gen3 AMD (21CF)
@PiotrTD5 This ticket only concerns P14s Gen3 AMD models. Even though your BIOS has the same version number, I cannot confirm that we are talking about the same issue. I want to avoid this ticket becoming a general ThinkPad-freeze issue. Please open another ticket for your laptop model and reference this ticket in it.
Edit: @PiotrTD5 You are right. My fault; I now also think that yours is the same issue.
I just wanted to help. The only difference between the P14s Gen3 AMD and the T14 Gen3 AMD is the model name on the LCD bezel and the stickers.
They share the same BIOS/EC firmware. From the official Lenovo BIOS update readme: Support models:
Also, if you study the parts category on pcsupport.lenovo.com, you'll find that 21J5 and 21CF share the same FRU numbers for motherboards. I don't know about the T16 vs. the P16s, and I don't have time to check.
So IMHO, you should add the T14 Gen3 AMD model to this issue instead of creating another. I don't know why you strictly want it to be a P14s Gen3 issue when technically it's the same hardware and firmware. I have zero experience using GitHub, so I'll do what you ask if I am really wrong about this.
The same happens on my ThinkPad P16s Gen 1: total system freeze some time after thinkfan starts.
I've got a T14 G3 AMD with the same issue of the kernel freezing after a while of usage.
However, with the experimental=1 and fan_control=1 modprobe params, I can still echo levels, timeout, enable, disable and disengage into /proc/acpi/ibm/fan without the kernel freezing on me.
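For reference, the commands that thinkpad_acpi lists in /proc/acpi/ibm/fan are `level <0-7|auto|disengaged|full-speed>`, `enable`, `disable` and `watchdog <0-120>`. A minimal sketch of a helper that validates one such command and writes it the way `echo` does; `write_fan_command` and the validation pattern are illustrative assumptions, not part of thinkfan or the kernel:

```python
import re

# thinkpad_acpi procfs fan interface; only writable when the module is
# loaded with fan_control=1.
FAN_PROC = "/proc/acpi/ibm/fan"

# Commands as listed in the "commands:" lines of /proc/acpi/ibm/fan.
_VALID_COMMAND = re.compile(
    r"^(?:level (?:[0-7]|auto|disengaged|full-speed)"
    r"|enable|disable"
    r"|watchdog (?:[0-9]|[1-9][0-9]|1[01][0-9]|120))$"
)


def write_fan_command(command, path=FAN_PROC):
    """Validate one fan command and write it, like `echo "<command>" > path`."""
    command = command.strip()
    if not _VALID_COMMAND.match(command):
        raise ValueError(f"unsupported fan command: {command!r}")
    with open(path, "w") as f:
        f.write(command + "\n")
    return command
```

Run as root, e.g. `write_fan_command("watchdog 120")` followed by `write_fan_command("level auto")`.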
I wrote my own shitty Python script as a thinkfan "replacement" and noticed that the freeze happens when levels are written to the fan control file frequently. I built the script so that it checks the current level and compares it with the level I'd like to set, and it seems to be rather "stable" for me now.
https://gist.github.com/Lillecarl/15b683c3cd3bafe74ca3c4dafd427d2e This is the script I used for my testing. It keeps my laptop silent for the most part, but will ramp the fan all the way up to full-speed if temperatures are high (not sure whether that's dangerous for the fan).
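The check-before-write idea from this comment can be sketched as follows. This is not the linked gist; it assumes the `level:` line format that thinkpad_acpi prints in /proc/acpi/ibm/fan (for example `level:\t\tauto`), and `read_fan_level` / `set_level_if_changed` are hypothetical names:

```python
FAN_PROC = "/proc/acpi/ibm/fan"


def read_fan_level(path=FAN_PROC):
    """Return the value of the 'level:' line, e.g. '2', 'auto' or 'disengaged'."""
    with open(path) as f:
        for line in f:
            key, _, value = line.partition(":")
            if key.strip() == "level":
                return value.strip()
    raise RuntimeError(f"no 'level:' line in {path}")


def set_level_if_changed(desired, path=FAN_PROC):
    """Write 'level <desired>' only if it differs from the current level.

    Returns True when a write happened; redundant writes are skipped so the
    fan control file is touched as rarely as possible.
    """
    desired = str(desired)
    if read_fan_level(path) == desired:
        return False
    with open(path, "w") as f:
        f.write(f"level {desired}\n")
    return True
```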
EDIT: Further testing indicates I was just lucky in the beginning. After realizing I have to write to the fan control file every 110 seconds (after setting the watchdog to 120), I started experiencing random lockups again. (Only writes reset the watchdog timeout, and I think it's a good idea to keep the watchdog active in case the fan control software crashes.)
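The interaction between skipping redundant writes and keeping the watchdog armed can be made explicit: a write is needed when the desired level changes, or when the last write is about to age past the watchdog timeout. A minimal sketch using the 120 s watchdog and 110 s refresh from this comment; `needs_write` is a hypothetical helper, and as the EDIT reports, this pattern alone did not prevent the lockups:

```python
import time

WATCHDOG_S = 120  # as set with "watchdog 120"
REFRESH_S = 110   # rewrite the level this often so the watchdog never fires
assert REFRESH_S < WATCHDOG_S


def needs_write(current, desired, last_write, now=None, refresh=REFRESH_S):
    """Decide whether 'level <desired>' should be written now.

    current, desired: level strings such as "3" or "auto".
    last_write: time.monotonic() timestamp of the previous write, or None.
    """
    if now is None:
        now = time.monotonic()
    if current != desired or last_write is None:
        return True
    # Per the report above, only writes reset the watchdog, so even an
    # unchanged level must be rewritten before the timeout expires.
    return now - last_write >= refresh
```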
@Lillecarl, I really liked your idea of boiling fan control down to "read temperature" and "reduce fan speed for X seconds". I tested a simplified version of your script, but it also completely freezes my machine after some time. It was worth a shot, though :slightly_smiling_face:
BTW: As a workaround, I switched my notebook to "Cool 'n' Quiet" mode in the BIOS and completely disabled thinkfan. I think I lost some performance, but it's not as loud as before. But it's not the solution, of course.
@all: Thanks for all your suggestions and assistance in analyzing this issue. @PiotrTD5: Sorry that I didn't realize your issue is really the same thing. @Lillecarl: Special thanks for your scripting tests. Good idea, but unfortunately... nah.
@simonsystem I've been able to control my fans reliably by always stepping through level 1 before level 0.
That's 3 hours of controlling the fans with software the whole time.
Please ignore the steep stepping up and down; my control software isn't as polished as thinkfan. I do have some nice ideas, though, involving reading the CPU package power from the MSR and using that to step the fans based on actual heat dissipation needs, like https://github.com/hirschmann/nbfc does for Windows.
EDIT: False... Further natural testing, stressing the CPU every 30-60 seconds, produced another hang. On the bright side, after switching randomly between levels 1-7, I've discovered that it's going to level 0 that freezes the system, not any other level: https://prints.lillecarl.com/20231012-225047_lldegbbcjk.png
I've got exactly the same issue with my P14s Gen3 AMD. For now, I have completely disabled thinkfan (otherwise I get a freeze every few minutes; it looks like a kernel panic, because REISUB does not respond).
@Lillecarl Sir, you're a lifesaver! I've been pulling my hair out over random hard freezes, as mentioned above, and it took me some time to pin this issue on fan control. I can confirm that not using level 0 mitigates the freezes on my machine.
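Since these reports point at the transition to level 0 as the trigger, a controller can simply never request it. A minimal sketch of such a guard; what to send instead of 0 is a parameter (level 1 as above, or "auto", which a later comment reports can also stop the fan at low temperatures), and `safe_level` is a hypothetical name:

```python
SPECIAL_LEVELS = ("auto", "disengaged", "full-speed")


def safe_level(level, fallback="1"):
    """Map a requested fan level to one that never writes level 0.

    level: an int 0-7 or one of "auto", "disengaged", "full-speed".
    fallback: the level to use instead of 0, e.g. "1" or "auto".
    """
    text = str(level).strip()
    if text in SPECIAL_LEVELS:
        return text
    n = int(text)
    if not 0 <= n <= 7:
        raise ValueError(f"fan level out of range 0-7: {level!r}")
    return fallback if n == 0 else str(n)
```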
I haven't used this utility myself, but found out about it today because someone pointed me to this issue. I'll have to take a closer look, but the issue, as others here have noticed too, seems to be related to the fan speed levels. According to the thinkfan.conf man page source (thinkfan.conf.5.cmake), they can be numeric values in either the 0-7 or the 0-255 range (https://github.com/vmatare/thinkfan/blob/master/src/thinkfan.conf.5.cmake#L439). The 0-7 range may not be handled properly when the fan speed levels are added here: https://github.com/vmatare/thinkfan/blob/master/src/config.cpp#L106
The code there adds the disengaged level as the last level, which at first glance should map to std::numeric_limits<int>::min().
I'm not going to speculate any further as to how that might contribute to this bug without cloning the repo and going through the code itself, but that would be where I'd look, so I thought I'd mention it here.
Some shameless self-promotion: I found out about this because I hacked together a small utility to manage fan speeds on my old ThinkPad (GTK+3, old-school C). It's nowhere near as feature-complete as this tool, but maybe some of you can use it until this bug gets fixed: https://github.com/EVODelavega/fan_control
Guys, this is clearly a kernel bug (most probably in the thinkpad_acpi kernel module). You need to check the kernel.org bug tracker and potentially report it there.
/sys/class/power_supply/BAT0/hwmon0/subsystem/hwmon1/pwm1
Setting values there to (255/7)*level doesn't lock up my machine.
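The scaling mentioned here maps thinkpad_acpi's 0-7 levels onto the hwmon 0-255 range of pwm1. A minimal sketch, assuming rounding to the nearest integer (the comment does not say how fractions are handled) and assuming pwm1_enable is set to 1 (manual mode, per the hwmon sysfs convention) before writing pwm1. The helper names are hypothetical, and the hwmonN directory differs between machines:

```python
import os


def level_to_pwm(level):
    """Scale a fan level (0-7) to the hwmon pwm range (0-255)."""
    if not 0 <= level <= 7:
        raise ValueError("level must be in 0-7")
    return round(255 * level / 7)


def pwm_to_level(pwm):
    """Inverse mapping, rounding to the nearest level."""
    if not 0 <= pwm <= 255:
        raise ValueError("pwm must be in 0-255")
    return round(7 * pwm / 255)


def set_pwm_level(hwmon_dir, level):
    """Switch pwm1 to manual mode (pwm1_enable=1), then write the scaled level."""
    with open(os.path.join(hwmon_dir, "pwm1_enable"), "w") as f:
        f.write("1\n")
    value = level_to_pwm(level)
    with open(os.path.join(hwmon_dir, "pwm1"), "w") as f:
        f.write(f"{value}\n")
    return value
```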
https://download.lenovo.com/pccbbs/mobiles/r23uj73wd.html
- (New) Change to permit fan rotation after fan error happen.
@Lillecarl Did you try it? Does it solve our issue? Sounds promising!
@simonsystem Yep, it's finally working! The EC fan control is also quite decent, so I rewrote my fancontrol script to turn the fan off if the average temperature is below 60 °C for 30 seconds, and to switch to auto if the average temperature is above 60 °C for 30 seconds or above 70 °C for a single measurement. https://github.com/Lillecarl/nixos/blob/master/scripts/fancontrol2.py It could be simplified further, but it carries some legacy from previous attempts at things 😄
I reckon we can close this? That is, if the new UEFI and EC firmware is out for your model too 😄
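The decision rule described in this comment can be sketched as a small state machine. It is not the linked fancontrol2.py: it assumes one averaged measurement per second, so the 30-second window becomes 30 consecutive samples, and `FanDecider` is a hypothetical name. It only chooses between "0" and "auto"; writing the level is left to the caller:

```python
from collections import deque


class FanDecider:
    """Choose fan level "0" (off) or "auto" from periodic temperature samples."""

    def __init__(self, low=60.0, high=70.0, window=30):
        self.low = low        # °C threshold for the sustained checks
        self.high = high      # °C threshold that switches to auto immediately
        self.window = window  # consecutive samples (about seconds at 1 Hz)
        self.history = deque(maxlen=window)
        self.level = "auto"   # start under EC control

    def update(self, temps):
        """Feed one measurement (sensor temperatures in °C); return the level to use."""
        if not temps:
            raise ValueError("need at least one temperature")
        avg = sum(temps) / len(temps)
        self.history.append(avg)
        full = len(self.history) == self.window
        if avg > self.high:
            self.level = "auto"  # a single hot measurement is enough
        elif full and all(t > self.low for t in self.history):
            self.level = "auto"
        elif full and all(t < self.low for t in self.history):
            self.level = "0"
        return self.level
```

Between the two sustained conditions the previous decision is kept, which gives the controller hysteresis.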
At least for P14s Gen3 (21J5), this BIOS version isn't available anymore. https://pcsupport.lenovo.com/us/en/products/laptops-and-netbooks/thinkpad-p-series-laptops/thinkpad-p14s-gen-3-type-21j5-21j6/downloads/ds557681-bios-update-utility-bootable-cd-for-windows-10-64-bit-thinkpad-t14-gen-3-type-21cf-21cg-t16-gen-1-type-21ch-21cj-p16s-gen-1type-21ck-21cl?category=BIOS%2FUEFI
This BIOS version R23UJ73W is reported Lenovo cloud not working issue, hence it has been withdrawn from support site.
I downloaded it while it was available. The fan issue was gone; I could set my fan to 0 without freezes.
But I got standby issues: the system now freezes when resuming from deep standby after sleeping for an hour or so. Unfortunately, there is no BIOS option for changing the standby mode, so I cannot try other modes. I think it's fixed to "Modern Standby", which may not be well supported by Linux. I'm not an expert in these hardware things. (https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate)
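Which suspend variants the firmware actually offers can be checked from Linux without a BIOS option: /sys/power/mem_sleep lists the modes the kernel supports, with the active one in square brackets (e.g. `s2idle [deep]`, or only `[s2idle]` when S3 is not offered). A minimal sketch of parsing that file; the helper names are hypothetical:

```python
MEM_SLEEP = "/sys/power/mem_sleep"


def parse_mem_sleep(text):
    """Return (available modes, active mode) from /sys/power/mem_sleep contents."""
    available = []
    active = None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token  # the bracketed entry is the one in use
        available.append(token)
    return available, active


def read_mem_sleep(path=MEM_SLEEP):
    with open(path) as f:
        return parse_mem_sleep(f.read())
```

If "deep" is missing from the returned list, the firmware does not expose S3 and the kernel cannot select it; if it is listed, writing `deep` to the file as root selects it.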
@Lillecarl So, nah, BIOS version 1.49 (R23UJ73WD) has been withdrawn. So it's not closed yet, is it? What about your model: is that BIOS version still available?
@simonsystem It's withdrawn for T14 G3 as well. Meme company. I'm using s2idle; on the AMD system it drains just 30% per 2 days or so, so it's good enough for me.
@lillecarl:matrix.org if you wanna keep discussing; this is already miles off-topic from thinkfan 😄
@Lillecarl Sorry for adding my 2 cents to the off-topic discussion, but this is mildly infuriating, as it's the second BIOS version in a row that I updated to and that was then withdrawn. The previously withdrawn one could brick the device; I hope this one won't. Meme company indeed.
From that Lenovo thread, it seems like a proper fix might take a while longer. In the meantime, another possible workaround is using "level auto" instead of speed 0 for the idle fan speed setting. This does turn off the fan at sufficiently low temperatures, though I have not found the exact boundary yet.
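To illustrate this workaround, here is a simplified model of a thinkfan-style level table whose idle entry uses "auto" instead of 0. It is a sketch, not thinkfan's actual algorithm: each entry is assumed to be (level, lower bound, upper bound) in °C, the controller steps up when the temperature exceeds the current upper bound and down when it falls below the current lower bound, and the table values are made up for illustration:

```python
# Hypothetical table: (level, lower °C, upper °C); "auto" replaces level 0.
LEVELS = [
    ("auto", 0, 55),
    ("1", 50, 62),
    ("3", 58, 70),
    ("7", 66, 200),
]


def next_index(index, temp, levels=LEVELS):
    """Return the table index to use after reading `temp`, starting from `index`.

    Steps up while temp exceeds the current upper bound and down while it is
    below the current lower bound; overlapping bounds provide hysteresis.
    """
    while index + 1 < len(levels) and temp > levels[index][2]:
        index += 1
    while index > 0 and temp < levels[index][1]:
        index -= 1
    return index
```

With this table the fan never receives "level 0"; below 55 °C it is handed back to the EC.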
Hi, I'm having trouble with my ThinkPad P14s Gen3 AMD, machine type 21J5. Every time I start thinkfan, the system freezes after a random amount of time. No logs, just an immediate freeze, without the screen going black.
I already tried:
- options thinkpad_acpi fan_control=1 experimental=1 in modprobe.conf
- amd_pstate=active as a kernel param
- amdgpu.dcdebugmask=0x10 as a kernel param
This is my thinkfan.conf:
This is my journal for thinkfan systemd service:
As you can see, it controls the fan level for a few minutes, but then I get this system freeze. Without starting thinkfan or zcfan, it works properly without freezing, but with that annoying fan noise.
My system:
Edit: Link to the Kernel.org Bugzilla issue: https://bugzilla.kernel.org/show_bug.cgi?id=217548
Link to the Lenovo Forums topic: https://forums.lenovo.com/t5/ThinkPad-T400-T500-and-newer-T-series-Laptops/ThinkPad-T14-Gen-3-21CF-kernel-freezes-when-controlling-fans-on-Linux/m-p/5252479