Closed estan closed 2 years ago
I also checked the output of grep -r . /sys/bus/platform/devices/INT3400:00/ after
1) About a minute in my lap
2) About a minute on my desk
and there was no difference in output.
My ACPI dump looks quite different from @mrhpearson's. I don't think there's two DPTF tables in mine (?). I'm pretty sure I run an older Lenovo firmware though, since they did not release the newest one for all models.
For reference, here's the output of fwupdmgr get-devices, showing Lenovo firmware version:
fwupdmgr-devices.txt
In case it helps, here's output from cat /sys/bus/platform/devices/INT3400:00/data_vault:
data_vault.txt (.txt suffix because GitHub won't accept it otherwise).
And also:
estan@edison:~$ for f in /sys/bus/platform/devices/INT3400\:00/odvp*; do echo "$f: $(< $f)"; done
/sys/bus/platform/devices/INT3400:00/odvp0: 2
/sys/bus/platform/devices/INT3400:00/odvp1: 0
/sys/bus/platform/devices/INT3400:00/odvp10: 0
/sys/bus/platform/devices/INT3400:00/odvp11: 0
/sys/bus/platform/devices/INT3400:00/odvp12: 0
/sys/bus/platform/devices/INT3400:00/odvp13: 0
/sys/bus/platform/devices/INT3400:00/odvp14: 0
/sys/bus/platform/devices/INT3400:00/odvp15: 0
/sys/bus/platform/devices/INT3400:00/odvp16: 0
/sys/bus/platform/devices/INT3400:00/odvp17: 0
/sys/bus/platform/devices/INT3400:00/odvp18: 0
/sys/bus/platform/devices/INT3400:00/odvp19: 0
/sys/bus/platform/devices/INT3400:00/odvp2: 0
/sys/bus/platform/devices/INT3400:00/odvp3: 0
/sys/bus/platform/devices/INT3400:00/odvp4: 0
/sys/bus/platform/devices/INT3400:00/odvp5: 0
/sys/bus/platform/devices/INT3400:00/odvp6: 0
/sys/bus/platform/devices/INT3400:00/odvp7: 0
/sys/bus/platform/devices/INT3400:00/odvp8: 0
/sys/bus/platform/devices/INT3400:00/odvp9: 0
estan@edison:~$
I had a go again, this time with Linux 5.8rc7 and the v2.3_development branch from https://github.com/intel/thermal_daemon/, after @spandruvada merged the DPTF code.
This time it seems to not throttle as aggressively:
The ambient temperature this time was 23.5° C. Again, with laptop on desk.
It's still not allowed to run at full throttle (~30 W / 97° C) though.
Let me know if there are any logs/dumps that would help in supporting the Thinkpad X1C6.
Here is the output from thermald --adaptive --no-daemon --loglevel=debug when running my test: thermald.log
The point at which stress -c $(nproc) was started is roughly around the [1596186862][DEBUG]poll exit 0 polls_fd event 0 0 printout.
One can see that it starts out at ~29 W, then it's throttled down to ~23 W, then to ~19 W. It seems to then run stable at ~19 W.
@mrhpearson In your April post to the Lenovo thread you mentioned:
As pointed out by @notabenem above I have been keeping an eye on Matthew Garrett's reverse engineering of DPTF and helping out where I can. I want to make sure that implementation doesn't impact Lenovo platforms which have the improved thermal firmware but I'm also hoping to find a solution to improve things for those who have platforms that can't be updated. I'll continue to help out there however I can.
Since you're in contact with the Lenovo thermal team, is it possible for you to get some information from them that would help @mjg59 in making the DPTF support in thermald more complete for these "older" models, like my X1C6, where Lenovo is not going to release a workaround firmware? This would be a great way for Lenovo to help out here.
@mjg59 can comment much better on exactly which information is needed, but I believe it's things such as the meaning of the manufacturer-exposed thermal conditions?
In my latest tests above, @mjg59's patches showed some improvement on my X1C6: Instead of rapidly throttling down to ~15 W under load, which from the Lenovo thread I gather is the expected behavior for "on lap" mode, it throttles more conservatively ~30 W -> ~23 W -> ~19 W. It is not allowed to run at full power though (~30 W), so something in the support for this laptop is not complete. Hoping some more info from Lenovo could get it there. Tests were done with laptop on desk.
And like in https://github.com/mjg59/thermal_daemon/issues/6, I see no difference in grep -r . /sys/bus/platform/devices/INT3400:00/ output when going from desk to lap.
Maybe my expectations for on desk performance are off. When I reach the steady state, with the package limited to 19 W, the package temperature is 90° C and the cores are running at ~2600 MHz. I should probably get Windows on a USB stick to observe the behavior there as comparison. Maybe being throttled to 19 W / 90° C on desk is normal, @mrhpearson do you know?
With no thermald running, the laptop is quickly throttled to 15 W / 80° C, with cores running at ~2300 MHz. So a definite improvement for on desk operation with this patched thermald.
However, since none of the ODVP values change when I go from desk to lap, I suspect thermald cannot detect the lap/desk state (correct me if I'm wrong @mjg59). And I wouldn't want the laptop to run this hot when on my lap. So additional info needed from Lenovo for this model is probably: How should lap/desk state be detected?
We don't have all conditions implemented. I will push another build today. Turn on debug mode and check.
Thanks for the update @spandruvada, that sounds promising!
What should I look for in the debug output? The debug printouts related to conditions, from the thermald --adaptive --no-daemon --loglevel=debug output I attached to https://github.com/mjg59/thermal_daemon/issues/7#issuecomment-667029516 are:
[1596187275][DEBUG]evaluate condition set 0
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4096
[1596187275][DEBUG]evaluate condition set 1
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4108
[1596187275][DEBUG]evaluate condition set 2
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4108
[1596187275][DEBUG]evaluate condition set 3
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4108
[1596187275][DEBUG]evaluate condition set 4
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4099
[1596187275][DEBUG]evaluate condition set 5
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4098
[1596187275][DEBUG]evaluate condition set 6
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4097
[1596187275][DEBUG]evaluate condition set 7
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4104
[1596187275][DEBUG]evaluate condition set 8
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4105
[1596187275][DEBUG]evaluate condition set 9
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4105
[1596187275][DEBUG]evaluate condition set 10
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4106
[1596187275][DEBUG]evaluate condition set 11
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4102
[1596187275][DEBUG]evaluate condition set 12
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4100
[1596187275][DEBUG]evaluate condition set 13
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 20
[1596187275][DEBUG]evaluate condition set 14
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 19
[1596187275][DEBUG]evaluate condition.condition at index 1
[1596187275][DEBUG]evaluate condition.condition 8
[1596187275][DEBUG]evaluate condition.condition at index 2
[1596187275][DEBUG]evaluate condition.condition at index 3
[1596187275][DEBUG]evaluate condition.condition at index 4
[1596187275][DEBUG]evaluate condition.condition at index 5
[1596187275][DEBUG]evaluate condition.condition at index 6
[1596187275][DEBUG]evaluate condition.condition at index 7
[1596187275][DEBUG]evaluate condition.condition at index 8
[1596187275][DEBUG]evaluate condition.condition at index 9
I took a walk to buy a USB stick because I needed a new one anyway. Let me know if you want me to boot Windows on it and see what the behavior under load on this model is under Windows with the Lenovo DPTF driver.
What should I look for in the debug output?
Looking at the code, I think I should see some error printouts about unsupported conditions if there were unsupported conditions? I can't see any errors.
Before running thermald from command line:
$ sudo touch /run/thermald/debug_mode
You will see dump of tables with codes when you run thermald after that.
Your log shows "evaluate condition.condition 4096", which will be ignored.
@spandruvada Thanks, thermald_debug.log is the output from thermald --adaptive --no-daemon --loglevel=debug when running in debug mode. Can something be worked out from this output?
Shortly after starting thermald, I started stress -c $(nproc) to stress the CPU:
The behavior is as before, first it's throttled to 23 W, then 19 W. During the last bit of the plot, I had canceled the stress command.
Looking at the log, these conditions are listed as UNKNOWN in the APCT dump:
estan@edison:~$ grep UNKNOWN thermald_debug.log
[1596303620][INFO] target:10 device: condition:UNKNOWN( 4096 ) comparison:ADAPTIVE_EQUAL argument:1 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0
[1596303620][INFO] target:36 device: condition:UNKNOWN( 4108 ) comparison:ADAPTIVE_EQUAL argument:2 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0
[1596303620][INFO] target:32 device: condition:UNKNOWN( 4108 ) comparison:ADAPTIVE_EQUAL argument:7 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0
[1596303620][INFO] target:37 device: condition:UNKNOWN( 4108 ) comparison:ADAPTIVE_EQUAL argument:8 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0
[1596303620][INFO] target:7 device: condition:UNKNOWN( 4099 ) comparison:ADAPTIVE_EQUAL argument:1 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0
[1596303620][INFO] target:8 device: condition:UNKNOWN( 4098 ) comparison:ADAPTIVE_EQUAL argument:1 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0
[1596303620][INFO] target:9 device: condition:UNKNOWN( 4097 ) comparison:ADAPTIVE_EQUAL argument:1 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0
[1596303620][INFO] target:23 device: condition:UNKNOWN( 4104 ) comparison:ADAPTIVE_EQUAL argument:1 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0
[1596303620][INFO] target:33 device: condition:UNKNOWN( 4105 ) comparison:ADAPTIVE_EQUAL argument:1 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0
[1596303620][INFO] target:30 device: condition:UNKNOWN( 4105 ) comparison:ADAPTIVE_EQUAL argument:1 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0
[1596303620][INFO] target:21 device: condition:UNKNOWN( 4106 ) comparison:ADAPTIVE_EQUAL argument:1 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0
[1596303620][INFO] target:15 device: condition:UNKNOWN( 4102 ) comparison:ADAPTIVE_EQUAL argument:1 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0
[1596303620][INFO] target:6 device: condition:UNKNOWN( 4100 ) comparison:ADAPTIVE_EQUAL argument:1 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0
estan@edison:~$
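To summarize a listing like the one above, the distinct UNKNOWN condition codes can be pulled out of the log and counted. This is only a sketch: the function name unknown_condition_codes is made up for illustration, and the pattern assumes the "condition:UNKNOWN( NNNN )" layout seen in the lines above.

```shell
# Print each distinct UNKNOWN condition code found in a thermald debug log,
# followed by how many times it occurs, e.g. "4108 x3".
# Assumption: lines look like "... condition:UNKNOWN( 4096 ) comparison:..."
# as in the log excerpt above; other layouts are silently skipped.
unknown_condition_codes() {
    sed -n 's/.*condition:UNKNOWN( *\([0-9][0-9]*\) *).*/\1/p' "$1" |
        sort -n |
        uniq -c |
        awk '{ print $2 " x" $1 }'
}

# Usage:
#   unknown_condition_codes thermald_debug.log
```

In the listing above, 4108 appears three times and 4105 twice, each time with a different target, so the count helps show which OEM variables drive more than one target.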
I think they are Lenovo specific ids. The first 12 conditions in the table, we are not matching as we don't know what they are.
@spandruvada Alright, that makes sense.
@mrhpearson Could you talk to the Lenovo thermal team and see if it's possible to get some documentation on these manufacturer specific conditions in the conditions table, so that thermald adaptive mode can be made to work well on the X1C6? Also of interest would be to know how the OS is supposed to detect lap vs desk on the X1C6. In my tests, the ODVP values didn't change when I switched from desk to lap.
4096 should map to odvp6 in the data vault - see evaluate_oem_condition(). These are OEM specific (ie, we have no idea what they actually mean), but they should already be handled.
There are some lower power limit participants in the list. But some OEM specific variable should change to switch. So this system is looking for some more confirmation. @estan Are you monitoring actual sysfs at /sys/bus/platform/devices/INT3400:00/odvp*?
Thanks guys.
4096 should map to odvp6 in the data vault - see evaluate_oem_condition(). These are OEM specific (ie, we have no idea what they actually mean), but they should already be handled.
Aha, I see, so they are handled "blindly". All good then.
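To make the "4096 maps to odvp6" remark concrete, here is a rough sketch of the arithmetic. It extrapolates from that single data point: the assumption is that codes from 4096 (0x1000) upward are consecutive, so code N lands in odvp(N - 4096 + 6). The helper name oem_code_to_odvp is hypothetical, and evaluate_oem_condition() in thermald remains the authority on the real mapping.

```shell
# Translate a condition code from the thermald log into an odvp file name.
# Assumption (from the comment above): 4096 corresponds to odvp6, and higher
# codes follow consecutively. Codes below 4096 are not handled here.
oem_code_to_odvp() {
    code=$1
    if [ "$code" -ge 4096 ]; then
        echo "odvp$((code - 4096 + 6))"
    else
        return 1
    fi
}

# Usage:
#   oem_code_to_odvp 4096   # the code discussed above
#   cat "/sys/bus/platform/devices/INT3400:00/$(oem_code_to_odvp 4108)"
```

Under that assumption, the UNKNOWN codes in the log (4096 to 4108) would fall in odvp6 to odvp18, all of which exist on this machine.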
There are some lower power limit participants in the list. But some OEM specific variable should change to switch. So this system is looking for some more confirmation. @estan Are you monitoring actual sysfs at /sys/bus/platform/devices/INT3400:00/odvp*?
Thanks. The way I tested this just now was:
Started thermald with thermald --adaptive --no-daemon --loglevel=debug.
estan@edison:~$ sudo grep -r . /sys/bus/platform/devices/INT3400:00/odvp* | sort
/sys/bus/platform/devices/INT3400:00/odvp0:2
/sys/bus/platform/devices/INT3400:00/odvp1:0
/sys/bus/platform/devices/INT3400:00/odvp10:0
/sys/bus/platform/devices/INT3400:00/odvp11:0
/sys/bus/platform/devices/INT3400:00/odvp12:0
/sys/bus/platform/devices/INT3400:00/odvp13:0
/sys/bus/platform/devices/INT3400:00/odvp14:0
/sys/bus/platform/devices/INT3400:00/odvp15:0
/sys/bus/platform/devices/INT3400:00/odvp16:0
/sys/bus/platform/devices/INT3400:00/odvp17:0
/sys/bus/platform/devices/INT3400:00/odvp18:0
/sys/bus/platform/devices/INT3400:00/odvp19:0
/sys/bus/platform/devices/INT3400:00/odvp2:0
/sys/bus/platform/devices/INT3400:00/odvp3:0
/sys/bus/platform/devices/INT3400:00/odvp4:0
/sys/bus/platform/devices/INT3400:00/odvp5:0
/sys/bus/platform/devices/INT3400:00/odvp6:0
/sys/bus/platform/devices/INT3400:00/odvp7:0
/sys/bus/platform/devices/INT3400:00/odvp8:0
/sys/bus/platform/devices/INT3400:00/odvp9:0
estan@edison:~$
estan@edison:~$ sudo grep -r . /sys/bus/platform/devices/INT3400:00/odvp* | sort
/sys/bus/platform/devices/INT3400:00/odvp0:2
/sys/bus/platform/devices/INT3400:00/odvp1:0
/sys/bus/platform/devices/INT3400:00/odvp10:0
/sys/bus/platform/devices/INT3400:00/odvp11:0
/sys/bus/platform/devices/INT3400:00/odvp12:0
/sys/bus/platform/devices/INT3400:00/odvp13:0
/sys/bus/platform/devices/INT3400:00/odvp14:0
/sys/bus/platform/devices/INT3400:00/odvp15:0
/sys/bus/platform/devices/INT3400:00/odvp16:0
/sys/bus/platform/devices/INT3400:00/odvp17:0
/sys/bus/platform/devices/INT3400:00/odvp18:0
/sys/bus/platform/devices/INT3400:00/odvp19:0
/sys/bus/platform/devices/INT3400:00/odvp2:0
/sys/bus/platform/devices/INT3400:00/odvp3:0
/sys/bus/platform/devices/INT3400:00/odvp4:0
/sys/bus/platform/devices/INT3400:00/odvp5:0
/sys/bus/platform/devices/INT3400:00/odvp6:0
/sys/bus/platform/devices/INT3400:00/odvp7:0
/sys/bus/platform/devices/INT3400:00/odvp8:0
/sys/bus/platform/devices/INT3400:00/odvp9:0
estan@edison:~$
I.e. no change in values.
All was done with AC plugged in.
Am I right in that I should see some change in the ODVP values when switching from desk to lap?
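Comparing two 20-line dumps by eye is error-prone, so the desk/lap check can be scripted. The following is a minimal sketch, not anything from thermald: the function names snapshot_odvp and changed_odvp are made up, and the directory is passed in as a parameter so the functions can be tried against a copy of the files.

```shell
# Print "odvpN:value" for every odvp* file in directory $1, in numeric order.
# The sort splits each line on the letter "p" so that odvp10 sorts after odvp9.
snapshot_odvp() {
    dir=$1
    for f in "$dir"/odvp*; do
        [ -r "$f" ] || continue
        printf '%s:%s\n' "$(basename "$f")" "$(cat "$f")"
    done | sort -t p -k 2n
}

# Print the lines of snapshot file $2 that are not present in snapshot file $1.
# Exits non-zero when nothing changed (grep finds no differing line).
changed_odvp() {
    grep -Fxv -f "$1" "$2"
}

# Usage (run as root if the files are not world-readable):
#   snapshot_odvp /sys/bus/platform/devices/INT3400:00 > desk.txt
#   ... move the laptop to the lap and wait a minute ...
#   snapshot_odvp /sys/bus/platform/devices/INT3400:00 > lap.txt
#   changed_odvp desk.txt lap.txt || echo "no ODVP value changed"
```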
Yeah, I'd expect lap detection to be exposed via one of the ODVP values. If that's not changing then I think we're doing something wrong.
Thanks for confirming @mjg59.
Actually, @mrhpearson could you confirm that this laptop even has lap detection? I know you mentioned somewhere in the big Lenovo thread that some models do not have it. But I would assume that the X1C6 has it. If so, how is the OS supposed to query/be notified of lap/desk state changes?
As I was curious, I also tried monitoring the OEM conditions every 2 seconds with sudo watch grep -r . /sys/bus/platform/devices/INT3400:00/odvp* while running a stress -c $(nproc) stress test, and AFAICS they did not change at any point during the run:
This time I had the two BIOS options Config → Power → Adaptive Thermal Management → Scheme for AC / Scheme for Battery both set to Balanced (as opposed to Maximize Performance). The default was to have Scheme for AC set to Maximize Performance and Scheme for Battery set to Balanced. But I wanted to have a go with them both set to Balanced to see if it made any difference in how ODVP values changed. Changing back to defaults now.
FWIW, I created a Windows bootable USB stick and found the laptop to be overly throttled while on desk/AC also under Windows, with all Lenovo official software installed, including their own DPTF driver. See my post X1 Carbon 6th Throttled to ~15 W/~80° C Under Load While on Desk/65 W AC on the Lenovo Forums about this. If Lenovo cannot come up with a solution for this even under Windows, I may return the laptop asking for a refund, since it's still under warranty.
Would of course very much like to get it to work fully under Linux, since that's what I use. But I'm starting to suspect that the laptop simply cannot tell if it's on desk/lap, and perhaps that's why Lenovo has chosen not to make a firmware workaround for this model (like they are doing for newer models). Maybe they cannot meet regulatory safety requirements without that ability. All speculation from me of course. Would love a straight answer from Lenovo on why this model did not get the firmware workaround.
Yeah, I'd expect lap detection to be exposed via one of the ODVP values. If that's not changing then I think we're doing something wrong.
@mjg59 Somewhat related, by again looking through the long Lenovo forum thread, I found this note from @mrhpearson @ Lenovo:
We are working on getting support into the kernel to make this feature much more usable. The first patch to give a sysfs node for lap/desk mode is under review right now (https://sourceforge.net/p/ibm-acpi/mailman/message/37028010/)
So they are adding a way to query lap/desk state via sysfs. But I guess that's not of much use to thermald and the question still remains why none of the ODVP values change when switching between desk and lap.
So they are adding a way to query lap/desk state via sysfs. But I guess that's not of much use to thermald and the question still remains why none of the ODVP values change when switching between desk and lap.
...and also, I realize now that this sysfs group will probably not be visible unless you have the updated firmware with thermal management from Lenovo, which was not released for my model.
I guess I should try out v5 of @mrhpearson's lapmode kernel patch (https://sourceforge.net/p/ibm-acpi/mailman/message/37052077/) and see if the sysfs node shows up, and if it reports lapmode when I switch between desk and lap. If it does, then at least I'd know that lapmode detection is supported by the X1C6.
When we see the sysfs path for lap/desk mode, thermald will exit. That is the requirement from Lenovo, as the firmware will take care of it.
When we see the sysfs path for lap/desk mode, thermald will exit. That is the requirement from Lenovo, as the firmware will take care of it.
@spandruvada Hm yes, I saw now that's how it'll work (https://github.com/intel/thermal_daemon/commit/4b7c0f20fcb3813f03ec6179075d8799151f3e65). I'm not sure it was a good idea to make it assume "has lapmode sysfs path" => "thermals handled by firmware". It would perhaps have been better if Lenovo had exposed an explicit sysfs path for "handles thermal management", because the lapmode sysfs path could be useful, even if one didn't get the firmware upgrade from Lenovo (like my X1C6). For example, I want to know now if my X1C6 supports detecting lap mode (I know that the firmware does not do thermal management).
I'm assuming here that there may exist such laptops, which a) has support for lap detection, b) did not get the thermal-managing firmware update from Lenovo, and that my X1C6 might be one of them. I don't know if that's correct though. I think @mrhpearson will know.
Agree. The name of sysfs node here is not correct.
Agree. The name of sysfs node here is not correct.
I built my own kernel with @mrhpearson's patch applied (https://patchwork.kernel.org/patch/11640537/), and the /sys/devices/platform/thinkpad_acpi/dytc_lapmode path does appear, but it's reporting 0 all the time (with laptop on lap and on desk).
So it's not just the name that is incorrect, it's the semantics. I think Lenovo will have to create another sysfs path which indicates exactly if thermal management is done by the firmware or not (it is not on the X1C6).
In this case, I was just interested to see if lap mode was detected correctly, but since it's reporting 0 all the time, something seems wrong. I will post a reply to the patch submission on the ibm-acpi-devel mailing list about this. EDIT: My reply is now here: https://sourceforge.net/p/ibm-acpi/mailman/message/37076074/
Apologies for the slow reply- I was on PTO with no internet access (best way to enjoy PTO ;)). Wanted to ack that I have read this thread. I don't have an X1C6 myself but I'll follow up with the firmware team and see if we're able to get some answers to some of the questions above. As a minor aside I'm also looking into why the dytc_lapmode shows up on the X1C6 if it's not working or supported. That's not what is supposed to happen....
Apologies for the slow reply- I was on PTO with no internet access (best way to enjoy PTO ;)). Wanted to ack that I have read this thread.
No worries at all, I'm on PTO myself at the moment :)
I don't have an X1C6 myself but I'll follow up with the firmware team and see if we're able to get some answers to some of the questions above.
Thanks a lot. And sorry for coming at you from different angles here (kernel MLs, GitHub, Lenovo forum).
As a minor aside I'm also looking into why the dytc_lapmode shows up on the X1C6 if it's not working or supported. That's not what is supposed to happen....
Yep. Additionally, it would be good with a dedicated sysfs path that explicitly says "this firmware does thermal management", so that thermald can use that one to know when to stay off, instead of piggybacking the dytc_lapmode.
My dream scenario would of course be that
thermald
can then evaluate.
Otherwise there's no hope of having thermald --adaptive working for the X1C6 I think. Or at least, it won't be adapting to lap mode, which I think is what is wanted to get the best behavior, in lieu of a thermal managing firmware update from Lenovo.
Minor update - I confirmed the X1C6 doesn't have the same lap detection support as the X1C7/8 has using DYTC. I need to submit a fix to the patch so the dytc_lapmode sysfs entry doesn't show up there.
Still trying to get details on how/if X1C6 reacts to lap mode - I'll update when I find out more.
Yep. Additionally, it would be good with a dedicated sysfs path that explicitly says "this firmware does thermal management", so that thermald can use that one to know when to stay off, instead of piggybacking the dytc_lapmode.
I do have a patch for the thermal/performance management that I'm working on. It was posted to the mailing lists for initial review and has some feedback I need to address. Once that is available it should help with knowing if firmware thermal management is available or not - better than dytc_lapmode anyway (and hopefully provide better controls to Linux users to determine the current status and set different performance modes).
Thanks for the update @mrhpearson
Yes, using dytc_lapmode for the kill-switch is not optimal. But it is the best we can do for now. thinkpad_acpi is getting a dytc_perfmode soon, and once that is widely available we should be switching the detection code.
Also, I am pretty sure the OEM conditions are never updated on Linux on these machines (i.e. the odvp* sysfs values). I guess that might either mean that we are doing something wrong or that the firmware detects DPTF not being used and simply never updates the values.
It is possible that the OEM variables are changing but no ACPI notification is sent for the change. The current driver assumes that a notification will be sent. To test, try rmmod int3400_thermal and modprobe it again to force the values to be read again.
Tried this on an X1C7 right now, there all values are stuck at 0. I tried (reloading the module each time):
dytc_lapmode to toggle
dytc_perfmode using Fn+HML (didn't expect anything)
Always reloading the module, and none of these triggered a change.
Looking at the ACPI code, on this machine (X1C7), the ODVX values are read from another OMVE array. This array in turn is updated by the DYTC method, which checks DPTE (DPTF Enable?). DPTE can be set via _OSC on the INT3400 object (\_SB.IETM._OSC):
Method (_OSC, 4, Serialized) // _OSC: Operating System Capabilities
{
    [SNIP]
    CreateDWordField (Arg3, 0x04, CAP1)
    If ((CAP1 & One))
    {
        // Set DPTE to 1 ...
So, looks like we need to tell the firmware that we support DPTF. I guess we need to expose this (and possibly other) capabilities as a userspace modifiable value.
EDIT: I half suspect that getting this right will also disable the in-firmware thermal management on these machines.
Ugh, duh. If I set a UUID in /sys/bus/platform/devices/INT3400:00/uuids/current_uuid and then also enable the thermal device by setting /sys/class/thermal/thermal_zone1/mode then it starts working. And I can confirm for example that odvp1 is the dytc_lapmode.
I guess not actually running thermald made me run into that issue :-/
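For anyone who wants to reproduce those two manual steps without thermald, here is a rough sketch. The helper name enable_dptf_policy is hypothetical. It assumes the UUID is picked from uuids/available_uuids in the same directory, and that thermal_zone1 is the INT3400 zone, which may differ on other machines. Needs root on real sysfs.

```shell
#!/bin/sh
# Hypothetical helper: select a DPTF policy UUID and enable a thermal zone.
# $1: UUID to select (should be one listed in uuids/available_uuids)
# $2: INT3400 device directory (default taken from this thread)
# $3: thermal zone directory (assumption: zone 1, as on the X1C7 above)
enable_dptf_policy() {
    uuid=$1
    dev=${2:-/sys/bus/platform/devices/INT3400:00}
    zone=${3:-/sys/class/thermal/thermal_zone1}
    # Refuse UUIDs that the firmware does not advertise.
    grep -qixF "$uuid" "$dev/uuids/available_uuids" || {
        echo "UUID $uuid not offered by firmware" >&2
        return 1
    }
    echo "$uuid" > "$dev/uuids/current_uuid" || return 1
    echo enabled > "$zone/mode"
}
```

After calling it with one of the advertised UUIDs, re-reading the odvp* files should show whether the firmware has started updating them.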
Thermald with adaptive will do these steps. Add some prints to takeover_thermal_control().
Yep, I realise that now :)
But most of these OEM conditions seem to be indirectly set by the OS (through the DYTC method that the thinkpad_acpi driver uses). So looking at the various DPTF conditions, I suspect that none except for the last one can ever be matched.
The good news is that thermald and FW don't seem to actually fight over thermal management. I think the firmware code does disable itself (but the FW code works a lot better …)
It sounds like you're on to something @benzea. Let me know if there's something you want me to try on my X1C6. Willing to run latest mainline kernel build, build thermald with debug prints, etc. Whatever it takes to know if thermald --adaptive can be made to work in cooperation with firmware-reported lap mode on this machine! :)
Would be so sweet to be able to max it out at ~97 degrees C while on the desk at work, and have it chill out a little automatically while on my lap.
Don't hold your breath, the amount of required work in various places until that can begin to work is ugly.
EDIT: It isn't really that hard, but it requires adding sysfs attributes to set the ACPI variables that feed the OEM flags (pretty sure we got stuff like "docked" and performance slider position there). These are thinkpad_acpi patches but reverse engineering may be required (unless Lenovo provides more specification about the DYTC method).
Once we have those, we can write thinkpad specific userspace code to correctly set them. You know, detect whether a TB/USB dock is attached and then set the appropriate flags and such.
And, once we do that, we probably start getting the right profiles selected by the adaptive code.
But for now … it is all stuck at the last profile because the default values for some of the OEM variables (i.e. the odvp stuff) are outside of the valid range. Just having good defaults in the ACPI could help a lot …
Alright @benzea, I understand. Just let me know if there's anything I can do to help.
Regarding
"EDIT: It isn't really that hard, but it requires adding sysfs attributes to set the ACPI variables that feed the OEM flags (pretty sure we got stuff like "docked" and performance slider position there). These are thinkpad_acpi patches but reverse engineering may be required (unless Lenovo provides more specification about the DYTC method)."
perhaps @mrhpearson from Lenovo can help out? In the big thread on the Lenovo forums, when it was finally announced that only some models will get the thermal-managing firmware upgrade, users of slightly older models (like my X1C6) were told that Lenovo was looking on with interest at the work done by @mjg59 to get adaptive mode DPTF support into thermald. Maybe Lenovo could provide some info to reduce the amount of reverse engineering required to figure out how things should be set up.
@estan - I'm trying but it's slow going. The firmware team are very hesitant to provide any information that might also be related to or involved with pieces that are under NDA with Intel. I realise that's a crappy answer :( I can definitely help with any testing for the platforms we have in our team - and if I can figure out pieces directly myself I will but so far I'm really limited in what information I have myself.
@mrhpearson No worries, we've all suffered from bureaucracy. You're a champion for Linux on Thinkpads doing a great job with what you've got. And patiently so, judging by the recent activity on the platform-driver-x86 list :)
@mrhpearson I'm guessing it wasn't possible to pull any further info on this out of the firmware team (?)
I'm fine with closing this issue, and besides it's a little misplaced here on @mjg59's fork now that thermald adaptive mode is long since upstreamed. When it comes to the throttling issue, it pretty much resolves itself if I run a current thermald. The laptop when stressed can then run CPUs at approx 90°C @ 20 W package power draw, which is significantly better than what it used to be when I initially reported this. Regarding the adaptive part, I think I just have to accept that this laptop either doesn't have lap sensing capabilities, or they are not working correctly. This agrees with what the current kernel does (does not register lap_mode sysfs if DYTC version is < 5).
I'm still using this X1C6 as my main work laptop, and I guess I could hack something together so that when I dock at work, it starts running thermald, and stops when I undock. Since at work I'm 90% of the time docked at my desk, and at home 90% on the couch, it will be a fairly good approximation of lap sensing :)
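A rough sketch of that dock hack, under some assumptions: thermald is installed as the systemd unit thermald.service, and something else (a udev rule, for instance) detects the dock and calls the script with "docked" or "undocked". Both helper names are hypothetical, and dock detection itself is not shown.

```shell
#!/bin/sh
# Hypothetical helper: map a dock state to the systemctl verb to apply.
thermald_verb_for() {
    case $1 in
        docked)   echo start ;;
        undocked) echo stop ;;
        *)        return 1 ;;
    esac
}

# Hypothetical entry point, e.g. called from a udev rule on dock events.
# Needs root; assumes the unit is named thermald.service.
apply_dock_state() {
    verb=$(thermald_verb_for "$1") || {
        echo "usage: apply_dock_state docked|undocked" >&2
        return 2
    }
    systemctl "$verb" thermald.service
}
```

Keeping the state-to-verb mapping separate from the systemctl call makes it easy to check the logic without touching the running service.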
@mrhpearson BTW great talk at DebConf 22 :+1: (Lenovo: Give this man a raise!).
@mjg59 Originally posted on the upstream PR (https://github.com/intel/thermal_daemon/pull/224#issuecomment-650599289) but thought maybe it's better to post here (?)
System: Ubuntu 20.04
Kernel: Mainline 5.8rc2
Thermald: From the upstream branch in this repo (git describe: v1.4.2-415-g05493d3)
The CPU looks to be throttled to ~15 W / ~75° C under load (stress -c 8). This was with laptop on desk at ~28.5° C ambient temperature.
Here is the output from acpidump as root: acpidump.txt
Maybe my issue is similar to https://github.com/mjg59/thermal_daemon/issues/6 ?