mjg59 / thermal_daemon

Thermal daemon for IA
GNU General Public License v2.0

X1 Carbon 6th results + ACPI dump #7

Closed estan closed 2 years ago

estan commented 4 years ago

@mjg59 Originally posted on the upstream PR (https://github.com/intel/thermal_daemon/pull/224#issuecomment-650599289) but thought maybe it's better to post here (?)

System: Ubuntu 20.04
Kernel: Mainline 5.8rc2
Thermald: From the upstream branch in this repo (git describe: v1.4.2-415-g05493d3)

[image]

The CPU looks to be throttled to ~15 W / ~75° C under load (stress -c 8). This was with laptop on desk at ~28.5° C ambient temperature.

Here is the output from acpidump as root: acpidump.txt

Maybe my issue is similar to https://github.com/mjg59/thermal_daemon/issues/6 ?

estan commented 4 years ago

I also checked the output of grep -r . /sys/bus/platform/devices/INT3400:00/ after

1) About a minute in my lap
2) About a minute on my desk

and there was no difference in output.

estan commented 4 years ago

My ACPI dump looks quite different from @mrhpearson's. I don't think there are two DPTF tables in mine (?). I'm pretty sure I run an older Lenovo firmware though, since they did not release the newest one for all models.

estan commented 4 years ago

For reference, here's the output of fwupdmgr get-devices, showing Lenovo firmware version: fwupdmgr-devices.txt

estan commented 4 years ago

In case it helps, here's output from cat /sys/bus/platform/devices/INT3400:00/data_vault: data_vault.txt (.txt suffix because GitHub won't accept it otherwise).

And also:

estan@edison:~$ for f in /sys/bus/platform/devices/INT3400\:00/odvp*; do echo "$f: $(< $f)"; done
/sys/bus/platform/devices/INT3400:00/odvp0: 2
/sys/bus/platform/devices/INT3400:00/odvp1: 0
/sys/bus/platform/devices/INT3400:00/odvp10: 0
/sys/bus/platform/devices/INT3400:00/odvp11: 0
/sys/bus/platform/devices/INT3400:00/odvp12: 0
/sys/bus/platform/devices/INT3400:00/odvp13: 0
/sys/bus/platform/devices/INT3400:00/odvp14: 0
/sys/bus/platform/devices/INT3400:00/odvp15: 0
/sys/bus/platform/devices/INT3400:00/odvp16: 0
/sys/bus/platform/devices/INT3400:00/odvp17: 0
/sys/bus/platform/devices/INT3400:00/odvp18: 0
/sys/bus/platform/devices/INT3400:00/odvp19: 0
/sys/bus/platform/devices/INT3400:00/odvp2: 0
/sys/bus/platform/devices/INT3400:00/odvp3: 0
/sys/bus/platform/devices/INT3400:00/odvp4: 0
/sys/bus/platform/devices/INT3400:00/odvp5: 0
/sys/bus/platform/devices/INT3400:00/odvp6: 0
/sys/bus/platform/devices/INT3400:00/odvp7: 0
/sys/bus/platform/devices/INT3400:00/odvp8: 0
/sys/bus/platform/devices/INT3400:00/odvp9: 0
estan@edison:~$

estan commented 4 years ago

I had a go again, this time with Linux 5.8rc7 and the v2.3_development branch from https://github.com/intel/thermal_daemon/, after @spandruvada merged the DPTF code.

This time it seems to not throttle as aggressively:

[image]

The ambient temperature this time was 23.5° C. Again, with laptop on desk.

It's still not allowed to run at full throttle (~30 W / 97° C) though.

Let me know if there are any logs/dumps that would help in supporting the Thinkpad X1C6.

estan commented 4 years ago

Here is the output from thermald --adaptive --no-daemon --loglevel=debug when running my test: thermald.log

The point at which stress -c $(nproc) was started is roughly around the [1596186862][DEBUG]poll exit 0 polls_fd event 0 0 printout.

One can see that it starts out at ~29 W, then it's throttled down to ~23 W, then to ~19 W. It seems to then run stable at ~19 W.

estan commented 4 years ago

@mrhpearson In your April post to the Lenovo thread you mentioned:

As pointed out by @notabenem above I have been keeping an eye on Matthew Garrett's reverse engineering of DPTF and helping out where I can. I want to make sure that implementation doesn't impact Lenovo platforms which have the improved thermal firmware but I'm also hoping to find a solution to improve things for those who have platforms that can't be updated. I'll continue to help out there however I can.

Since you're in contact with the Lenovo thermal team, is it possible for you to get some information from them that would help @mjg59 in making the DPTF support in thermald more complete for these "older" models, like my X1C6, where Lenovo is not going to release a workaround firmware? This would be a great way for Lenovo to help out here.

@mjg59 can comment much better on exactly which information is needed, but I believe it's things such as the meaning of the manufacturer-exposed thermal conditions?

In my latest tests above, @mjg59's patches showed some improvement on my X1C6: Instead of rapidly throttling down to ~15 W under load, which from the Lenovo thread I gather is the expected behavior for "on lap" mode, it throttles more conservatively ~30 W -> ~23 W -> ~19 W. It is not allowed to run at full power though (~30 W), so something in the support for this laptop is not complete. Hoping some more info from Lenovo could get it there. Tests were done with laptop on desk.

estan commented 4 years ago

And like in https://github.com/mjg59/thermal_daemon/issues/6, I see no difference in grep -r . /sys/bus/platform/devices/INT3400:00/ output when going from desk to lap.

estan commented 4 years ago

Maybe my expectations for on desk performance are off. When I reach the steady state, with the package limited to 19 W, the package temperature is 90° C and the cores are running at ~2600 MHz. I should probably get Windows on a USB stick to observe the behavior there for comparison. Maybe being throttled to 19 W / 90° C on desk is normal, @mrhpearson do you know?

With no thermald running, the laptop is quickly throttled to 15 W / 80° C, with cores running at ~2300 MHz. So a definite improvement for on desk operation with this patched thermald.

However, since none of the ODVP values change when I go from desk to lap, I suspect thermald cannot detect the lap/desk state (correct me if I'm wrong @mjg59). And I wouldn't want the laptop to run this hot when on my lap. So additional info needed from Lenovo for this model is probably: How should lap/desk state be detected?

spandruvada commented 4 years ago

We don't have all conditions implemented. I will push another build today. Turn on debug mode and check.

estan commented 4 years ago

Thanks for the update @spandruvada, that sounds promising!

What should I look for in the debug output? The debug printouts related to conditions, from the thermald --adaptive --no-daemon --loglevel=debug output I attached to https://github.com/mjg59/thermal_daemon/issues/7#issuecomment-667029516 are:

[1596187275][DEBUG]evaluate condition set 0
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4096
[1596187275][DEBUG]evaluate condition set 1
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4108
[1596187275][DEBUG]evaluate condition set 2
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4108
[1596187275][DEBUG]evaluate condition set 3
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4108
[1596187275][DEBUG]evaluate condition set 4
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4099
[1596187275][DEBUG]evaluate condition set 5
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4098
[1596187275][DEBUG]evaluate condition set 6
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4097
[1596187275][DEBUG]evaluate condition set 7
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4104
[1596187275][DEBUG]evaluate condition set 8
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4105
[1596187275][DEBUG]evaluate condition set 9
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4105
[1596187275][DEBUG]evaluate condition set 10
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4106
[1596187275][DEBUG]evaluate condition set 11
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4102
[1596187275][DEBUG]evaluate condition set 12
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 4100
[1596187275][DEBUG]evaluate condition set 13
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 20
[1596187275][DEBUG]evaluate condition set 14
[1596187275][DEBUG]evaluate condition.condition at index 0
[1596187275][DEBUG]evaluate condition.condition 19
[1596187275][DEBUG]evaluate condition.condition at index 1
[1596187275][DEBUG]evaluate condition.condition 8
[1596187275][DEBUG]evaluate condition.condition at index 2
[1596187275][DEBUG]evaluate condition.condition at index 3
[1596187275][DEBUG]evaluate condition.condition at index 4
[1596187275][DEBUG]evaluate condition.condition at index 5
[1596187275][DEBUG]evaluate condition.condition at index 6
[1596187275][DEBUG]evaluate condition.condition at index 7
[1596187275][DEBUG]evaluate condition.condition at index 8
[1596187275][DEBUG]evaluate condition.condition at index 9

estan commented 4 years ago

I took a walk to buy a USB stick because I needed a new one anyway. Let me know if you want me to boot Windows on it and see what the behavior under load on this model is under Windows with the Lenovo DPTF driver.

estan commented 4 years ago

What should I look for in the debug output?

Looking at the code, I think I should see some error printouts about unsupported conditions if there were unsupported conditions? I can't see any errors.

spandruvada commented 4 years ago

Before running thermald from the command line:

$ sudo touch /run/thermald/debug_mode

You will see a dump of the tables with codes when you run thermald after that.

Your log shows the following, which will be ignored:

"evaluate condition.condition 4096"

estan commented 4 years ago

@spandruvada Thanks, thermald_debug.log is the output from thermald --adaptive --no-daemon --loglevel=debug when running in debug mode. Can something be worked out from this output?

Shortly after starting thermald, I started stress -c $(nproc) to stress the CPU:

[image]

The behavior is as before, first it's throttled to 23 W, then 19 W. During the last bit of the plot, I had canceled the stress command.

estan commented 4 years ago

Looking at the log, these conditions are listed as UNKNOWN in the APCT dump:

estan@edison:~$ grep UNKNOWN thermald_debug.log 
[1596303620][INFO]      target:10 device: condition:UNKNOWN( 4096 ) comparison:ADAPTIVE_EQUAL argument:1 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0 
[1596303620][INFO]      target:36 device: condition:UNKNOWN( 4108 ) comparison:ADAPTIVE_EQUAL argument:2 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0 
[1596303620][INFO]      target:32 device: condition:UNKNOWN( 4108 ) comparison:ADAPTIVE_EQUAL argument:7 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0 
[1596303620][INFO]      target:37 device: condition:UNKNOWN( 4108 ) comparison:ADAPTIVE_EQUAL argument:8 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0 
[1596303620][INFO]      target:7 device: condition:UNKNOWN( 4099 ) comparison:ADAPTIVE_EQUAL argument:1 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0 
[1596303620][INFO]      target:8 device: condition:UNKNOWN( 4098 ) comparison:ADAPTIVE_EQUAL argument:1 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0 
[1596303620][INFO]      target:9 device: condition:UNKNOWN( 4097 ) comparison:ADAPTIVE_EQUAL argument:1 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0 
[1596303620][INFO]      target:23 device: condition:UNKNOWN( 4104 ) comparison:ADAPTIVE_EQUAL argument:1 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0 
[1596303620][INFO]      target:33 device: condition:UNKNOWN( 4105 ) comparison:ADAPTIVE_EQUAL argument:1 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0 
[1596303620][INFO]      target:30 device: condition:UNKNOWN( 4105 ) comparison:ADAPTIVE_EQUAL argument:1 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0 
[1596303620][INFO]      target:21 device: condition:UNKNOWN( 4106 ) comparison:ADAPTIVE_EQUAL argument:1 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0 
[1596303620][INFO]      target:15 device: condition:UNKNOWN( 4102 ) comparison:ADAPTIVE_EQUAL argument:1 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0 
[1596303620][INFO]      target:6 device: condition:UNKNOWN( 4100 ) comparison:ADAPTIVE_EQUAL argument:1 operation:AND time_comparison:0 time:0 stare:0 state_entry_time:0 
estan@edison:~$

spandruvada commented 4 years ago

I think they are Lenovo specific ids. The first 12 conditions in the table, we are not matching as we don't know what they are.
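For anyone wanting to pull the distinct unknown codes out of a debug log like the one grepped above, something along these lines may help. It is only a minimal sketch: it assumes the `condition:UNKNOWN( N )` format shown in that output, and `unknown_conditions` is a made-up helper name, not part of thermald.

```sh
# Hypothetical helper: list the distinct condition codes that thermald
# reported as UNKNOWN, one per line, in ascending numeric order.
# Assumes the "condition:UNKNOWN( 4096 )" log format shown in this thread.
unknown_conditions() {
    grep -o 'UNKNOWN( *[0-9][0-9]* *)' "$1" \
        | tr -cd '0-9\n' \
        | sort -nu
}

# Example (not run here):
#   unknown_conditions thermald_debug.log
```

Only the part of each line matched by `grep -o` reaches `tr`, so timestamps and `target:` numbers do not leak into the result.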

estan commented 4 years ago

@spandruvada Alright, that makes sense.

@mrhpearson Could you talk to the Lenovo thermal team and see if it's possible to get some documentation on these manufacturer specific conditions in the conditions table, so that thermald adaptive mode can be made to work well on the X1C6? Also of interest would be to know how the OS is supposed to detect lap vs desk on the X1C6. In my tests, the ODVP values didn't change when I switched from desk to lap.

mjg59 commented 4 years ago

4096 should map to odvp6 in the data vault - see evaluate_oem_condition(). These are OEM specific (ie, we have no idea what they actually mean), but they should already be handled.
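As a rough illustration of that mapping, here is a minimal sketch. It extrapolates from the single data point given here (4096 maps to odvp6) and assumes the higher codes follow consecutively; that offset is an assumption, and `evaluate_oem_condition()` in the thermald source remains the authority. `odvp_for_condition` is a made-up name.

```sh
# Hypothetical helper: map an OEM condition code to an odvp file name.
# ASSUMPTION: codes from 4096 (0x1000) upward map consecutively, starting
# at odvp6. This is extrapolated from "4096 -> odvp6" and not verified
# against the thermald source.
odvp_for_condition() {
    code=$1
    if [ "$code" -ge 4096 ] 2>/dev/null; then
        echo "odvp$((code - 4096 + 6))"
    else
        echo "not a code in the 0x1000 range: $code" >&2
        return 1
    fi
}

# Example (not run here):
#   for c in 4096 4100 4108; do odvp_for_condition "$c"; done
```

Under that assumption the highest code seen in the log, 4108, would land on odvp18, which is within the odvp0 to odvp19 range this machine exposes.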

spandruvada commented 4 years ago

There are some lower power limit participants in the list. But some OEM specific variable should change to switch. So this system is looking for some more confirmation. @estan Are you monitoring the actual sysfs at /sys/bus/platform/devices/INT3400:00/odvp*?

estan commented 4 years ago

Thanks guys.

4096 should map to odvp6 in the data vault - see evaluate_oem_condition(). These are OEM specific (ie, we have no idea what they actually mean), but they should already be handled.

Aha, I see, so they are handled "blindly". All good then.

There are some lower power limit participants in the list. But some OEM specific variable should change to switch. So this system is looking for some more confirmation. @estan Are you monitoring the actual sysfs at /sys/bus/platform/devices/INT3400:00/odvp*?

Thanks. The way I tested this just now was:

  1. With laptop on desk (has been on desk for hours).
  2. Start thermald with thermald --adaptive --no-daemon --loglevel=debug.
  3. Check ODVP values with:
    estan@edison:~$ sudo grep -r . /sys/bus/platform/devices/INT3400:00/odvp* | sort
    /sys/bus/platform/devices/INT3400:00/odvp0:2
    /sys/bus/platform/devices/INT3400:00/odvp1:0
    /sys/bus/platform/devices/INT3400:00/odvp10:0
    /sys/bus/platform/devices/INT3400:00/odvp11:0
    /sys/bus/platform/devices/INT3400:00/odvp12:0
    /sys/bus/platform/devices/INT3400:00/odvp13:0
    /sys/bus/platform/devices/INT3400:00/odvp14:0
    /sys/bus/platform/devices/INT3400:00/odvp15:0
    /sys/bus/platform/devices/INT3400:00/odvp16:0
    /sys/bus/platform/devices/INT3400:00/odvp17:0
    /sys/bus/platform/devices/INT3400:00/odvp18:0
    /sys/bus/platform/devices/INT3400:00/odvp19:0
    /sys/bus/platform/devices/INT3400:00/odvp2:0
    /sys/bus/platform/devices/INT3400:00/odvp3:0
    /sys/bus/platform/devices/INT3400:00/odvp4:0
    /sys/bus/platform/devices/INT3400:00/odvp5:0
    /sys/bus/platform/devices/INT3400:00/odvp6:0
    /sys/bus/platform/devices/INT3400:00/odvp7:0
    /sys/bus/platform/devices/INT3400:00/odvp8:0
    /sys/bus/platform/devices/INT3400:00/odvp9:0
    estan@edison:~$
  4. Put laptop in lap. Sit like that for 5 minutes, making sure it's not completely still/horizontal.
  5. Check ODVP values again:
    estan@edison:~$ sudo grep -r . /sys/bus/platform/devices/INT3400:00/odvp* | sort
    /sys/bus/platform/devices/INT3400:00/odvp0:2
    /sys/bus/platform/devices/INT3400:00/odvp1:0
    /sys/bus/platform/devices/INT3400:00/odvp10:0
    /sys/bus/platform/devices/INT3400:00/odvp11:0
    /sys/bus/platform/devices/INT3400:00/odvp12:0
    /sys/bus/platform/devices/INT3400:00/odvp13:0
    /sys/bus/platform/devices/INT3400:00/odvp14:0
    /sys/bus/platform/devices/INT3400:00/odvp15:0
    /sys/bus/platform/devices/INT3400:00/odvp16:0
    /sys/bus/platform/devices/INT3400:00/odvp17:0
    /sys/bus/platform/devices/INT3400:00/odvp18:0
    /sys/bus/platform/devices/INT3400:00/odvp19:0
    /sys/bus/platform/devices/INT3400:00/odvp2:0
    /sys/bus/platform/devices/INT3400:00/odvp3:0
    /sys/bus/platform/devices/INT3400:00/odvp4:0
    /sys/bus/platform/devices/INT3400:00/odvp5:0
    /sys/bus/platform/devices/INT3400:00/odvp6:0
    /sys/bus/platform/devices/INT3400:00/odvp7:0
    /sys/bus/platform/devices/INT3400:00/odvp8:0
    /sys/bus/platform/devices/INT3400:00/odvp9:0
    estan@edison:~$

    I.e. no change in values.

All was done with AC plugged in.
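The desk/lap comparison in the steps above can also be done with saved snapshots and `diff` instead of comparing by eye. A minimal sketch, assuming the ODVP files are numbered contiguously from odvp0 as in the listings here; `odvp_snapshot` is a made-up helper name, and reading the files may need root.

```sh
# Hypothetical helper: print "odvpN=value" for every ODVP file in a
# directory, in numeric order (odvp2 before odvp10).
# Assumes the files are numbered contiguously starting at odvp0.
# The default directory is the INT3400 path used in this thread.
odvp_snapshot() {
    dir=${1:-/sys/bus/platform/devices/INT3400:00}
    i=0
    while [ -r "$dir/odvp$i" ]; do
        echo "odvp$i=$(cat "$dir/odvp$i")"
        i=$((i + 1))
    done
}

# Example (not run here):
#   odvp_snapshot > desk.txt    # laptop on desk
#   odvp_snapshot > lap.txt     # after a few minutes on the lap
#   diff desk.txt lap.txt && echo "no ODVP value changed"
```

Walking the index numerically avoids the odvp1, odvp10, odvp11 ordering that plain `sort` produces, which makes the two snapshots easier to read side by side.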

Am I right in that I should see some change in the ODVP values when switching from desk to lap?

mjg59 commented 4 years ago

Yeah, I'd expect lap detection to be exposed via one of the ODVP values. If that's not changing then I think we're doing something wrong.

estan commented 4 years ago

Thanks for confirming @mjg59.

Actually, @mrhpearson could you confirm that this laptop even has lap detection? I know you mentioned somewhere in the big Lenovo thread that some models do not have it. But I would assume that the X1C6 has it. If so, how is the OS supposed to query/be notified of lap/desk state changes?

estan commented 4 years ago

As I was curious, I also tried monitoring the OEM conditions every 2 seconds with sudo watch grep -r . /sys/bus/platform/devices/INT3400:00/odvp* while running a stress -c $(nproc) stress test, and AFAICS they did not change at any point during the run:

[image]

This time I had the two BIOS options Config → Power → Adaptive Thermal Management → Scheme for AC / Scheme for Battery both set to Balanced (as opposed to Maximize Performance). The default was to have Scheme for AC set to Maximize Performance and Scheme for Battery set to Balanced. But I wanted to have a go with them both set to Balanced to see if it made any difference in how ODVP values changed. Changing back to defaults now.

estan commented 4 years ago

FWIW, I created a Windows bootable USB stick and found the laptop to be overly throttled while on desk/AC also under Windows, with all Lenovo official software installed, including their own DPTF driver. See my post X1 Carbon 6th Throttled to ~15 W/~80° C Under Load While on Desk/65 W AC on the Lenovo Forums about this. If Lenovo cannot come up with a solution for this even under Windows, I may return the laptop asking for a refund, since it's still under warranty.

Would of course very much like to get it to work fully under Linux, since that's what I use. But I'm starting to suspect that the laptop simply cannot tell if it's on desk/lap, and perhaps that's why Lenovo has chosen not to make a firmware workaround for this model (like they are doing for newer models). Maybe they cannot meet regulatory safety requirements without that ability. All speculation from me of course. Would love a straight answer from Lenovo on why this model did not get the firmware workaround.

estan commented 4 years ago

Yeah, I'd expect lap detection to be exposed via one of the ODVP values. If that's not changing then I think we're doing something wrong.

@mjg59 Somewhat related, by again looking through the long Lenovo forum thread, I found this note from @mrhpearson @ Lenovo:

We are working on getting support into the kernel to make this feature much more usable. The first patch to give a sysfs node for lap/desk mode is under review right now (https://sourceforge.net/p/ibm-acpi/mailman/message/37028010/)

So they are adding a way to query lap/desk state via sysfs. But I guess that's not of much use to thermald and the question still remains why none of the ODVP values change when switching between desk and lap.

estan commented 4 years ago

So they are adding a way to query lap/desk state via sysfs. But I guess that's not of much use to thermald and the question still remains why none of the ODVP values change when switching between desk and lap.

...and also, I realize now that this sysfs group will probably not be visible unless you have the updated firmware with thermal management from Lenovo, which was not released for my model.

estan commented 4 years ago

I guess I should try out v5 of @mrhpearson's lapmode kernel patch (https://sourceforge.net/p/ibm-acpi/mailman/message/37052077/) and see if the sysfs node shows up, and if it reports lapmode when I switch between desk and lap. If it does, then at least I'd know that lapmode detection is supported by the X1C6.

spandruvada commented 4 years ago

When the sysfs path for lap/desk mode is present, thermald will exit. That is the requirement from Lenovo, as the firmware will take care of it.

estan commented 4 years ago

When the sysfs path for lap/desk mode is present, thermald will exit. That is the requirement from Lenovo, as the firmware will take care of it.

@spandruvada Hm yes, I saw now that's how it'll work (https://github.com/intel/thermal_daemon/commit/4b7c0f20fcb3813f03ec6179075d8799151f3e65). I'm not sure it was a good idea to make it assume "has lapmode sysfs path" => "thermals handled by firmware". It would perhaps have been better if Lenovo had exposed an explicit sysfs path for "handles thermal management", because the lapmode sysfs path could be useful, even if one didn't get the firmware upgrade from Lenovo (like my X1C6). For example, I want to know now if my X1C6 supports detecting lap mode (I know that the firmware does not do thermal management).

I'm assuming here that there may exist such laptops, which a) have support for lap detection, b) did not get the thermal-managing firmware update from Lenovo, and that my X1C6 might be one of them. I don't know if that's correct though. I think @mrhpearson will know.

spandruvada commented 4 years ago

Agree. The name of sysfs node here is not correct.

estan commented 4 years ago

Agree. The name of sysfs node here is not correct.

I built my own kernel with @mrhpearson's patch applied (https://patchwork.kernel.org/patch/11640537/), and the /sys/devices/platform/thinkpad_acpi/dytc_lapmode path does appear, but it's reporting 0 all the time (with laptop on lap and on desk).

So it's not just the name that is incorrect, it's the semantics. I think Lenovo will have to create another sysfs path which indicates exactly if thermal management is done by the firmware or not (it is not on the X1C6).

In this case, I was just interested to see if lap mode was detected correctly, but since it's reporting 0 all the time, something seems wrong. I will post a reply to the patch submission on the ibm-acpi-devel mailing list about this. EDIT: My reply is now here: https://sourceforge.net/p/ibm-acpi/mailman/message/37076074/
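The check described above can be repeated on other models with a few lines of shell. A minimal sketch, assuming only the sysfs path named in this comment and that the node reads 1 in lap mode; `lapmode_status` and its three output labels are made up for illustration.

```sh
# Hypothetical helper: report whether the dytc_lapmode node exists and
# what it currently says. The default path is the one from this thread.
# A reading of 0 cannot distinguish "on desk" from "detection not
# working", hence the combined label.
lapmode_status() {
    node=${1:-/sys/devices/platform/thinkpad_acpi/dytc_lapmode}
    if [ ! -e "$node" ]; then
        echo "absent"
    elif [ "$(cat "$node")" = "1" ]; then
        echo "lap"
    else
        echo "desk-or-unsupported"
    fi
}

# Example (not run here): call once on the desk and once on the lap.
#   lapmode_status
```

On the X1C6 described here this would print desk-or-unsupported in both positions, which is why a constant 0 says little by itself.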

mrhpearson commented 4 years ago

Apologies for the slow reply- I was on PTO with no internet access (best way to enjoy PTO ;)). Wanted to ack that I have read this thread. I don't have an X1C6 myself but I'll follow up with the firmware team and see if we're able to get some answers to some of the questions above. As a minor aside I'm also looking into why the dytc_lapmode shows up on the X1C6 if it's not working or supported. That's not what is supposed to happen....

estan commented 4 years ago

Apologies for the slow reply- I was on PTO with no internet access (best way to enjoy PTO ;)). Wanted to ack that I have read this thread.

No worries at all, I'm on PTO myself at the moment :)

I don't have an X1C6 myself but I'll follow up with the firmware team and see if we're able to get some answers to some of the questions above.

Thanks a lot. And sorry for coming at you from different angles here (kernel MLs, GitHub, Lenovo forum).

As a minor aside I'm also looking into why the dytc_lapmode shows up on the X1C6 if it's not working or supported. That's not what is supposed to happen....

Yep. Additionally, it would be good to have a dedicated sysfs path that explicitly says "this firmware does thermal management", so that thermald can use that one to know when to stay off, instead of piggybacking on dytc_lapmode.

My dream scenario would of course be that

  1. The X1C6 can actually detect lap mode, there's just some glitch right now that can be fixed, and
  2. The firmware actually reacts to lap mode by setting some condition in the APCT table, which thermald can then evaluate.

Otherwise there's no hope of having thermald --adaptive working for the X1C6 I think. Or at least, it won't be adapting to lap mode, which I think is what is wanted to get the best behavior, in lieu of a thermal managing firmware update from Lenovo.

mrhpearson commented 4 years ago

Minor update - I confirmed that the X1C6 doesn't have the same lap mode support via DYTC that the X1C7/8 does. I need to submit a fix to the patch so the dytc_lapmode sysfs entry doesn't show up there.

Still trying to get details on how/if X1C6 reacts to lap mode - I'll update when I find out more.

Yep. Additionally, it would be good to have a dedicated sysfs path that explicitly says "this firmware does thermal management", so that thermald can use that one to know when to stay off, instead of piggybacking on dytc_lapmode.

I do have a patch for the thermal/performance management that I'm working on. It was posted to the mailing lists for initial review and has some feedback I need to address. Once that is available it should help with knowing if firmware thermal management is available or not - better than dytc_lapmode anyway (and hopefully provide better controls to Linux users to determine the current status and set different performance modes).

estan commented 4 years ago

Thanks for the update @mrhpearson

benzea commented 4 years ago

Yes, using dytc_lapmode for the kill-switch is not optimal. But it is the best we can do for now. thinkpad_acpi is getting a dytc_perfmode soon, and once that is widely available we should be switching the detection code.

Also, I am pretty sure the OEM conditions are never updated on Linux on these machines (i.e. the odvp* sysfs values). I guess that might either mean that we are doing something wrong or that the firmware detects DPTF not being used and simply never updates the values.

spandruvada commented 4 years ago

It is possible that the OEM variables are changing but no ACPI notification is sent for the change. The current driver assumes that a notification will be sent. To test, try rmmod int3400_thermal and then modprobe it again to force a fresh read.
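A rough sketch of that test: snapshot the `odvp*` values, reload the module, and compare. The function names are made up for illustration, and `reload_and_compare` needs root because it calls `rmmod` and `modprobe`.

```shell
#!/bin/sh
INT3400_DIR_DEFAULT=/sys/bus/platform/devices/INT3400:00

# Print one "name=value" line per odvp* attribute found in the given directory.
snapshot_odvp() {
    dir=${1:-$INT3400_DIR_DEFAULT}
    for f in "$dir"/odvp*; do
        # Skips the unexpanded pattern when no odvp* file exists.
        [ -r "$f" ] || continue
        echo "$(basename "$f")=$(cat "$f")"
    done
}

# Reload the driver (needs root) and report whether any odvp value changed.
reload_and_compare() {
    before=$(snapshot_odvp "$1")
    rmmod int3400_thermal && modprobe int3400_thermal || return 1
    after=$(snapshot_odvp "$1")
    if [ "$before" = "$after" ]; then
        echo "no change after reload"
    else
        echo "values changed after reload:"
        echo "$after"
    fi
}
```

If the values differ only after a reload, that would point to a missing ACPI notification rather than values that never change.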

benzea commented 4 years ago

Tried this on a X1C7 right now, there all values are stuck at 0. I tried (reloading the module each time):

Always reloading the module, and none of these triggered a change.

Looking at the ACPI code, on this machine (X1C7), the ODVX values are read from another OMVE array. This array in turn is updated by the DYTC method, which checks DPTE (DPTF Enable?). DPTE can be set via _OSC on the INT3400 object (\_SB.IETM._OSC):

            Method (_OSC, 4, Serialized)  // _OSC: Operating System Capabilities
            {
                [SNIP]
                CreateDWordField (Arg3, 0x04, CAP1)
                If ((CAP1 & One))
                {
                   // Set DPTE to 1 ...

So, looks like we need to tell the firmware that we support DPTF. I guess we need to expose this (and possibly other) capabilities as a userspace modifiable value.

EDIT: I half suspect that getting this right will also disable the in-firmware thermal management on these machines.

benzea commented 4 years ago

Ugh, duh. If I set a UUID in /sys/bus/platform/devices/INT3400:00/uuids/current_uuid and then also enable the thermal device by setting /sys/class/thermal/thermal_zone1/mode then it starts working. And I can confirm for example that odvp1 is the dytc_lapmode.
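For reference, the two manual steps can be sketched as below. The helper names are made up, and the sketch makes two assumptions: that the zone's `mode` file accepts the string `enabled`, and that the right UUID is one you picked yourself (for example from the device's `uuids/available_uuids`, if your kernel provides it). The zone number varies between machines, so the sketch looks the zone up by its `type` instead of hard-coding `thermal_zone1`.

```shell
#!/bin/sh
# Print the thermal zone directory whose "type" file matches the given string.
find_zone_by_type() {
    thermal_root=$1
    wanted=$2
    for z in "$thermal_root"/thermal_zone*; do
        [ -r "$z/type" ] || continue
        if [ "$(cat "$z/type")" = "$wanted" ]; then
            echo "$z"
            return 0
        fi
    done
    return 1
}

# Select a UUID on the INT3400 device, then enable the given thermal zone.
take_over() {
    dev_dir=$1
    zone_dir=$2
    uuid=$3
    echo "$uuid" > "$dev_dir/uuids/current_uuid" || return 1
    echo enabled > "$zone_dir/mode"
}
```

A possible invocation as root, assuming the zone type is `INT3400 Thermal` on your kernel: `zone=$(find_zone_by_type /sys/class/thermal "INT3400 Thermal")` followed by `take_over /sys/bus/platform/devices/INT3400:00 "$zone" "$uuid"`.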

I guess not actually running thermald made me run into that issue :-/

spandruvada commented 4 years ago

Thermald with adaptive will do these steps. Add some prints to takeover_thermal_control().

benzea commented 4 years ago

Yep, I realise that now :)

But most of these OEM conditions seem to be indirectly set by the OS (through the DYTC method that the thinkpad_acpi driver uses). So looking at the various DPTF conditions, I suspect that none except for the last one can ever be matched.

The good news is that thermald and FW don't seem to actually fight over thermal management. I think the firmware code does disable itself (but the FW code works a lot better …)

estan commented 4 years ago

It sounds like you're on to something @benzea. Let me know if there's something you want me to try on my X1C6. Willing to run latest mainline kernel build, build thermald with debug prints, etc. Whatever it takes to know if thermald --adaptive can be made to work in cooperation with firmware-reported lap mode on this machine! :)

Would be so sweet to be able to max it out at ~97 degrees C while on the desk at work, and have it chill out a little automatically while on my lap.

benzea commented 4 years ago

Don't hold your breath, the amount of required work in various places until that can begin to work is ugly.

EDIT: It isn't really that hard, but it requires adding sysfs attributes to set the ACPI variables that feed the OEM flags (pretty sure we got stuff like "docked" and performance slider position there). These are thinkpad_acpi patches but reverse engineering may be required (unless Lenovo provides more specification about the DYTC method).

Once we have those, we can write thinkpad specific userspace code to correctly set them. You know, detect whether a TB/USB dock is attached and then set the appropriate flags and such.

And, once we do that, we probably start getting the right profiles selected by the adaptive code.

But for now … it is all stuck at the last profile because the default values for some of the OEM variables (i.e. the odvp stuff) are outside of the valid range. Just having good defaults in the ACPI could help a lot …

estan commented 4 years ago

Alright @benzea, I understand. Just let me know if there's anything I can do to help.

Regarding

EDIT: It isn't really that hard, but it requires adding sysfs attributes to set the ACPI variables that feed the OEM flags (pretty sure we got stuff like "docked" and performance slider position there). These are thinkpad_acpi patches but reverse engineering may be required (unless Lenovo provides more specification about the DYTC method).

perhaps @mrhpearson from Lenovo can help out? In the big thread on the Lenovo forums, when it was finally announced that only some models will get the thermal-managing firmware upgrade, users of slightly older models (like my X1C6) were told that Lenovo was looking on with interest at the work done by @mjg59 to get adaptive mode DPTF support into thermald. Maybe Lenovo could provide some info to reduce the amount of reverse engineering required to figure out how things should be set up.

mrhpearson commented 4 years ago

perhaps @mrhpearson from Lenovo can help out? In the big thread on the Lenovo forums, when it was finally announced that only some models will get the thermal-managing firmware upgrade, users of slightly older models (like my X1C6) were told that Lenovo was looking on with interest at the work done by @mjg59 to get adaptive mode DPTF support into thermald. Maybe Lenovo could provide some info to reduce the amount of reverse engineering required to figure out how things should be set up.

@estan - I'm trying but it's slow going. The firmware team are very hesitant to provide any information that might also be related to or involved with pieces that are under NDA with Intel. I realise that's a crappy answer :( I can definitely help with any testing for the platforms we have in our team - and if I can figure out pieces directly myself I will but so far I'm really limited in what information I have myself.

estan commented 4 years ago

@mrhpearson No worries, we've all suffered from bureaucracy. You're a champion for Linux on Thinkpads doing a great job with what you've got. And patiently so, judging by the recent activity on the platform-driver-x86 list :)

estan commented 2 years ago

EDIT: It isn't really that hard, but it requires adding sysfs attributes to set the ACPI variables that feed the OEM flags (pretty sure we got stuff like "docked" and performance slider position there). These are thinkpad_acpi patches but reverse engineering may be required (unless Lenovo provides more specification about the DYTC method).

perhaps @mrhpearson from Lenovo can help out? In the big thread on the Lenovo forums, when it was finally announced that only some models will get the thermal-managing firmware upgrade, users of slightly older models (like my X1C6) were told that Lenovo was looking on with interest at the work done by @mjg59 to get adaptive mode DPTF support into thermald. Maybe Lenovo could provide some info to reduce the amount of reverse engineering required to figure out how things should be set up.

@estan - I'm trying but it's slow going. The firmware team are very hesitant to provide any information that might also be related to or involved with pieces that are under NDA with Intel. I realise that's a crappy answer :( I can definitely help with any testing for the platforms we have in our team - and if I can figure out pieces directly myself I will but so far I'm really limited in what information I have myself.

@mrhpearson I'm guessing it wasn't possible to pull any further info on this out of the firmware team (?)

I'm fine with closing this issue, and besides it's a little misplaced here on @mjg59's fork now that thermald adaptive mode is long since upstreamed. When it comes to the throttling issue, it pretty much resolves itself if I run a current thermald. The laptop when stressed can then run CPUs at approx 90°C @ 20 W package power draw, which is significantly better than what it used to be when I initially reported this. Regarding the adaptive part, I think I just have to accept that this laptop either doesn't have lap sensing capabilities, or they are not working correctly. This agrees with what the current kernel does (does not register lap_mode sysfs if DYTC version is < 5).

I'm still using this X1C6 as my main work laptop, and I guess I could hack something together so that when I dock at work, it starts running thermald, and stops when I undock. Since at work I'm 90% of the time docked at my desk, and at home 90% on the couch, it will be a fairly good approximation of lap sensing :)
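A minimal sketch of such a hack. It assumes the dock shows up as a USB device whose vendor and product IDs you look up yourself (for example with `lsusb`), and that the service unit is called `thermald`. Both are assumptions, and the function names are made up.

```shell
#!/bin/sh
# Succeed if a USB device with the given vendor and product ID is present.
usb_device_present() {
    usb_root=$1
    vendor=$2
    product=$3
    for d in "$usb_root"/*; do
        [ -r "$d/idVendor" ] && [ -r "$d/idProduct" ] || continue
        if [ "$(cat "$d/idVendor")" = "$vendor" ] &&
           [ "$(cat "$d/idProduct")" = "$product" ]; then
            return 0
        fi
    done
    return 1
}

# Print the systemctl verb to apply to thermald for the current dock state.
thermald_action() {
    if usb_device_present "$@"; then
        echo start
    else
        echo stop
    fi
}
```

Run from a udev rule or a timer, something like `systemctl "$(thermald_action /sys/bus/usb/devices "$vendor" "$product")" thermald` would then follow the dock state.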

@mrhpearson BTW great talk at DebConf 22 :+1: (Lenovo: Give this man a raise!).