katakombi / LinuxMint-t490s

Linux Mint 19 on Lenovo Thinkpad t490s
Apache License 2.0

Fix for throttling #2

Open bob1de opened 5 years ago

bob1de commented 5 years ago

Hello,

Thanks for sharing your experience with the T490s! I've got the i7/16 GB German campus model and really dislike the keyboard's build quality. The Alt key made a cracking noise when pressed right from the beginning, and dust gets under the keys easily. I had an HP EliteBook 840 G1 before, which was a lot better in this regard. It's also disappointing that you can't replace the keyboard without replacing the whole palm rest and sides as well. Hopefully they'll do it under on-site warranty should it become necessary some day, but I fear that could get quite expensive, since the whole mainboard has to be taken out. Whoever designed this layout must have been crazy.

But anyway, it's entirely possible to disable the software-caused throttling on this model as well. CPU power usage then stays at 25 W until thermal throttling kicks in at 95°C, which happens quite quickly because of the terrible single heatpipe. It stabilizes at 17-18 W after a few minutes, as that's what the cooling system can dissipate.

You have to set the correct UUID for Intel's thermal platform: https://github.com/erpalma/throttled/issues/118

Just run this immediately after boot, e.g. via systemd unit or /etc/rc.local:

# echo 63BE270F-1C11-48FD-A6F7-3AF253FF3E2D | tee /sys/devices/platform/INT3400:00/uuids/current_uuid
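A slightly more defensive variant of the command above could first check that the kernel actually offers the UUID, since writing one that is not listed fails with "Invalid argument". This is only a sketch: `set_thermal_uuid` and `INT3400_DIR` are made-up names, and the optional second argument exists just so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: select a thermal policy UUID only if the kernel lists it as available.
# INT3400_DIR and set_thermal_uuid are illustrative names, not part of any tool.
INT3400_DIR="${INT3400_DIR:-/sys/devices/platform/INT3400:00/uuids}"

set_thermal_uuid() {
    uuid="$1"
    dir="${2:-$INT3400_DIR}"
    # available_uuids holds one UUID per line; an empty file means nothing can be selected.
    if ! grep -qixF -- "$uuid" "$dir/available_uuids" 2>/dev/null; then
        echo "UUID $uuid is not listed in $dir/available_uuids" >&2
        return 1
    fi
    # On a real system this write needs root.
    printf '%s\n' "$uuid" > "$dir/current_uuid"
}
```

Run as root after boot, e.g. from /etc/rc.local: `set_thermal_uuid 63BE270F-1C11-48FD-A6F7-3AF253FF3E2D`.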

Best regards, Robert

katakombi commented 5 years ago

Thanks Robert,

katakombi commented 5 years ago

Hi Robert,

I can't apply your fix. Whenever I write to /sys/devices/platform/INT3400:00/uuids/current_uuid, it says: bash: echo: write error: Invalid argument. The output of cat /sys/devices/platform/INT3400:00/uuids/available_uuids is empty.

I am running kernel 5.0.0.

NB: I've briefly checked the issue you mentioned in your post, but the info there doesn't seem to apply to my system. Looks like a wrong or missing kernel module, doesn't it?

bob1de commented 5 years ago

Hi,

Yes, probably some module is missing. I've got this:

# inxi -F
System:    Host: schnitzel Kernel: 4.19.0-5-amd64 x86_64 bits: 64 Console: tty 8 Distro: Debian GNU/Linux 10 (buster) 
Machine:   Type: Laptop System: LENOVO product: 20NYS02B00 v: ThinkPad T490s serial: xxx 
           Mobo: LENOVO model: 20NYS02B00 v: SDK0J40697 WIN serial: xxx UEFI: LENOVO v: N2JET31W (1.09 ) 
           date: 03/15/2019 
Battery:   ID-1: BAT0 charge: 57.1 Wh condition: 57.5/57.0 Wh (101%) 
CPU:       Topology: Quad Core model: Intel Core i7-8565U bits: 64 type: MT MCP L2 cache: 8192 KiB 
           Speed: 500 MHz min/max: 400/4600 MHz Core speeds (MHz): 1: 500 2: 500 3: 500 4: 500 5: 500 6: 500 7: 500 8: 500 
Graphics:  Device-1: Intel UHD Graphics 620 driver: i915 v: kernel 
           Display: server: X.org 1.20.4 driver: i915 tty: 159x47 
           Message: Unable to show advanced data. Required tool glxinfo missing. 
Audio:     Device-1: Intel Cannon Point-LP High Definition Audio driver: snd_hda_intel 
           Device-2: Lenovo type: USB driver: hid-generic,snd-usb-audio,usbhid 
           Sound Server: ALSA v: k4.19.0-5-amd64 
Network:   Device-1: Intel Cannon Point-LP CNVi [Wireless-AC] driver: iwlwifi 
           IF: wlan0 state: down mac: xx:xx:xx:xx:xx:xx 
           Device-2: Intel Ethernet I219-V driver: e1000e 
           IF: eth0 state: up speed: 1000 Mbps duplex: full mac: xx:xx:xx:xx:xx:xx 
           IF-ID-1: docker0 state: down mac: 02:42:9f:9d:e4:1f 
           IF-ID-2: vpnsvc state: unknown speed: N/A duplex: N/A mac: N/A 
Drives:    Local Storage: total: 476.94 GiB used: 52.82 GiB (11.1%) 
           ID-1: /dev/nvme0n1 vendor: Samsung model: MZVLB512HAJQ-000H1 size: 476.94 GiB 
Partition: ID-1: / size: 372.51 GiB used: 52.62 GiB (14.1%) fs: btrfs dev: /dev/dm-0 
           ID-2: /boot size: 488.0 MiB used: 206.1 MiB (42.2%) fs: btrfs dev: /dev/nvme0n1p2 
           ID-3: /home size: 372.51 GiB used: 52.62 GiB (14.1%) fs: btrfs dev: /dev/dm-0 
Sensors:   System Temperatures: cpu: 39.0 C mobo: N/A 
           Fan Speeds (RPM): cpu: 0 
Info:      Processes: 340 Uptime: 11d 22h 05m Memory: 15.36 GiB used: 6.89 GiB (44.9%) Init: systemd runlevel: 5 Shell: zsh 
           inxi: 3.0.32 

# cat /sys/devices/platform/INT3400:00/uuids/available_uuids 
63BE270F-1C11-48FD-A6F7-3AF253FF3E2D
9E04115A-AE87-4D1C-9500-0F3E340BFE75

# cat /sys/devices/platform/INT3400:00/uuids/current_uuid 
63BE270F-1C11-48FD-A6F7-3AF253FF3E2D

# lsmod|grep int34
int3403_thermal        16384  0
int340x_thermal_zone    16384  2 int3403_thermal,processor_thermal_device
int3400_thermal        16384  0
acpi_thermal_rel       16384  1 int3400_thermal

Have you loaded these modules?
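To compare against the lsmod output above, something like the following could report which of those four modules are loaded. It is a rough sketch: `check_int340x_modules` is a hypothetical helper, and it reads /proc/modules unless another file is passed in. Drivers built directly into the kernel do not appear there, so "missing" is only a hint.

```shell
# Sketch: report which of the INT340x thermal modules listed above are loaded.
# check_int340x_modules is a hypothetical helper name.
check_int340x_modules() {
    modules_file="${1:-/proc/modules}"
    missing=0
    for mod in int3400_thermal int3403_thermal int340x_thermal_zone acpi_thermal_rel; do
        # The module name is the first whitespace-separated field on each line.
        if awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            echo "$mod: loaded"
        else
            echo "$mod: missing"
            missing=1
        fi
    done
    # Non-zero return value if at least one module was not found.
    return "$missing"
}
```

A module reported as missing can be tried with `modprobe`, e.g. `modprobe int3400_thermal`.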

BTW, I disabled Secure Boot because of undervolting, not sure it makes a difference for the UUIDs.

bob1de commented 5 years ago

And how far did you get with undervolting? Is it really -175 mV? When I go beyond -100 mV for core and cache, I get sporadic machine check exceptions under moderate load.

katakombi commented 5 years ago

Not sure, but this might point to a failure during boot:

dmesg |grep -i thermal
[    0.266249] mce: CPU0: Thermal monitoring enabled (TM1)
[    0.412584] ACPI: \_SB_.PR00: _OSC native thermal LVT Acked
[    2.011720] thermal LNXTHERM:00: registered as thermal_zone0
[    2.011721] ACPI: Thermal Zone [THM0] (43 C)
[   13.067452] proc_thermal 0000:00:04.0: enabling device (0000 -> 0002)
[   13.160703] proc_thermal 0000:00:04.0: Creating sysfs group for PROC_THERMAL_PCI
[   13.291896] thermal thermal_zone6: failed to read out thermal zone (-61)

Do you see a similar message?

bob1de commented 5 years ago

Hm, I'll try undervolting only the core to -140 mV tonight and report back. GPU performance doesn't matter to me, so I keep that at normal voltage.

Your dmesg output looks exactly like mine. Here's a list of loaded modules:

Module                  Size  Used by
ctr                    16384  0
ccm                    20480  0
ufs                    86016  0
qnx4                   16384  0
hfsplus               114688  0
hfs                    69632  0
minix                  40960  0
msdos                  20480  0
jfs                   208896  0
xfs                  1458176  0
ext4                  733184  0
mbcache                16384  1 ext4
jbd2                  122880  1 ext4
fscrypto               32768  1 ext4
ecb                    16384  0
cpuid                  16384  0
fuse                  122880  3
rfcomm                 86016  4
ipt_MASQUERADE         16384  1
nf_conntrack_netlink    49152  0
xfrm_user              40960  1
xfrm_algo              16384  1 xfrm_user
nft_counter            16384  15
nft_chain_nat_ipv4     16384  4
nf_nat_ipv4            16384  2 ipt_MASQUERADE,nft_chain_nat_ipv4
xt_addrtype            16384  1
nft_compat             20480  4
nf_tables             143360  45 nft_compat,nft_chain_nat_ipv4,nft_counter
nfnetlink              16384  4 nft_compat,nf_conntrack_netlink,nf_tables
xt_conntrack           16384  1
nf_nat                 36864  1 nf_nat_ipv4
nf_conntrack          163840  5 xt_conntrack,nf_nat,ipt_MASQUERADE,nf_nat_ipv4,nf_conntrack_netlink
nf_defrag_ipv6         20480  1 nf_conntrack
nf_defrag_ipv4         16384  1 nf_conntrack
br_netfilter           24576  0
bridge                188416  1 br_netfilter
stp                    16384  1 bridge
llc                    16384  2 bridge,stp
overlay               126976  0
wireguard             225280  0
ip6_udp_tunnel         16384  1 wireguard
udp_tunnel             16384  1 wireguard
cmac                   16384  1
msr                    16384  0
bnep                   24576  2
uinput                 20480  1
snd_hda_codec_hdmi     57344  1
arc4                   16384  2
snd_soc_skl           114688  0
snd_soc_skl_ipc        73728  1 snd_soc_skl
snd_soc_sst_ipc        16384  1 snd_soc_skl_ipc
btusb                  53248  0
nls_ascii              16384  1
snd_soc_sst_dsp        36864  1 snd_soc_skl_ipc
snd_hda_codec_realtek   122880  1
nls_cp437              20480  1
snd_hda_ext_core       28672  1 snd_soc_skl
btrtl                  16384  1 btusb
btbcm                  16384  1 btusb
snd_soc_acpi_intel_match    24576  1 snd_soc_skl
snd_hda_codec_generic    86016  1 snd_hda_codec_realtek
btintel                24576  1 btusb
vfat                   20480  1
snd_soc_acpi           16384  2 snd_soc_acpi_intel_match,snd_soc_skl
fat                    86016  2 msdos,vfat
snd_soc_core          253952  1 snd_soc_skl
bluetooth             643072  31 btrtl,btintel,btbcm,bnep,btusb,rfcomm
intel_rapl             24576  0
snd_compress           24576  1 snd_soc_core
uvcvideo              118784  0
x86_pkg_temp_thermal    16384  0
intel_powerclamp       16384  0
snd_usb_audio         253952  2
videobuf2_vmalloc      16384  1 uvcvideo
snd_hda_intel          45056  9
coretemp               16384  0
videobuf2_memops       16384  1 videobuf2_vmalloc
videobuf2_v4l2         28672  1 uvcvideo
snd_hda_codec         151552  4 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec_realtek
videobuf2_common       53248  2 videobuf2_v4l2,uvcvideo
snd_usbmidi_lib        36864  1 snd_usb_audio
efi_pstore             16384  0
drbg                   28672  1
iwlmvm                299008  0
kvm_intel             245760  0
snd_rawmidi            40960  1 snd_usbmidi_lib
ansi_cprng             16384  0
videodev              212992  3 videobuf2_v4l2,uvcvideo,videobuf2_common
mac80211              815104  1 iwlmvm
kvm                   724992  1 kvm_intel
mei_me                 45056  0
snd_hda_core           94208  7 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_ext_core,snd_hda_codec,snd_hda_codec_realtek,snd_soc_skl
ecdh_generic           24576  2 bluetooth
snd_hwdep              16384  2 snd_usb_audio,snd_hda_codec
elan_i2c               45056  0
efivars                20480  1 efi_pstore
mei                   118784  1 mei_me
media                  45056  2 videodev,uvcvideo
irqbypass              16384  1 kvm
idma64                 20480  0
crc16                  16384  2 bluetooth,ext4
snd_seq_device         16384  1 snd_rawmidi
joydev                 24576  0
snd_pcm               114688  9 snd_hda_codec_hdmi,snd_hda_intel,snd_usb_audio,snd_hda_ext_core,snd_hda_codec,snd_soc_core,snd_soc_skl,snd_hda_core
intel_cstate           16384  0
tpm_crb                16384  0
iwlwifi               241664  1 iwlmvm
intel_uncore          135168  0
intel_rapl_perf        16384  0
pcspkr                 16384  0
serio_raw              16384  0
wmi_bmof               16384  0
tpm_tis                16384  0
snd_timer              36864  1 snd_pcm
iTCO_wdt               16384  0
cfg80211              761856  3 iwlmvm,iwlwifi,mac80211
iTCO_vendor_support    16384  1 iTCO_wdt
thinkpad_acpi         106496  1
processor_thermal_device    16384  0
tpm_tis_core           20480  1 tpm_tis
ucsi_acpi              16384  0
intel_pch_thermal      16384  0
typec_ucsi             36864  1 ucsi_acpi
intel_soc_dts_iosf     16384  1 processor_thermal_device
tpm                    65536  3 tpm_tis,tpm_crb,tpm_tis_core
typec                  45056  1 typec_ucsi
nvram                  16384  1 thinkpad_acpi
rng_core               16384  1 tpm
snd                    94208  36 snd_hda_codec_generic,snd_seq_device,snd_hda_codec_hdmi,snd_hwdep,snd_hda_intel,snd_usb_audio,snd_usbmidi_lib,snd_hda_codec,snd_hda_codec_realtek,snd_timer,snd_compress,thinkpad_acpi,snd_soc_core,snd_pcm,snd_rawmidi
soundcore              16384  1 snd
rfkill                 28672  9 bluetooth,thinkpad_acpi,cfg80211
int3403_thermal        16384  0
ac                     16384  0
int340x_thermal_zone    16384  2 int3403_thermal,processor_thermal_device
battery                20480  1 thinkpad_acpi
pcc_cpufreq            16384  0
evdev                  28672  31
int3400_thermal        16384  0
acpi_pad               24576  0
acpi_thermal_rel       16384  1 int3400_thermal
parport_pc             32768  0
ppdev                  20480  0
lp                     20480  0
parport                57344  3 parport_pc,lp,ppdev
efivarfs               16384  1
ip_tables              28672  0
x_tables               45056  5 xt_conntrack,nft_compat,ipt_MASQUERADE,xt_addrtype,ip_tables
autofs4                49152  2
btrfs                1384448  2
xor                    24576  1 btrfs
zstd_decompress        81920  1 btrfs
zstd_compress         172032  1 btrfs
xxhash                 16384  2 zstd_compress,zstd_decompress
raid6_pq              122880  1 btrfs
libcrc32c              16384  4 nf_conntrack,nf_nat,btrfs,xfs
crc32c_generic         16384  0
algif_skcipher         16384  0
af_alg                 28672  1 algif_skcipher
dm_crypt               40960  1
dm_mod                155648  3 dm_crypt
hid_generic            16384  0
usbhid                 57344  0
hid                   135168  2 usbhid,hid_generic
i915                 1728512  13
crct10dif_pclmul       16384  0
crc32_pclmul           16384  0
crc32c_intel           24576  1
ghash_clmulni_intel    16384  0
pcbc                   16384  0
i2c_algo_bit           16384  1 i915
drm_kms_helper        200704  1 i915
xhci_pci               16384  0
xhci_hcd              266240  1 xhci_pci
drm                   483328  6 drm_kms_helper,i915
aesni_intel           200704  4
i2c_i801               28672  0
e1000e                282624  0
usbcore               290816  7 xhci_hcd,snd_usb_audio,usbhid,snd_usbmidi_lib,uvcvideo,btusb,xhci_pci
psmouse               172032  0
aes_x86_64             20480  1 aesni_intel
crypto_simd            16384  1 aesni_intel
cryptd                 28672  4 crypto_simd,ghash_clmulni_intel,aesni_intel
glue_helper            16384  1 aesni_intel
nvme                   36864  3
nvme_core              81920  5 nvme
intel_lpss_pci         20480  0
intel_lpss             16384  1 intel_lpss_pci
usb_common             16384  1 usbcore
thermal                20480  0
wmi                    28672  1 wmi_bmof
video                  45056  2 thinkpad_acpi,i915
button                 16384  0

bob1de commented 5 years ago

Just found this: https://lore.kernel.org/lkml/20181010083007.239938-1-matthewgarrett@google.com/

The patchset is queued for 5.1, but that doesn't explain why it works with my 4.19 out of the box.

katakombi commented 5 years ago

> Just found this: https://lore.kernel.org/lkml/20181010083007.239938-1-matthewgarrett@google.com/
>
> The patchset is queued for 5.1, but that doesn't explain why it works with my 4.19 out of the box.

Thanks for your support so far!

bob1de commented 5 years ago

I think the patchset will at least make it possible to enable more UUIDs, but as it's just a kernel patch, there is no userland tool to select them. There is the Intel DPTF implementation for Linux, but I'd rather not use it because it's proprietary.

Maybe your best bet is to wait for 5.1 and see if that helps. I'll try to get a 5.x kernel as well later today for testing.

And yes, my dmesg | grep -i thermal looks identical, including the error with zone 6.

bob1de commented 5 years ago

Just frying the machine with a Linux 5.1 compilation using 8 threads :)

Went to 95°C immediately and stabilized at 2.9GHz with Core and Cache at -100mV. I'll report back when the new kernel is ready and booted up.

katakombi commented 5 years ago

Which values made it freeze? CPU at -140 mV?

I find it a bit odd that I can seemingly undervolt the CPU so much more than you. But I haven't stressed the machine very much yet (just synthetically, using stress-ng, geekbench and unigine). Is it possible that i5 cores can be undervolted better in general?

I will try to compile a kernel now, too, and check if that works well...

bob1de commented 5 years ago

Basically, the i5 and i7 are the same chip with different firmware. Production quality varies a lot for silicon, and they just label the better ones i7. Since the i5s are slowed down by their firmware, there is more headroom available, which you can benefit from when undervolting.

I actually wanted to get the i5 model as well, but decided on the i7 because I need high single-core performance, even if the gain is just 15%. Under full load on all cores, they don't seem to differ a lot.

BTW, compilation with the Debian kernel config took 25 minutes using 8 threads (make -j8).

katakombi commented 5 years ago

My temperature when compiling kernel 5.1 stays below 70°C and the CPU speed is throttled to 1900 MHz. There should be room for improvement. powertop reports a total discharge rate of 20 W. The fan is hardly audible; no idea if it can run at higher speeds. I compile using make-kpkg, but that should make no big difference in compile time.

Update: Now I've connected to AC, and the CPU temps went to 95-97°C, with 2400 MHz CPU speed. I forgot about tlp throttling things down on my system...

bob1de commented 5 years ago

Yes, I had the same behaviour before applying the UUID fix: temps under load never went above 70°C. make-kpkg should be fine as long as it ran with 8 parallel jobs (for example via CONCURRENCY_LEVEL=8), because otherwise it builds on a single core.

bob1de commented 5 years ago

I'm now up and running with 5.1 and notice no difference to 4.19. Same UUIDs available.

bob1de commented 5 years ago

> Update: Now I connected to AC, and the CPU temps went to 95-97, with 2400Mhz CPU speed. I forgot about tlp throttling down things for my system...

Ah, that's interesting. I don't run tlp because I find it too much magic :)

Do these temps and clock frequencies stay permanently, or is it just for some seconds?
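One way to answer that would be to sample temperature and clock a few times while the build runs. The sketch below assumes the usual sysfs locations (thermal_zone0 is the ACPI zone THM0 from the dmesg output earlier; the cpufreq path is an assumption and may differ per machine). The function names are made up for illustration.

```shell
# Sketch: sample CPU temperature and clock a few times to see whether the values hold.
# The default paths are common sysfs locations, not guaranteed for every machine.
millideg_to_c() {
    # sysfs thermal zones report millidegrees Celsius, e.g. 95000 for 95 C.
    echo $(( $1 / 1000 ))
}

khz_to_mhz() {
    # cpufreq reports kHz, e.g. 2400000 for 2400 MHz.
    echo $(( $1 / 1000 ))
}

sample_load() {
    temp_file="${1:-/sys/class/thermal/thermal_zone0/temp}"
    freq_file="${2:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq}"
    count="${3:-5}"      # number of samples
    interval="${4:-2}"   # seconds between samples
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$(millideg_to_c "$(cat "$temp_file")") C, $(khz_to_mhz "$(cat "$freq_file")") MHz"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `sample_load "" "" 30 10` would take 30 samples at 10-second intervals using the default paths.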

bob1de commented 5 years ago

Wow, I played a little with the voltages and it seems that the analogio and system agent voltages do indeed matter. I've now got this and it seems stable at 3.0-3.2GHz on all cores, the whole system consuming 41W:

# sudo ./intel-undervolt/intel-undervolt read   
CPU (0): -139.65 mV
GPU (1): -80.08 mV
CPU Cache (2): -80.08 mV
System Agent (3): -80.08 mV
Analog I/O (4): -80.08 mV

Short term package power: 51 W, 0.109 s, enabled
Long term package power: 51 W, 28.000 s, enabled

Critical offset: -5°C

With -100mV instead of -80 the system froze while compiling.
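The odd-looking numbers in the read output (-139.65 and -80.08 rather than -140 and -80) are consistent with the offset being stored in whole steps of 1/1.024 mV. Assuming that step size, which is inferred from the values above rather than taken from any specification, the rounding can be sketched as follows (`quantize_offset_mv` is a made-up name):

```shell
# Sketch: show how a requested offset is rounded, ASSUMING steps of 1/1.024 mV.
quantize_offset_mv() {
    awk -v mv="$1" 'BEGIN {
        steps = mv * 1.024
        # Round to the nearest whole step (int() truncates toward zero).
        steps = int((steps < 0) ? steps - 0.5 : steps + 0.5)
        printf "%d steps = %.2f mV\n", steps, steps / 1.024
    }'
}

quantize_offset_mv -140   # prints "-143 steps = -139.65 mV"
quantize_offset_mv -80    # prints "-82 steps = -80.08 mV"
```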

katakombi commented 5 years ago
bob1de commented 5 years ago

I quite like this tool for monitoring what's going on, it auto-refreshes and shows power consumption, temps and clock frequencies all on one screen:

git clone https://github.com/kitsunyan/intel-undervolt
cd intel-undervolt
make 
./intel-undervolt measure

Just in case you are interested. It can even apply the undervolting without having to write hex values yourself.

bob1de commented 5 years ago

That's great to hear! I expected CPU and GPU to be the most important values. No need to tune the other values down until you've found the ideal setting for these two, I guess.

Undervolting analogio and system agent gave me another 200-300 MHz per core, which is quite good, I think.

katakombi commented 5 years ago

Thanks, Robert. It's been fun playing around and tracking this down with you. I'm going to check out intel-undervolt and maybe report back next week. Enjoy the weekend!

bob1de commented 5 years ago

Thanks, same to you!