linuxmint / cinnamon

A Linux desktop featuring a traditional layout, built from modern technology and introducing brand new innovative features.
GNU General Public License v2.0

cinnamon + kernels >5.2 somehow cause extreme CPU temperatures #9085

Closed calestyo closed 4 years ago

calestyo commented 4 years ago
 * Cinnamon 4.2.4 (not running in software rendering mode)
 * Distribution - Debian sid
 * Intel(R) HD Graphics 620 (Kaby Lake GT2) / Kernel driver in use: i915 / Xorg uses modesetting
 * 64 bit

The system has an Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz ... and a HiDPI display.

I've always had the issue that when playing videos (despite hardware acceleration being used), CPU utilisation got quite high, especially when playing fullscreen. In such situations, the cinnamon process went up to 30 or 40%.

But when I recently upgraded from linux 5.2 to 5.3 (and it still persists with 5.4), the situation became extreme.

When cinnamon runs, even when it's idle and no video whatsoever is played, the CPU temperature increases on average by 10-15 °C. When playing videos or quickly moving windows, 90-100°C is reached quite quickly, and even if the "responsible" action is stopped, it takes 10 mins or more until the temperature goes back to "normal" values (which are still 10-15 °C higher than with kernel 5.2). Similarly, when scrolling up/down quickly in e.g. the email list of Evolution or Thunderbird (which is just a list of subject/from/date lines), temperatures reach 75°C or more.

Interestingly, on 5.4, the cinnamon CPU utilisation as shown by top, isn't much different from what I see on 5.2 ... it's just the temperature (and CPU fans) which go crazy.

Originally I've assumed it was only a kernel problem, so I've reported the issue here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=945055 https://lore.kernel.org/lkml/d05aba2742ae42783788c954e2a380e7fcb10830.camel@scientia.net/ https://lore.kernel.org/lkml/43b7eef5560ede5ad2973964f68d9e6beba63a91.camel@scientia.net/

There I describe my tests in more detail, so it's worth a look.

Since I've always had the problem that cinnamon produced quite some CPU utilisation when playing videos (which didn't bother me that much), I've re-installed gnome and tried its gnome-classic session. Oh, and this still uses Xorg.

With gnome-classic, all problems (well apart from having gnome running ;-) ) go away, also with 5.3 and 5.4. Even when I play "CPU intensive" videos like fullHD H.264 at full screen (using hardware acceleration via vaapi), CPU temperatures barely go over 60-65°C, while with cinnamon this was problematic even with 5.2 kernels (reaching >90°C).

I have no idea how to continue debugging, so any help would be highly appreciated.

There must be something cinnamon does in terms of graphics which causes already much higher load than e.g. gnome-classic, when playing videos... and there must be something it does (which gnome-classic doesn't) that makes it much much much worse when running kernels >5.2.

Thanks, Chris.

leigh123linux commented 4 years ago

File it at the debian bug tracker instead as it's a kernel issue! Also 4.2.x cinnamon is EOL so bug the debian maintainer instead!

calestyo commented 4 years ago

Well it's clearly not just a kernel issue if it happens only with cinnamon. Also, I doubt anything has changed since 4.2 which would fix the issue.

leigh123linux commented 4 years ago

Either way it's down to Debian to fix it.

leigh123linux commented 4 years ago

I can't reproduce on mint or fedora

System:    Host: caroline-Lenovo-YOGA-510-14ISK Kernel: 5.0.0-37-generic x86_64 bits: 64 
           compiler: gcc v: 7.4.0 Desktop: Cinnamon 4.4.6 wm: muffin dm: LightDM 
           Distro: Linux Mint 19.3 Tricia base: Ubuntu 18.04 bionic 
Machine:   Type: Convertible System: LENOVO product: 80S7 v: Lenovo YOGA 510-14ISK 
           serial: <filter> Chassis: type: 31 v: Lenovo YOGA 510-14ISK serial: <filter> 
           Mobo: LENOVO model: LNVNB161216 v: SDK0J40709 WIN serial: <filter> UEFI: LENOVO 
           v: 0VCN31WW(V1.15) date: 06/19/2018 
Battery:   ID-1: BAT1 charge: 45.5 Wh condition: 45.5/52.5 Wh (87%) volts: 12.0/11.2 
           model: SIMPLO PABAS0241231 serial: <filter> status: Discharging 
CPU:       Topology: Dual Core model: Intel Core i7-6500U bits: 64 type: MT MCP arch: Skylake 
           rev: 3 L2 cache: 4096 KiB 
           flags: lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 20736 
           Speed: 700 MHz min/max: 400/3100 MHz Core speeds (MHz): 1: 600 2: 612 3: 609 4: 654 
Graphics:  Device-1: Intel Skylake GT2 [HD Graphics 520] vendor: Lenovo driver: i915 v: kernel 
           bus ID: 00:02.0 chip ID: 8086:1916 
           Device-2: AMD Sun XT [Radeon HD 8670A/8670M/8690M / R5 M330 / M430 / R7 M520] 
           vendor: Lenovo driver: radeon v: kernel bus ID: 01:00.0 chip ID: 1002:6660 
           Display: x11 server: X.Org 1.19.6 driver: ati,modesetting,radeon unloaded: fbdev,vesa 
           tty: N/A 
           OpenGL: renderer: Mesa DRI Intel HD Graphics 520 (Skylake GT2) v: 4.5 Mesa 19.0.8 
           compat-v: 3.0 direct render: Yes 
Audio:     Device-1: Intel Sunrise Point-LP HD Audio vendor: Lenovo driver: snd_hda_intel 
           v: kernel bus ID: 00:1f.3 chip ID: 8086:9d70 
           Sound Server: ALSA v: k5.0.0-37-generic 
Network:   Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: Lenovo 
           driver: r8169 v: kernel port: 4000 bus ID: 02:00.0 chip ID: 10ec:8168 
           IF: enp2s0 state: down mac: <filter> 
           Device-2: Qualcomm Atheros QCA9377 802.11ac Wireless Network Adapter vendor: Lenovo 
           driver: ath10k_pci v: kernel port: 4000 bus ID: 03:00.0 chip ID: 168c:0042 
           IF: wlp3s0 state: up mac: <filter> 
           Device-3: Atheros type: USB driver: btusb bus ID: 1-7:3 chip ID: 0cf3:e360 
Drives:    Local Storage: total: 238.47 GiB used: 14.68 GiB (6.2%) 
           ID-1: /dev/sda vendor: Samsung model: MZYTY256HDHP-000L2 size: 238.47 GiB 
           speed: 6.0 Gb/s serial: <filter> 
Partition: ID-1: / size: 18.21 GiB used: 12.12 GiB (66.6%) fs: ext4 dev: /dev/sda8 
           ID-2: /home size: 132.27 GiB used: 2.53 GiB (1.9%) fs: ext4 dev: /dev/sda9 
           ID-3: swap-1 size: 7.45 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/sda10 
Sensors:   System Temperatures: cpu: 38.0 C mobo: N/A gpu: radeon temp: 30 C 
           Fan Speeds (RPM): N/A 
Repos:     No active apt repos in: /etc/apt/sources.list 
           Active apt repos in: /etc/apt/sources.list.d/official-package-repositories.list 
           1: deb http: //packages.linuxmint.com tricia main upstream import backport #id:linuxmint_main
           2: deb http: //archive.ubuntu.com/ubuntu bionic main restricted universe multiverse
           3: deb http: //archive.ubuntu.com/ubuntu bionic-updates main restricted universe multiverse
           4: deb http: //archive.ubuntu.com/ubuntu bionic-backports main restricted universe multiverse
           5: deb http: //security.ubuntu.com/ubuntu/ bionic-security main restricted universe multiverse
           6: deb http: //archive.canonical.com/ubuntu/ bionic partner
Info:      Processes: 206 Uptime: 1m Memory: 7.64 GiB used: 1.14 GiB (14.9%) Init: systemd v: 237 
           runlevel: 5 Compilers: gcc: 7.4.0 alt: 7 Client: Unknown python3.6 client inxi: 3.0.32
System:    Host: mpd.intel-domain Kernel: 5.4.5-300.fc31.x86_64 x86_64 bits: 64 compiler: gcc 
           v: 9.2.1 Desktop: Cinnamon 4.4.5 wm: muffin dm: LightDM 
           Distro: Fedora release 31 (Thirty One) 
Machine:   Type: Desktop Mobo: GIGABYTE model: MZGLKAP-00 v: 1.x serial: <filter> 
           UEFI: American Megatrends v: F6 date: 03/19/2019 
CPU:       Topology: Quad Core model: Intel Celeron J4105 bits: 64 type: MCP arch: Goldmont Plus 
           rev: 1 L2 cache: 4096 KiB 
           flags: lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 11980 
           Speed: 861 MHz min/max: 800/2500 MHz Core speeds (MHz): 1: 896 2: 899 3: 899 4: 899 
Graphics:  Device-1: Intel UHD Graphics 605 vendor: Gigabyte driver: i915 v: kernel 
           bus ID: 00:02.0 chip ID: 8086:3185 
           Display: x11 server: Fedora Project X.org 1.20.6 driver: modesetting 
           unloaded: fbdev,vesa resolution: 3840x2160~60Hz 
           OpenGL: renderer: Mesa DRI Intel UHD Graphics 600 (Geminilake 2x6) v: 4.5 Mesa 19.2.8 
           compat-v: 3.0 direct render: Yes 
Audio:     Device-1: Intel vendor: Gigabyte driver: snd_hda_intel v: kernel bus ID: 00:0e.0 
           chip ID: 8086:3198 
           Device-2: Thesycon System & Consulting D10 type: USB driver: snd-usb-audio 
           bus ID: 1-1:2 chip ID: 152a:8750 
           Sound Server: ALSA v: k5.4.5-300.fc31.x86_64 
Network:   Device-1: Intel Dual Band Wireless-AC 3168NGW [Stone Peak] driver: iwlwifi v: kernel 
           port: f040 bus ID: 02:00.0 chip ID: 8086:24fb 
           IF: wlp2s0 state: down mac: <filter> 
           Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: Gigabyte 
           driver: r8169 v: kernel port: e000 bus ID: 03:00.0 chip ID: 10ec:8168 
           IF: enp3s0 state: up speed: 1000 Mbps duplex: full mac: <filter> 
Drives:    Local Storage: total: 465.76 GiB used: 265.34 GiB (57.0%) 
           ID-1: /dev/nvme0n1 vendor: Crucial model: CT500P1SSD8 size: 465.76 GiB speed: 31.6 Gb/s 
           lanes: 4 serial: <filter> 
Partition: ID-1: / size: 48.97 GiB used: 30.99 GiB (63.3%) fs: ext4 dev: /dev/dm-0 
           ID-2: /boot size: 975.9 MiB used: 217.7 MiB (22.3%) fs: ext4 dev: /dev/nvme0n1p2 
           ID-3: /home size: 399.45 GiB used: 234.01 GiB (58.6%) fs: ext4 dev: /dev/dm-2 
           ID-4: swap-1 size: 7.73 GiB used: 108.2 MiB (1.4%) fs: swap dev: /dev/dm-1 
Sensors:   System Temperatures: cpu: 49.0 C mobo: N/A 
           Fan Speeds (RPM): N/A 
Repos:     Active yum repos in: /etc/yum.repos.d/_copr:copr.fedorainfracloud.org:leigh123linux:cinnamon_4.4.repo 
           1: copr:copr.fedorainfracloud.org:leigh123linux:cinnamon_4.4 ~ https: //copr-be.cloud.fedoraproject.org/results/leigh123linux/cinnamon_4.4/fedora-$releasever-$basearch/
           No active yum repos in: /etc/yum.repos.d/_copr:copr.fedorainfracloud.org:leigh123linux:libglvnd_next.repo 
           No active yum repos in: /etc/yum.repos.d/_copr_phracek-PyCharm.repo 
           No active yum repos in: /etc/yum.repos.d/fedora-cisco-openh264.repo 
           No active yum repos in: /etc/yum.repos.d/fedora-modular.repo 
           No active yum repos in: /etc/yum.repos.d/fedora-updates-modular.repo 
           No active yum repos in: /etc/yum.repos.d/fedora-updates-testing-modular.repo 
           No active yum repos in: /etc/yum.repos.d/fedora-updates-testing.repo 
           Active yum repos in: /etc/yum.repos.d/fedora-updates.repo 
           1: updates ~ https: //mirrors.fedoraproject.org/metalink?repo=updates-released-f$releasever&arch=$basearch
           Active yum repos in: /etc/yum.repos.d/fedora.repo 
           1: fedora ~ https: //mirrors.fedoraproject.org/metalink?repo=fedora-$releasever&arch=$basearch
           No active yum repos in: /etc/yum.repos.d/google-chrome.repo 
           No active yum repos in: /etc/yum.repos.d/rpmfusion-free-rawhide.repo 
           No active yum repos in: /etc/yum.repos.d/rpmfusion-free-updates-testing.repo 
           Active yum repos in: /etc/yum.repos.d/rpmfusion-free-updates.repo 
           1: rpmfusion-free-updates ~ https: //mirrors.rpmfusion.org/metalink?repo=free-fedora-updates-released-$releasever&arch=$basearch
           Active yum repos in: /etc/yum.repos.d/rpmfusion-free.repo 
           1: rpmfusion-free ~ https: //mirrors.rpmfusion.org/metalink?repo=free-fedora-$releasever&arch=$basearch
           Active yum repos in: /etc/yum.repos.d/rpmfusion-nonfree-nvidia-driver.repo 
           1: rpmfusion-nonfree-nvidia-driver ~ https: //mirrors.rpmfusion.org/metalink?repo=nonfree-fedora-nvidia-driver-$releasever&arch=$basearch
           No active yum repos in: /etc/yum.repos.d/rpmfusion-nonfree-rawhide.repo 
           Active yum repos in: /etc/yum.repos.d/rpmfusion-nonfree-steam.repo 
           1: rpmfusion-nonfree-steam ~ https: //mirrors.rpmfusion.org/metalink?repo=nonfree-fedora-steam-$releasever&arch=$basearch
           No active yum repos in: /etc/yum.repos.d/rpmfusion-nonfree-updates-testing.repo 
           Active yum repos in: /etc/yum.repos.d/rpmfusion-nonfree-updates.repo 
           1: rpmfusion-nonfree-updates ~ https: //mirrors.rpmfusion.org/metalink?repo=nonfree-fedora-updates-released-$releasever&arch=$basearch
           Active yum repos in: /etc/yum.repos.d/rpmfusion-nonfree.repo 
           1: rpmfusion-nonfree ~ https: //mirrors.rpmfusion.org/metalink?repo=nonfree-fedora-$releasever&arch=$basearch
           Active yum repos in: /etc/yum.repos.d/windscribe.repo 
           1: windscribe ~ http: //repo.windscribe.com/fedora/
Info:      Processes: 219 Uptime: 11d 14h 36m Memory: 7.58 GiB used: 2.79 GiB (36.8%) 
           Init: systemd v: 243 runlevel: 5 target: graphical.target Compilers: gcc: 9.2.1 
           clang: 9.0.0 Client: Unknown python3.7 client inxi: 3.0.37
calestyo commented 4 years ago

Well... if you never want such bugs to be ever fixed, then yes.

I guess it's quite unlikely that distro maintainers have the deep code insight (and manpower) to get such a bug (which is probably neither easy to find nor simple to fix) fixed. Sure, you can argue I'm not running the most recent version now, and I could wait for it to be packaged in Debian, by which time an even newer one would be out upstream and you'd argue the same.

I've seen several reports (some obviously quite outdated and probably obsolete) around here and other locations, where people tell about temperature or CPU utilisation issues,... all of them are probably difficult to reproduce and fix... as they may only occur in certain CPU/GPU combinations.

In my case the strange thing is, that no really high visible CPU utilisation is shown for the cinnamon process, yet the temperature still goes up like mad.

And yes, it's likely also something in the kernel, as 5.2 is way better than >5.3 ... but still even on 5.2 GNOME classic + fullscreen video leads to temperatures around 65-70 ... while with cinnamon it's >90.

I've played around with settings I've found, like VBlank stuff, or disabling compositing for fullscreen windows... but none of them seem to change anything.

So if anyone has some advice on where to even start debugging... I'd be pretty grateful.

claudiux commented 4 years ago

@calestyo Try this, without guarantee of success:

  1. Edit as root /etc/default/grub and replace the line GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" by GRUB_CMDLINE_LINUX_DEFAULT="intel_pstate=disable quiet splash".
  2. Run: sudo update-grub
  3. Reboot your computer.
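
After the reboot, it may be worth confirming that the parameter actually took effect. A minimal Python sketch, assuming `/proc/cmdline` and the cpufreq `scaling_driver` sysfs file are readable on the system; the helper names are made up for illustration:

```python
from pathlib import Path


def pstate_disabled_on_cmdline(cmdline: str) -> bool:
    """Return True if 'intel_pstate=disable' is one of the kernel parameters."""
    return "intel_pstate=disable" in cmdline.split()


def read_text_or_none(path: str):
    """Read a small text file; return None if it is missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    cmdline = read_text_or_none("/proc/cmdline") or ""
    # Assumed location of the active cpufreq driver name for the first CPU.
    driver = read_text_or_none(
        "/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver"
    )
    print("intel_pstate=disable on cmdline:", pstate_disabled_on_cmdline(cmdline))
    print("cpufreq scaling driver:", driver)
```

If the parameter is active, the reported scaling driver should no longer be `intel_pstate`.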
calestyo commented 4 years ago

I'm still in the process of making some more extensive measurements (will post here once finished)... but so far it looks like this:

but stay tuned...

leigh123linux commented 4 years ago

I'm still in the process of making some more extensive measurements (will post here once

  • gnome-classic runs much cooler

To make the test fair use the full gnome-shell session as classic is much lighter load than cinnamon.

calestyo commented 4 years ago

To make the test fair use the full gnome-shell session as classic is much lighter load than cinnamon.

I'd do so if you strongly insist, but I think this is not necessary, or rather wouldn't tell us anything useful, for several reasons:

This is anyway not about Cinnamon bashing and complaining that it's a bit more resource hungry in idle state... I truly wouldn't care about this. It's rather about the following two issues:

but further it seems pretty likely that there is also some (additional) issue on the Cinnamon side, cause even when staying at 5.2, video playback seems to be dramatically more resource intensive on Cinnamon than it is with GNOME classic. And this shouldn't happen, even with your argument above (that GNOME classic would be lighter)... cause playing a video should be more or less the same for both (regardless of any further base stuff that Cinnamon might do more in the background than GNOME Classic does).

[0] https://lore.kernel.org/lkml/d05aba2742ae42783788c954e2a380e7fcb10830.camel@scientia.net/

calestyo commented 4 years ago

The following is an excerpt from https://lore.kernel.org/lkml/c7b7e81b14380709c3d63033b0e67ee12b737b55.camel@scientia.net/ ... a bigger test series where I compare kernel 5.2 vs. 5.4 (each with intel_pstate=disable and without), each on Cinnamon and GNOME Classic... under different scenarios (idle system and several videos played back).

My personal conclusion would be that something changed between 5.2 and 5.3, which made temperatures and CPU utilisation considerably worse for Cinnamon,... and not so much, but still noticeably, for GNOME.

Apart from that however, there seems to be additionally something wrong with Cinnamon, as it performs much worse with video playback than GNOME does - even under 5.2.


So I did some more systematic testing between the following:

each with

For each combination I've recorded:

during the following scenarios:

as well as:

with mpv, each in:

In the two modes where -vo=xv wasn't given, mpv selected: VO: [gpu] … vaapi[nv12]

so these were with VAAPI acceleration.

Other versions/etc. were:

For the testing, the notebook was placed on a metal surface (which probably explains why the temperatures are a bit lower than what I've reported in previous mails).

After each measurement (which often caused high CPU temperatures) I let the CPU/system cool down for ~5 minutes until it reached the initially measured "idle temperature" again. Also for each measurement, I let the system in that state (e.g. being idle or playing a video) for several minutes.

I've made the "average" values manually; for the temperatures those should be quite accurate, for the CPU utilisation they should be regarded more as a guide, since there were often spikes in one or the other direction.
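
The manual averaging could also be automated. A minimal Python sketch, under the assumption that the CPU temperature is exposed in millidegrees Celsius at `/sys/class/thermal/thermal_zone0/temp` (which zone corresponds to the CPU differs between machines); the function names are hypothetical:

```python
import time
from statistics import mean


def read_celsius(path="/sys/class/thermal/thermal_zone0/temp"):
    """Read one sample; the file is assumed to hold millidegrees Celsius."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0


def average_temperature(read_sample, samples, interval_s, sleep=time.sleep):
    """Take `samples` readings, `interval_s` seconds apart, and average them."""
    readings = []
    for i in range(samples):
        readings.append(read_sample())
        if i < samples - 1:  # no need to wait after the last reading
            sleep(interval_s)
    return mean(readings)


# Example: average over ~3 minutes while a scenario is running.
# print(average_temperature(read_celsius, samples=180, interval_s=1))
```

Passing the reader and the sleep function as arguments keeps the averaging logic independent of the actual sensor path.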

Of course, I took always the same 2 videos, and started them at the beginning for each measurement.

Legend:
C = Cinnamon
G = Gnome Shell, classic mode

idl = idle (i.e. just the desktop environment running, no interaction or other intensive processes for several minutes)
loV = low-res video (h264 (High) (avc1 / 0x31637661), yuv420p, 720x396)
hiV = high-res video (h264 (High) yuv420p(tv, bt709, progressive), 1920x1080)

no "fs" = no fullscreen
fs = fullscreen

no "xv" = mpv used [gpu] … vaapi[nv12]
xv = mpv used xv

CPU temperature / [Cinnamon|Gnome Shell CPU%] / X CPU%

                5.2+a-hwp    5.2+disable      5.4+a-hwp    5.4+disable
C idl        48/ 5,0/ 1,0   48/ 3,0/ 1,0   59/ 4,0/ 1,0   56/ 2,7/ 0,8
C loV        53/15,0/ 6,0   54/ 9,3/ 4,3   62/14,0/ 6,3   59/ 9,0/ 4,3
C loV-fs     75/26,0/ 7,0   75/17,5/15,0   92/25,0/ 7,0   89/17,3/ 5,0
C loV-fs-xv  72/28,0/ 8,6   71/15,3/ 5,5   92/27,0/ 8,7   91/15,5/ 5,5
C hiV        63/25,0/12,0   61/15,0/ 8,3   63/24,0/12,2   66/14,5/ 8,5
C hiV-fs    100/45,0/10,0   95/33,0/ 7,6  100/41,0/11,0   97/34,0/ 8,0
C hiV-fs-xv 100/27,0/27,0   95/33,0/ 7,5   97/24,0/18,0   97/22,0/17,0

G idl        46/ 1,5/ 1,5   44/ 1,3/ 0,9   50/ 1,3/ 0,8   49/ 1,2/ 1,1
G loV        49/12,5/ 6,8   48/ 7,3/ 4,3   51/11,0/ 6,6   52/ 7,4/ 4,2
G loV-fs     56/13,2/ 7,0   55/ 8,5/ 4,5   60/13,3/ 6,7   60/ 8,3/ 4,3
G loV-fs-xv  54/12,2/ 8,6   54/ 6,6/ 5,3   58/12,0/ 8,1   57/ 6,5/ 4,7
G hiV        53/24,0/12,5   54/13,7/ 7,5   55/23,5/12,5   55/14,7/ 7,5
G hiV-fs     93/28,0/13,4   93/15,0/ 7,3   96/27,0/11,1   95/16,0/ 8,0
G hiV-fs-xv  94/ 9,5/20,0   91/18,5/ 8,8   97/17,0/ 8,7   93/21,0/ 9,0
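
To make the Cinnamon vs. GNOME gap easier to see, the cells can be parsed and compared programmatically. A minimal Python sketch that uses only the fullscreen low-res video (loV-fs) values from the table above; the function names and the dictionary layout are purely illustrative:

```python
def parse_cell(cell: str):
    """Split 'temp/DE CPU%/X CPU%' into three floats (comma = decimal mark)."""
    temp, de_cpu, x_cpu = (
        float(part.strip().replace(",", ".")) for part in cell.split("/")
    )
    return temp, de_cpu, x_cpu


# loV-fs cells copied from the table (C = Cinnamon, G = GNOME classic).
LOV_FS = {
    "C": {"5.2+a-hwp": "75/26,0/ 7,0", "5.4+a-hwp": "92/25,0/ 7,0"},
    "G": {"5.2+a-hwp": "56/13,2/ 7,0", "5.4+a-hwp": "60/13,3/ 6,7"},
}


def temperature_gap(rows, kernel):
    """Cinnamon temperature minus GNOME temperature for one kernel column."""
    return parse_cell(rows["C"][kernel])[0] - parse_cell(rows["G"][kernel])[0]


for kernel in ("5.2+a-hwp", "5.4+a-hwp"):
    print(kernel, temperature_gap(LOV_FS, kernel))
# prints:
# 5.2+a-hwp 19.0
# 5.4+a-hwp 32.0
```

For this scenario the gap grows from 19°C on 5.2 to 32°C on 5.4, while the reported CPU percentages stay almost the same.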

Conclusions:

...that is, at least on my hardware ;-)

Unfortunately all this is not limited to playing back videos...

So my suspicion would be something is wrong at the graphics stack and/or how it's used especially by Cinnamon.

Also (and I've tested the following only in Cinnamon), as previously noticed, sometimes, but not always:

Any help or pointers on how to debug this further would be highly appreciated... obviously running a notebook at ~80° or not being able to upgrade the kernel is kind of a showstopper.

Attached are several logs which may be useful.

Thanks, Chris.

[0] suggestion to try this from https://github.com/linuxmint/cinnamon/issues/9085#issuecomment-570654676

calestyo commented 4 years ago

btw: I've also tried to run Cinnamon with another window manager (i.e. metacity --replace or so)... but that seems to effectively end cinnamon... can it be run with mutter (i.e. to try whether that changes anything compared to muffin)?

leigh123linux commented 4 years ago

Have you tried throttled?

https://github.com/erpalma/throttled

calestyo commented 4 years ago

Hm not yet.. I could give it a try, though its changes seem pretty invasive.

Also AFAIU this tool actually resets the power limits and temperature maximum... so it would rather lead to an even higher temperature (in case the CPU has been already throttled down).

Using the tool the other way round also doesn't sound like a solution... sure I can throttle the CPU to some ultra low frequency and it will run colder, but then I could also just pull the plug.

What do you think about my test series and conclusions above?

To me it still seems like two things happen here:

Could it be that the drawing/rendering is done somehow different by Cinnamon which causes that?

So either this is purely a regression in the kernel... or it's some unfortunate way of Cinnamon rendering things or using something, which is not expected or intended by the kernel.

calestyo commented 4 years ago

Are there any other "tuning" options in Cinnamon that could have an effect on the GPU?

I'd expect that the animations for window maximising/etc. don't really hit me here, since none of them happen during my tests.

In the control centre / General I've set VSync method = none (which I think is the more performant setting?) and Disable compositing for full-screen windows = off (but I've checked it with on, and it didn't improve things.... plus as I've said with Cinnamon+5.4 temperatures already go crazy to ~70°C by just constantly moving around the mouse pointer, so it doesn't seem to be a fullscreen-only issue).

So is there anything else which I could try in terms of Cinnamon/Muffin/etc. settings?

calestyo commented 4 years ago

I've just made the following "experiment":

Running 5.4 (I've upgraded to a newer minor version 5.4.8), intel_pstate not disabled (i.e. active/hwp) and cinnamon.

No other noticeable processes were running; in particular, none of the cinnamon related ones (cinnamon-screensaver, csd-* processes) seemed to have caused any CPU load,... and they shouldn't produce any GPU load either.

So I guess this is another indicator that in addition to the kernel changes from 5.2 to 5.3, something happens on the cinnamon side.

leigh123linux commented 4 years ago

I've just made the following "experiment":

So I guess this is another indicator that in addition to the kernel changes from 5.2 to 5.3, something happens on the cinnamon side.

Perhaps it's debian old packages causing the issue, to eliminate that try another distro.

https://spins.fedoraproject.org/cinnamon/download/index.html

calestyo commented 4 years ago

Checking that will take me some longer time... it's my university notebook and I cannot just install something else on it for policy/security reasons. But I've asked Debian Cinnamon maintainers whether there's any chance that they package the current upstream version soonish.

In the meantime I've reported the issues against the intel kernel driver: https://gitlab.freedesktop.org/drm/intel/issues/953

And it was marked a duplicate there. So apparently there is some regression in 5.3 caused by: drm/i915/gen8+: Add RC6 CTX corruption WA (d4360736a7c0a6326e3bbdf7d41181f6ed03d9a6) which is actually a security fix. So there's some hope that it might get fixed sooner or later.

Still, this alone doesn't explain why Cinnamon is hit so much more by the issue in 5.4, and why it still performs considerably worse under 5.2... it still runs ~10°C hotter without the kernel regression.

So I'd like to keep this issue (or if you wish: open another one) where one can have a look at that and perhaps improve Cinnamon :-)

Maybe it's "just" some inefficient buffer use or double re-drawing or whatever, which does not occur with e.g. gnome shell or metacity.

calestyo commented 4 years ago

FYI: The issue still persists with the current cinnamon (4.4.8) and also with a kernel (5.5.13) that contains a fix for the previously suspected issue that prevents the GPU from going into RC6 sleep state.
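
Whether the GPU actually reaches RC6 can be checked from a residency counter. A minimal Python sketch, assuming the i915 driver exposes the counter in milliseconds at `/sys/class/drm/card0/power/rc6_residency_ms` (the path and card index are assumptions and may differ or be absent depending on the kernel version); the function names are hypothetical:

```python
import time

# Assumed sysfs location of the i915 RC6 residency counter (milliseconds).
RC6_PATH = "/sys/class/drm/card0/power/rc6_residency_ms"


def read_rc6_ms(path=RC6_PATH):
    """Read the cumulative RC6 residency in milliseconds."""
    with open(path) as f:
        return int(f.read().strip())


def rc6_fraction(before_ms, after_ms, elapsed_s):
    """Fraction of the elapsed wall-clock time the GPU spent in RC6."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (after_ms - before_ms) / (elapsed_s * 1000.0)


def measure_rc6(seconds=10, read=read_rc6_ms, sleep=time.sleep):
    """Sample the counter twice, `seconds` apart, and return the fraction."""
    before = read()
    sleep(seconds)
    after = read()
    return rc6_fraction(before, after, seconds)


# Example: print(measure_rc6(10)) on an idle desktop.
```

On an idle desktop a value close to 1.0 would suggest RC6 is being entered; a value near 0 would suggest it is not.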

So there may be still another issue in the kernel (which made it much worse after 5.2)... but there likely still is something wrong in cinnamon, too, as it generally runs at tremendously higher temperatures than other desktop environments,... especially but not limited to when playing videos.

Kallys commented 4 years ago

Thanks a lot @calestyo for your report, tests and information.

I'm experiencing the same kind of issue; I'm not sure it's related, but I have the same symptoms with this config:

CPU: Intel(R) Core(TM) i7-8750H CPU
GPU: Nvidia GeForce GTX 1060 Mobile (nvidia-driver-440 version 440.64-0ubuntu0~0.18.04.2)
OS: Linux Mint 19.3 Tricia
Desktop environment: Cinnamon 4.4.8
Kernel: 5.3.0-*
Firmware: 1.173.16

When I say I have the same symptoms: I have lag when scrolling full-text web pages (using firefox), I have a lot of mce logs about CPU throttling, and my CPU temperature is more than 10°C above normal while the computer is idling (with cinnamon possibly using 20%-30% CPU!). But I didn't check the temperature while playing a video.

However, for a few days now (maybe since I updated linux-firmware to 1.173.17?) my laptop fans seem quieter and the temperature appears more reasonable.

As explained in https://github.com/linuxmint/cinnamon-spices-applets/issues/2940 (thanks @claudiux for pointing out this probable relation), I also noticed weird behaviour when polling the CPU temperature (continuously polling the CPU temperature seems to decrease it...). I also realized just now that when starting xsensors, the first temperature shown for the CPU is 64°C, and immediately afterwards (at the next refresh) it falls below 50°C, while watching sensors shows a temperature below 50°C right away... Moreover, while both are running, they show the same temperature.

For information, I also reported this issue to https://bugzilla.kernel.org/show_bug.cgi?id=204893#c10 , since it seemed to me to be related although I don't understand at all what is actually happening.

calestyo commented 4 years ago

@Kallys Difficult to say whether it's the same or not.

Just yesterday I've finished a new extensive test series comparing 5.2 with 5.5, see: https://lore.kernel.org/lkml/ce8097694ddfab616616f8f81521495d99c74416.camel@scientia.net/T/#u respectively https://gitlab.freedesktop.org/drm/intel/-/issues/953

The regression in i915 I've mentioned above (https://github.com/linuxmint/cinnamon/issues/9085#issuecomment-572783044), causing the GPU not to enter RC6 sleep states, has been fixed in 5.5 ... and since you have an nvidia GPU, it cannot have caused any troubles in your case.

But my tests still show that something must have changed after 5.2, which causes things to use much more power (and thus produce higher temperatures). In most of my test cases, using intel_pstate (which is the default) made it even worse (so you may try whether intel_pstate=disable improves anything for you).

Apart from that, it also seems that cinnamon is still way more affected than e.g. gnome-shell-classic.

I cannot definitely say it's a bug in cinnamon, though I wouldn't rule it out either. Cause even under 5.2 cinnamon runs already much hotter than gnome-shell-classic does, while it hasn't really more applets or anything else, that would explain this.

Could also be that cinnamon just uses some other techniques, which already perform worse in 5.2 and got tremendously worse in >5.2 ...

@leigh123linux You might want to have a look at that too, and also at my repo with test results and many plots which show how cinnamon compares to non-cinnamon.

Thanks.

Kallys commented 4 years ago

Actually, my laptop has two GPU, nvidia and i915 (but I saw no difference in CPU temperature while enabling one or the other)

VGA compatible controller: Intel Corporation Device 3e9b
   Subsystem: Micro-Star International Co., Ltd. [MSI] Device 1215
   Kernel driver in use: i915
   Kernel modules: i915
VGA compatible controller: NVIDIA Corporation GP106M [GeForce GTX 1060 Mobile] (rev a1)
   Subsystem: Micro-Star International Co., Ltd. [MSI] GP106M [GeForce GTX 1060 Mobile]
   Kernel driver in use: nvidia
   Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
calestyo commented 4 years ago

Some updates from my side:

In addition to the lkml thread and the ticket at intel-drm there's now also one in the kernel bugzilla (effectively identical to the lkml thread so far).

calestyo commented 4 years ago

The following is only a preliminary test of mine (I need to make it more systematic and reproducible)... what I did was run: timeout 60 strace -p $(pidof cinnamon) -c ; beep under different workloads.

idle (i.e. no major processes like Firefox or so run, just the bare cinnamon stuff and some minor system daemons) with an idle temperature at around 55°C (which is quite extreme IMO):

strace: Process 3018 attached
strace: Process 3018 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 33,39    0,032259           6      5102       176 ioctl
 16,19    0,015643           3      4008      3358 recvmsg
 13,38    0,012928           6      2138           poll
  9,85    0,009515           8      1149         9 futex
  7,47    0,007220           7       955           writev
  3,27    0,003162           2      1074           getpid
  2,73    0,002635          18       144           munmap
  2,67    0,002576           8       319           write
  2,28    0,002203          18       121           openat
  1,82    0,001755          14       120           readlink
  1,81    0,001752          13       128           mmap
  1,42    0,001371           7       181           close
  1,02    0,000987           8       122           fstat
  0,84    0,000814           3       210           read
  0,76    0,000739          12        60           timerfd_create
  0,52    0,000504           8        60           timerfd_settime
  0,35    0,000337           0       512           mprotect
  0,23    0,000226           1       124           getrusage
  0,00    0,000000           0         1           madvise
  0,00    0,000000           0         1           fcntl
  0,00    0,000000           0         1           restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0,096626                 16530      3543 total

moving a terminal window in circles (i.e. I press&hold the mouse button, and start circling):

strace: Process 3018 attached
strace: Process 3018 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 31,31    0,307751           4     76051     53171 recvmsg
 25,41    0,249797           6     41020      3483 ioctl
 20,15    0,198059           4     42340           poll
 12,86    0,126380           8     15054           writev
  2,88    0,028323           1     20308           getpid
  2,02    0,019876           4      4032           read
  1,88    0,018518           5      3627           write
  0,92    0,009092           3      2494           mprotect
  0,89    0,008786           7      1114        18 futex
  0,54    0,005317          10       488       216 openat
  0,23    0,002303          13       166           munmap
  0,20    0,001979          10       195           mmap
  0,18    0,001725           5       332           close
  0,16    0,001592           5       316           fstat
  0,14    0,001369          11       120           readlink
  0,06    0,000638          10        60           timerfd_create
  0,05    0,000488           4       108           lseek
  0,05    0,000487           8        60           timerfd_settime
  0,03    0,000264           2        90        79 stat
  0,02    0,000182           2        72           getrusage
  0,01    0,000084           1        44           fcntl
  0,00    0,000001           1         1           restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0,983011                208092     56967 total

Starting the Epiphany web browser (opening some tabs, scrolling up/down on some rather simple websites, alt-tab switching between windows):

strace: Process 3018 attached
strace: Process 3018 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 26,70    0,315785           6     51508     38555 recvmsg
 23,94    0,283104           9     29143      2544 ioctl
 21,42    0,253301           7     34676           poll
 16,49    0,195058          11     16656           writev
  5,01    0,059292           3     15403           getpid
  1,92    0,022686           8      2781           write
  1,50    0,017788           6      2692           read
  0,84    0,009938           7      1285        13 futex
  0,78    0,009213           2      4198           mprotect
  0,26    0,003104          11       269           munmap
  0,26    0,003031          19       157         2 openat
  0,20    0,002342           9       246           mmap
  0,16    0,001934          16       120           readlink
  0,15    0,001815           7       255           close
  0,12    0,001449           7       183           fstat
  0,06    0,000660           4       142           getrusage
  0,06    0,000655          10        60           timerfd_create
  0,04    0,000507           8        60           timerfd_settime
  0,02    0,000278           9        30           fcntl
  0,02    0,000222           1       182       158 stat
  0,01    0,000123           3        36           uname
  0,01    0,000105           3        33         6 recvfrom
  0,01    0,000075           3        21           lseek
  0,01    0,000074           3        20           sendmsg
  0,00    0,000056           2        20           ftruncate
  0,00    0,000056           2        20           memfd_create
  0,00    0,000004           2         2           pwrite64
  0,00    0,000004           2         2         2 mkdir
------ ----------- ----------- --------- --------- ----------------
100.00    1,182659                160200     41280 total

Now I wonder whether this high number of calls to recvmsg, ioctl and poll is expected, and especially whether the high error counts are: in the window-moving run, 53171 of 76051 recvmsg calls returned an error.
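To make the runs easier to compare, the per-syscall error rate can be computed directly from the summaries. This is a minimal Python sketch, assuming the column layout shown in the tables above (comma or dot as decimal separator, and a blank errors column when a syscall never failed). The function name `error_rates`, the regular expression, and the `SAMPLE` excerpt are illustrative and not part of strace.

```python
import re

# One data row of an `strace -c` summary:
#   % time, seconds, usecs/call, calls, [errors], syscall
# Assumption: the errors column is simply empty when there were no errors.
ROW = re.compile(
    r"^\s*(?P<pct>\d+[.,]\d+)\s+(?P<secs>\d+[.,]\d+)\s+(?P<usecs>\d+)"
    r"\s+(?P<calls>\d+)(?:\s+(?P<errors>\d+))?\s+(?P<name>[A-Za-z_]\w*)\s*$"
)

def error_rates(text):
    """Return {syscall: (calls, errors, errors / calls)} for each data row."""
    rates = {}
    for line in text.splitlines():
        m = ROW.match(line)
        # Skip headers, separators, and the "total" line (which has no
        # usecs/call value and would otherwise be misread as a data row).
        if m is None or m["name"] == "total":
            continue
        calls = int(m["calls"])
        errors = int(m["errors"] or 0)
        rates[m["name"]] = (calls, errors, errors / calls if calls else 0.0)
    return rates

# Excerpt copied from the window-moving run above.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 31,31    0,307751           4     76051     53171 recvmsg
 25,41    0,249797           6     41020      3483 ioctl
 20,15    0,198059           4     42340           poll
------ ----------- ----------- --------- --------- ----------------
100.00    0,983011                208092     56967 total
"""

if __name__ == "__main__":
    for name, (calls, errors, rate) in error_rates(SAMPLE).items():
        print(f"{name:10s} calls={calls:6d} errors={errors:6d} rate={rate:.1%}")
```

Note that the summary only counts failed calls; it does not say which error code was returned, so a high rate alone does not show whether the failures are harmless.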

Could someone possibly repeat those simple tests and provide me with some rough numbers for comparison? @leigh123linux perhaps?

Thanks in advance, Chris.

dans20171 commented 4 years ago

So, funny story. I had Linux with Cinnamon installed on my laptop (i7 3612QM, GTX 650M), but never really used it. One of the reasons was that it was running at least 10 °C hotter and the battery lasted an hour less than on Windows. Now I have more use for Linux and decided to find a solution (easy stuff like TLP did not help). From what I understand, this problem is somewhat common for Cinnamon, and the general recommendation in such cases is to install XFCE instead. But that is for the weak!

Long story short, I found myself compiling the kernel to update from version 5.3.0 to 5.6.5. I am somewhat of a beginner and obviously something somewhere went wrong. Terribly wrong: I cannot boot Linux with the new kernel, and when booting the system with the old 5.3.0, I cannot install or uninstall anything; it gives me a bunch of errors, completely fills up the /boot drive, etc. I might try to fix it, but it seems that a clean install is the rational thing to do.

HOWEVER, my laptop now runs as cool as Windows 10 with similar battery life!! Therefore, the problem on Linux with Cinnamon can be solved! The only clues I can give are that I followed this guide to try to solve the missing firmware error: https://askubuntu.com/questions/811453/w-possible-missing-firmware-for-module-i915-bpo-when-updating-initramfs and that initramfs has also been mentioned in the error lists. Good luck!

calestyo commented 4 years ago

It's rather unlikely that my problem comes from a firmware loading problem (since the right firmware is actually loaded). Also, these warnings do not necessarily mean anything.