mattermost / desktop

Mattermost Desktop application for Windows, Mac and Linux
Apache License 2.0

CPU use on both Linux and Windows clients #997

Closed: Kabouik closed this issue 2 years ago

Kabouik commented 5 years ago

Summary

The CPU use of the Mattermost desktop client seems far higher than it should be on Linux and Windows (also reported on Mac in mattermost/mattermost#24689, and on Windows and Linux by myself in that issue, with screenshots). Another example:

Screenshot from 2019-06-25 15:58:56

The screenshot was taken just after starting Mattermost, but the same statistics can persist hours later.

I have been having this issue with the Mattermost Linux client version 4.1.2, as well as with previous versions, and so has one of my users who uses the Windows client. I have posted in mattermost/mattermost#24689 to give details on our observations on both platforms.

However, issue mattermost/mattermost#24689 seems to be specific to Mac and has been attributed to animated gifs in that case, so it may be a different problem. There are several other issues on CPU use here, but most have been closed due to inactivity rather than real solutions. Therefore, I believe it is better to open another issue to show that the problem happens on platforms other than Mac, regardless of animated gifs and HTML previews.

Environment

Steps to reproduce

Hard to tell: Mattermost does not always show high CPU use as in the screenshot above, but it is fairly common (several times a week) and will persist even after restarting the application. I failed to identify anything in the conversations that correlates with the CPU use. Not sure what stops it either. I initially thought it would be animated gifs or HTML previews after reading mattermost/mattermost#24689, but I observed that there was no relation.

Expected behavior

Lower CPU use.

Observed behavior

Sometimes, all available CPU resources are used.

svelle commented 5 years ago

I'm observing the same behavior on Fedora 29. Attached a screenshot as well.

Desktop client version is 4.2.1. Happens across all server versions.

Screenshot from 2019-06-25 15-52-30

amyblais commented 5 years ago

Ticket for reference: https://mattermost.atlassian.net/browse/MM-16626.

lig commented 5 years ago

This is still happening with 4.2.1 on Windows and 4.2.3 on Linux.

lkraav commented 5 years ago

Confirmed, Linux 4.2.3 constantly hogs ~4% CPU, making the laptop fans spin, which suggests something inside Mattermost is running in a constant polling loop of some kind.

Feels like a regression from 4.2.0, because I'm monitoring for this type of stuff pretty actively, and don't think I noticed that earlier.

Slack v4 client seems to have made massive improvement strides with this, as it now seems to be nearly idle when sitting in the background.

Would love to get that kind of performance out of Mattermost client as well.

hunterlester commented 5 years ago

I'm wondering if this is not an issue at all with the desktop application but rather with the webapp, which should be very encouraging.

I cloned the master branch of desktop and observed that, upon launch, it idles nicely at 0.0% CPU usage. I added the server web URL https://chat.redox-os.org. Only upon logging in to the server do I see a jump in CPU usage, idling between 6% and 8%.

By comparison, I open a single tab in Chrome, idling at 0.0% CPU usage, and navigate to https://chat.redox-os.org. Upon logging in I see a jump in the Google Chrome process, idling between 6% and 8%.

I'm testing on macOS Mojave 10.14.6. I intend to also test on Windows and Linux, when I get a chance...

Next step is to dive into the webapp codebase. In particular I want to measure websocket packets to look at their size and what kind of IO work is performed on reception of each packet.
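A minimal sketch of that measurement, assuming it would be pasted into the webapp's devtools console before the connection opens (the wrapper name and log format are ours, purely illustrative, not Mattermost code):

```javascript
// Sketch: wrap the page's WebSocket constructor so every received
// message logs its payload size. Illustrative only; names are ours.
function instrumentWebSocket(global) {
  const NativeWebSocket = global.WebSocket;
  global.WebSocket = function (url, protocols) {
    const ws = protocols === undefined
      ? new NativeWebSocket(url)
      : new NativeWebSocket(url, protocols);
    ws.addEventListener('message', (ev) => {
      const bytes = typeof ev.data === 'string' ? ev.data.length : ev.data.byteLength;
      console.log('ws message:', bytes, 'bytes');
    });
    return ws;
  };
  return NativeWebSocket; // keep a handle so the original can be restored
}
```

Calling `instrumentWebSocket(window)` before login would then print one line per websocket event, making it easy to eyeball event rate and payload sizes against the CPU graph.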

lkraav commented 5 years ago

@hunterlester I'm currently running 5.13.1 webapp in Firefox 69 as daily driver, and it does not show similar CPU hogging symptoms. But I'll continue to keep an eye on this.

hunterlester commented 5 years ago

@lkraav For the Firefox process, what is its rough idling CPU usage when signed into the server?

I also see around 6% CPU usage in the webapp on Firefox once logged into the server; interestingly, the one difference is that when the Firefox window is hidden, CPU usage does return to 0.0, unlike in Chrome.

Kabouik commented 5 years ago

Thanks for digging in that direction. I also failed to reproduce it with the webapp, but as mentioned in the first post, I did not thoroughly test the webapp. I edited the first message to reflect this.

Your observation of 6% CPU use, albeit already high for a chat application, seems quite good compared to the more than 60% (or sometimes even more than 80%¹) in my screenshot (on an i5-7200U CPU).

¹ It is also interesting to note that this is quite variable, from low CPU usage to huge spikes, or even long periods of high CPU usage, all without noticeable differences in the content of the channels.

hunterlester commented 5 years ago

Quick update, I've just tested in the same manner on Ubuntu 18.04.3.

@Kabouik What is your CPU usage before logging into the server and then after logging in? So far I observe it consistently increases upon logging in and accessing the webapp, whether through Electron, Firefox, or Chrome.

I'll be testing on Windows as well, hopefully tonight...

hunterlester commented 5 years ago

Just tested on Windows and I'm observing the same behavior as in my tests on Linux and macOS. Of course I'm missing something. Still think it's a server/webapp issue and not specific to this desktop app.

I happen to be logging into https://chat.redox-os.org, which is not a very active server. My next tests will be while connected to the Mattermost community server.

To help narrow down this issue I'll be logging web socket events as well as packet payloads in the webapp. I also want to find out if there's any additional data being sent over the server-to-client heartbeat.

lkraav commented 5 years ago

Confirmed, my Linux client is definitely a CPU hogger

image

Nothing visibly exciting going on in the active chat window, just some text and static images pasted.

hunterlester commented 5 years ago

Update: Not enough web socket events to account for the consistent CPU usage.

Next is tracing sys calls. I'll need help understanding as this is new ground for me.

System rebooted with csrutil disabled.

Running dtruss -s -ap <Electron PID> for 60 seconds.

CALL                                        COUNT
__disable_threadsignal                          2
bsdthread_create                                2
bsdthread_terminate                             2
getrlimit                                       2
proc_info                                       2
thread_selfid                                   2
mprotect                                        7
madvise                                        38
psynch_mutexdrop                              105
psynch_mutexwait                              105
ulock_wait                                    158
ulock_wake                                    158
psynch_cvsignal                              2439
psynch_cvwait                                2441
read                                         3313
recvmsg                                      7701
write                                       15842
kevent                                      33923

Will read about kevent to find out what effect it has on CPU. I would think the write events are significant. Simultaneously ran iosnoop -p <Electron ID> but did not see IO activity corresponding to the number of read and write sys calls; I only observed 11 IO events performed by Electron.
I wonder if that's because the IO events are coming from the web app, which is a sandboxed environment and may not read or write files directly, only indirectly through memory.

Example write thread stack:


 1768/0x212e:   8631296      28     26 write(0x19, "\030\0", 0x28C8)         = 8192 0

              libsystem_kernel.dylib`write+0xa
              Electron Framework`0x0000000104830a00+0x40f
              Electron Framework`0x0000000104830450+0xc7
              Electron Framework`0x0000000104830220+0x2c
              Electron Framework`0x0000000105aecd20+0xd6
              Electron Framework`0x0000000105af7220+0x4ac
              Electron Framework`0x0000000105aecf50+0x7f
              Electron Framework`0x0000000105a6b550+0x1a8
              Electron Framework`0x0000000105ab0640+0x1aa
              Electron Framework`0x0000000105aed410+0x57
              libsystem_pthread.dylib`_pthread_body+0x7e
              libsystem_pthread.dylib`_pthread_start+0x42
              libsystem_pthread.dylib`thread_start+0xd

By comparison, a single tab open in Firefox, accessing the same Mattermost server, running dtruss -s -ap <Firefox PID>, also for 60 seconds.



Note: CPU usage between Firefox process and Electron process was comparable.


CALL                                        COUNT
fsync                                           1
getegid                                         1
geteuid                                         1
getgid                                          1
getsockname                                     1
getsockopt                                      1
kqueue                                          1
mkdir                                           1
unlink                                          1
ftruncate                                       2
rename                                          2
sendmsg_nocancel                                2
socketpair                                      2
sysctl                                          2
__disable_threadsignal                          3
bsdthread_create                                3
bsdthread_terminate                             3
getpeername                                     4
statfs64                                        4
close_nocancel                                  5
lseek                                           6
shm_open                                        6
connect                                         7
fstat64                                         7
socket                                          7
munmap                                          8
psynch_cvclrprepost                             8
sendto_nocancel                                 8
access                                         10
proc_info                                      10
select_nocancel                                10
getattrlist                                    11
getuid                                         11
kevent_qos                                     12
necp_client_action                             12
thread_selfid                                  12
gettimeofday                                   13
poll                                           13
madvise                                        16
recvfrom_nocancel                              22
open                                           27
pread                                          28
setsockopt                                     30
mprotect                                       32
close                                          42
sendto                                         42
select                                         47
psynch_mutexdrop                               51
psynch_mutexwait                               52
fcntl                                          65
stat64                                         67
mmap                                           71
recvfrom                                      150
kevent_id                                    1289
workq_kernreturn                             1313
recvmsg                                      2469
psynch_cvsignal                              3868
sendmsg                                      4926
read                                         4948
write                                        4956
psynch_cvbroad                               6074
kevent                                       6155
psynch_cvwait                               19797


Saw many errored kevent logs due to invalid address.

Next steps are to research these sys calls to understand their effect on CPU: which files are being written to, and what is being written. iosnoop doesn't give me insight, so I need to dig into the DTrace toolbox.

hunterlester commented 5 years ago

Next step is to look up those Electron Framework 0x... addresses, with the goal of determining which functions are being called that result in continuous write and kevent syscalls, which appear to be correlated with the consistent CPU usage.

Logging thread stacks using `dtrace -p <Electron Helper PID> -n 'syscall::write:entry {ustack()}'`. Also looking at `syscall::kevent:entry`.

Note: It's not my intention to spam this thread. I'm documenting the process so that more experienced engineers can point out gaps in my knowledge, and so that others can learn along the way if we're able to gain insight.

Thinking that ustack isn't able to resolve Electron Framework function symbol names, as those may be defined and allocated at runtime. Given that Electron (Chromium) is based on V8, perhaps those functions don't exist at the time of process creation, so they won't be a part of the process's symbol table. So those addresses may be referencing runtime, heap-allocated functions.

Think it's confirmed when taking a core dump of the process and running `bt all` in lldb:

    frame #0: 0x00007fff68bed78e libsystem_kernel.dylib`kevent + 10
    frame #1: 0x000000010e7fa969 Electron Framework`___lldb_unnamed_symbol124545$$Electron Framework + 89
    frame #2: 0x000000010e7f845d Electron Framework`___lldb_unnamed_symbol124506$$Electron Framework + 573
    frame #3: 0x000000010e7ee117 Electron Framework`___lldb_unnamed_symbol124239$$Electron Framework + 455
    frame #4: 0x000000010e76c6f8 Electron Framework`___lldb_unnamed_symbol122108$$Electron Framework + 424
    frame #5: 0x000000010e7b17ea Electron Framework`___lldb_unnamed_symbol123124$$Electron Framework + 426
    frame #6: 0x000000010e7ee467 Electron Framework`___lldb_unnamed_symbol124250$$Electron Framework + 87
    frame #7: 0x00007fff68ca62eb libsystem_pthread.dylib`_pthread_body + 126
    frame #8: 0x00007fff68ca9249 libsystem_pthread.dylib`_pthread_start + 66
    frame #9: 0x00007fff68ca540d libsystem_pthread.dylib`thread_start + 13

Or, using lldb, nothing returns when running `image lookup --address <hex memory address>`.

How do I look up symbol names for dynamically allocated functions? Thinking that I need to obtain a V8 heap dump and then figure out how to resolve the core dump memory addresses.

UPDATE

Those continuous kevent and write syscalls are not contributing to sustained high CPU usage. They only account for 0.5% of CPU usage and are called at the same rate, whether or not logged into chat server. Back to the beginning to look for more clues.

yomguy commented 4 years ago

Hi, I have the same issue: the desktop app with 3 servers eats 15% of my i7 CPU while doing nothing. It happens ONLY when the main window is open, and drops to 1% when the window is closed but the app keeps running with the notification icon in the taskbar. As I see no change whether GPU acceleration is enabled or not, I suspect a problem/bug with the display, maybe related to Electron.

lkraav commented 4 years ago

I've just been using the MM web client in Firefox for the past few months to work around this issue.

mikebakke commented 4 years ago

I'm running on a Pi3 and in normal use it is absolutely fine. However, as soon as I attempt to upload a jpg image > 1 MB, all 4 cores peg at 100% and the interface shows "processing". Tried via web and Windows client so far. Smaller images are fine.

OK, just ran a 930k image through and it took about 3 minutes to finish processing, but it did eventually. Unsure what it's actually doing when "processing", but it's clearly resource-intensive. Assuming it's memory, as that is c. 1 GB on this machine.

artfulrobot commented 4 years ago

mattermost top

Using MM Desktop 4.3.2 on Linux (Ubuntu 19.10) constantly produces significant load. Before realising this I replaced my laptop battery; now I know!

lkraav commented 4 years ago

I'm looking forward to testing the freshly released 4.4.0-rc0 for this. Still using webclient on Firefox for now.

artfulrobot commented 4 years ago

Well, it's an improvement, I think, but I'm still going back to the browser client.

mattermost top2

lkraav commented 4 years ago

4.4.0-rc1 still continuously hogs 6% CPU on Linux. Kills battery life, so AC-only.

hunterlester commented 4 years ago

@lkraav @artfulrobot Hi all, I'm not certain, but I may have found one of the performance issues; however, I can't rely on my own measurements. Really curious to know if you see any difference with these artifacts: https://circleci.com/gh/mattermost/desktop/5442#artifacts/containers/0

Based on this PR, in case you want to verify what you're installing.

hunterlester commented 4 years ago

Something else I want to confirm is what percentage of CPU usage the web app accounts for. What is the extra CPU usage in your Chrome/Firefox/etc process(es) when running MM?

That's an important comparison that's consistently missing in our measurements.

lkraav commented 4 years ago

Really curious to know if you see any difference in these artifacts: https://circleci.com/gh/mattermost/desktop/5442#artifacts/containers/0

Unfortunately there doesn't seem to be a difference.

Peek 2020-02-11 06-29

Something else I want to confirm is what percentage of CPU usage the web app accounts for. What is the extra CPU usage in your Chrome/Firefox/etc process(es) when running MM?

My observation has always been that the webapp doesn't suffer from this cycle churn. Firefox has the about:performance monitor available, and the MM tab sits completely idle most of the time. Perhaps there are more in-depth in-browser cycle monitoring tools available; not sure what they are for now.

hunterlester commented 4 years ago

@lkraav Thank you for taking the time. I really appreciate it. I'm also noticing next to no CPU usage in Electron dev console CPU profiler.

So, my next line of investigation is through the V8 profiler: passing `--js-flags="--prof"` when executing electron to generate isolate CPU tick log files, then processing them with `node --prof-process`.

Hopefully leads to good clues:

   ticks  total  nonlib   name
     23    0.1%    1.2%  JavaScript
   1828    5.1%   91.6%  C++
     65    0.2%    3.3%  GC
  33626   94.4%          Shared libraries
    144    0.4%          Unaccounted

where most of the CPU time is taken by /usr/lib/system/libsystem_kernel.dylib, according to the bottom-up profile.

I thought this was a dead end before when I used dtrace to observe constant write and kevent sys calls, which didn't appear to be utilizing any significant amount of CPU.

I could use more expertise here.

Leaving a trail so that a bunch of us can investigate.

lkraav commented 4 years ago

Thanks for the instructions. There are many isolate log files. How can I tell which might be relevant?

hunterlester commented 4 years ago

Ah yes, I'm passing `--no-logfile-per-isolate` as well as `--prof` to generate a single file. I'm not certain, but it seems like the number of isolates isn't insightful as far as Electron processes are concerned.

Doc on isolate class: https://v8docs.nodesource.com/node-0.8/d5/dda/classv8_1_1_isolate.html

Without that flag I was receiving 4 isolate log files, and they each had a similar profile summary.

Here's my single processed output: https://gist.github.com/hunterlester/0c387bccafa02ca1229be34bec24703e

vimmerru commented 4 years ago

We noticed that CPU usage is often caused by animated gifs placed in chats.

lkraav commented 4 years ago

We noticed that CPU usage is often caused by animated gifs placed in chats.

Correct. This issue is tracking the more general problem.

Kabouik commented 4 years ago

We noticed that CPU usage is often caused by animated gifs placed in chats.

Not only. mattermost/mattermost#24689 is dedicated to CPU use caused by animated gifs, but I posted there results showing that there are CPU spikes even with channels containing no gifs, sometimes actually higher than channels with gifs being displayed in the same teams. This was one of the main reasons why I opened this issue: there are additional causes for (huge) CPU spikes that haven't been identified yet.

lkraav commented 4 years ago

This was one of the main reasons why I opened this issue: there are additional causes for (huge) CPU spikes that haven't been identified yet.

CPU spikes vs low but constant CPU churn are also two different issues :thinking:

The title "CPU use" is rather generic for now.

Kabouik commented 4 years ago

CPU spikes vs low but constant CPU churn are also two different issues

You are correct, bad wording in my previous message. I actually experience both CPU spikes and unusually high base CPU use (and others too, right?).

eldruin commented 4 years ago

I also experience constant 13-15% CPU usage by "Mattermost Helper (Renderer)" on macOS (MBP 2019) with the desktop client (4.4.0.5682), plus an additional 2% constantly used by "Mattermost Helper (GPU)" while idle. Even when the window is closed but the client is still running, the CPU usage stays the same, which is quite counter-intuitive.

Deactivating GPU rendering reduces CPU usage by the Renderer process by about 4%.

Interestingly, the web client on Firefox (74.0, 64-bit) makes Firefox's CPU usage jump up by about 20% while the tab is opened. Switching to another tab returns the CPU usage to the average 5-7%, which is normal in my machine.

Mattermost server version: 5.20.1 Team Edition

svelle commented 4 years ago

Hey folks, thanks for all your reports. Another very important piece of information for all reports here is the Mattermost server version. If you don't know how to find it, it's shown in the About Mattermost tab in your account settings (hamburger menu next to your username).

image

A lot of the performance impact obviously comes from the webapp rendering, so having the server version helps us know where to look. Sometimes there have already been improvements in a newer version, so it's also worth looking into upgrading (the latest ESR/LTS is 5.19 and the latest stable quality release is 5.21).

Kabouik commented 4 years ago

I got very bad CPU use with the client lately, so I tried using Mattermost in a browser instead (not my main browser, because I want a separate window): Epiphany (WebKit). I still get 50% CPU use on a 7th-generation i5 CPU, so it's not tied to the Electron application.

Server is 5.7.0, I admit I'm a bit late on that. I have no time to backup and upgrade right now but I will report back once it is done in case there is a significant change in CPU use.

lkraav commented 4 years ago

Server is 5.7.0, I admit I'm a bit late on that. I have no time to backup and upgrade right now but I will report back once it is done in case there is a significant change in CPU use.

I have upgraded 5.16 -> 5.21 and there's no difference in desktop client CPU usage.

BUT I did discover something else today: CPU churn apparently depends radically on the size of the channel history.

Kabouik commented 4 years ago

That is something I investigated some months ago, but I cannot confirm it: on my end, Mattermost can use 60% CPU even when showing a private channel with fewer than 10 messages in history.

frerich commented 4 years ago

@lkraav Great finding! I see the exact same thing -- a fresh, empty, channel reduces CPU usage to a perfectly acceptable minimum (<1% CPU in my case).

mparuzel commented 4 years ago

This has been observed on the browser front-end as well. Moderate CPU usage for chats with long history. Very little CPU usage on chats without history. Reproduced using Chrome.

Kabouik commented 4 years ago

My main Windows user is reporting that the issue arises mainly when images are posted in a channel, and just a few images are enough to get to 30-50% CPU use. He does not know whether image URLs previewed in the channels and images uploaded to Mattermost directly cause the issue equally; we haven't checked.

I don't think the link is that obvious in Linux: as stated previously, I often observed even higher CPU use in channels with no visible images (do other channels not being displayed and past history use CPU too?). However, it's worth investigating in that direction because he is positive about the relationship, since he has to kill the Mattermost process every time I post 3 or 4 images.

Of note, there are no CPU issues whatsoever on mobile applications. Android works exactly the same whether images are being previewed or not. I use a third-party SailfishOS application, and no CPU issue either.

jowilkes commented 4 years ago

Getting the same issue on both Windows (7 Pro 64, 7 Ultimate 64, 10 Home) and Linux (Ubuntu 18, 19, and 20). The Mattermost desktop app (or snap app) causes 40-70% CPU load when idle, when hidden, and when no messages are coming in on any of the channels, with slightly more load during actual activity. Edit: server version 5.9.0 for the one I just checked. Edit 2: no animated gifs in any of the channels; still >50% CPU load.

jowilkes commented 4 years ago

CPU spikes vs low but constant CPU churn are also two different issues

... or high constant CPU churn.

Kabouik commented 4 years ago

My main Windows user is reporting that the issue arises mainly when images are posted in a channel, and just a few images are enough to get to 30-50% of CPU use.

Note that he added that disabling URL preview and extended image preview in Account settings > Display does not help.

nazavode commented 4 years ago

@lkraav Great finding! I see the exact same thing -- a fresh, empty, channel reduces CPU usage to a perfectly acceptable minimum (<1% CPU in my case).

Same here, just switching to an empty channel makes Mattermost disappear from top. Mattermost desktop v4.5.0.6629 on macOS 10.15.5.

padeken commented 4 years ago

Hi, I just noticed that the corresponding jira ticket https://mattermost.atlassian.net/browse/MM-16626 was closed. I know this is difficult to trace down and debug, but shouldn't the issue be kept open?

amyblais commented 4 years ago

Thanks, re-opened.

Regul777 commented 4 years ago

It's funny, but I think I found the cause of the idle CPU usage: the CSS animation of the ".loading-screen .loading__content .round" elements. It's a spinner animation with 3 circles at the top of a channel (used for the chat infinite scroll). It is an infinite animation in an invisible area, but the browser renders it all the time.

It's easy to check. After loading any chat with a big list of messages, we see 4-5% CPU usage for the Renderer process. But if you remove the element (display: none) or just disable the animation of the .round elements, the CPU usage of the process drops to 0.1-0.3% (be careful: an open Developer console can itself cause CPU usage that can spoil the test). So the best fix is to hide the element when we don't need it. I've made a few small experiments, and it seems that the animation or the element must be removed (or hidden by display: none) to solve the problem.
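A quick way to try this experiment is a small snippet in the window's devtools console. The helper function below is ours, purely for illustration; the selectors are the ones named in the comment above:

```javascript
// Sketch: neutralize the hidden infinite-scroll spinner animation.
// Run in the devtools console of the Mattermost window (or adapt as
// injected CSS). Returns how many spinner dots were touched.
function stopSpinnerAnimation(doc, hideWholeScreen = false) {
  const rounds = doc.querySelectorAll('.loading-screen .loading__content .round');
  for (const el of rounds) {
    el.style.animation = 'none'; // stop the infinite CSS animation
  }
  if (hideWholeScreen) {
    for (const el of doc.querySelectorAll('.loading-screen')) {
      el.style.display = 'none'; // or take the element out of layout entirely
    }
  }
  return rounds.length;
}
```

Calling `stopSpinnerAnimation(document)` and then watching the Renderer process in top/Activity Monitor (with the devtools window itself closed, since it adds its own CPU cost) should show whether the spinner accounts for the idle churn.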

frerich commented 4 years ago

I can confirm that setting `display: none` on the element matching the selector `div.loading-screen` greatly helps with the CPU usage of the Mattermost 4.5.0.6629 client on macOS.

lkraav commented 4 years ago

@Regul777 you deserve an award of some kind :rocket: :partying_face:

I can confirm @frerich's experience: CPU use instantly drops to 0.

lkraav commented 4 years ago

I'm not sure when 4.6.0 is planned to be released, but if this hotfix were patched into 4.5.3 asap, I think nature would sincerely appreciate it.

Willyfrog commented 4 years ago

I've created a Jira bug after some investigation. I can confirm this behaviour both in the desktop app and in Firefox.

Please be aware that when this change is merged it will require a server upgrade, not a desktop upgrade, as it is not related to the desktop source code.

Also, once it is merged I'll keep this issue open, as I'm sure there are other factors taking up CPU usage.