mobile-shell / mosh

Mobile Shell
https://mosh.org
GNU General Public License v3.0

High CPU Usage on FreeBSD 9/amd64 #297

Closed nvx closed 8 years ago

nvx commented 12 years ago

Installed via ports, mosh 1.1.3 exhibited low (under 1%) CPU usage for both the client and the server. However, upon updating to mosh 1.2.2, CPU usage skyrocketed to the point where typing commands and viewing the output of, say, 'ls' was noticeably delayed. top reports CPU usage of around 25% for both the client and server (on a Core 2 Duo), and as high as 5% when idle.

This is most noticeable on lower-spec machines like Intel Atoms, but it was still quite noticeable on a Core2 Duo when you were looking for it.

Unfortunately I haven't tested any versions between 1.1.3 and 1.2.2, so I'm not sure where exactly the regression occurred. To compare, running "while true ; do ls ; done" in the same directory over ssh instead of mosh, sshd sits at around 7% CPU and you can't notice it updating, whereas mosh peaks at 25%+ doing just a single ls. The directory in question has ~1000 files in it, and colours are enabled for certain file types in case that has anything to do with it (ANSI colour code processing maybe?).

keithw commented 12 years ago

The SGR (color) processing was broken on OS X and BSD until mosh 1.2 (because of different interpretations of POSIX), so I'm not sure there can be a direct comparison.

However, we did just push a commit that should improve CPU utilization on systems that have a slow clock_gettime(). Could you please try out the current git master? (Please make sure you're compiling with -O2 of course -- this is the default.)
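
For anyone who wants to try it, the build-from-git steps are roughly as follows (a sketch assuming the usual autotools flow; the in-tree binary paths are my assumption and may differ):

    git clone https://github.com/mobile-shell/mosh.git
    cd mosh
    ./autogen.sh
    ./configure        # CXXFLAGS default to "-g -O2"
    make
    # the freshly built binaries end up under src/frontend/
    ls src/frontend/mosh-server src/frontend/mosh-client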

nvx commented 12 years ago

Manually running mosh-server and then connecting to it with mosh-client on 127.0.0.1 (latest git, pulled down moments ago) is still quite laggy on an Atom. I haven't tested the Core 2 Duo yet, but CPU usage sits at 3.17% for mosh-server and 2% for mosh-client when idle, and both spike up to ~25% when an 'ls' is executed in the same directory as in the initial post, so not much change at all.

Let me know if you need any other information.

nvx commented 12 years ago

Actually, to quantify it a little I ran "time ls" and got the following

    0.31 real         0.00 user         0.04 sys

I then ran "unset CLICOLOR" and "time ls" again and got the following

    0.09 real         0.00 user         0.01 sys

The CPU usage was still rather high, and it wasn't as smooth as via ssh, but it was definitely a lot better. For comparison, with colour enabled, "time ls" over ssh (connected over loopback again):

    0.06 real         0.00 user         0.03 sys

and without colour

    0.03 real         0.00 user         0.00 sys

It doesn't look like much of a difference, but the difference in how responsive it 'feels' is massive - it's the difference between connecting to something on your local LAN and connecting across a dodgy 3G connection.

Edit: TLDR, colour is definitely making it slower, but there's something else at play as well.

keithw commented 12 years ago

I don't understand how running "time ls" is quantifying anything about the mosh server or client. That's just counting up the CPU resources expended by ls itself. You would have to look at the CPU usage of the mosh-server when given certain input to see if we are being unnecessarily slow. (Try saving the output of ls --color=yes to a file and then catting it back several times.)
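
A rough way to do that, for anyone following along (a minimal sketch; the file name is illustrative, and CLICOLOR_FORCE is FreeBSD ls's way of keeping colour codes when output is redirected):

    # capture a colourised listing once
    env CLICOLOR_FORCE=1 ls -G > /tmp/ls-colour.txt
    # inside the mosh session, replay it several times
    for i in 1 2 3 4 5; do cat /tmp/ls-colour.txt; done
    # meanwhile, watch mosh-server's CPU from a separate plain-ssh session
    top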

Just to be clear, are you sure you compiled with optimization?

nvx commented 12 years ago

The reason it quantifies anything is that it shows the real time taken to execute, which is considerably higher than the combined user+sys CPU time - the reason it takes longer, I imagine, is that writing to stdout blocks while mosh catches up. So it's not the CPU time we're measuring and comparing, but the time it took to display the lines in real time. On the order of ~300ms to do an ls in a directory is definitely noticeably laggy; disabling colour brought it down to the order of ~100ms, which was noticeably faster but still not quite as smooth as ssh (though probably still usable), which clocked in at around ~50ms with colour, or ~25ms without.

Quick summary of time taken to display with mosh vs ssh: ls with colour is 5x slower in mosh than ssh; ls without colour is 2x slower in mosh than ssh.

Timing the real execution time when cat'ing files containing the output of colourised and non-colourised directory listings gives very similar results (it's faster across the board, but mosh still weighs in slower than ssh by a similar factor), which is what you'd expect.

Looking at top when running ls a couple of times in a row results in mosh-server and mosh-client each maxing out a cpu core.

As for optimizations, you said -O2 is the default. Initially I built mosh from FreeBSD ports, and when building from the git checkout I used the following configure invocation (so it could find OpenSSL):

    OPENSSL_CFLAGS="-I/usr/include/openssl" OPENSSL_LIBS="-L/usr/lib -lssl" ./configure

I used no extra flags for make, etc. A quick make clean all to double-check what parameters were passed to the compiler:

    g++ -DHAVE_CONFIG_H -I. -I../.. -D_THREAD_SAFE -pthread -I/usr/local/include -Wall -fno-strict-overflow -D_FORTIFY_SOURCE=2 -fstack-protector-all -Wstack-protector --param ssp-buffer-size=1 -fPIE -fno-default-inline -pipe -g -O2 -MT transportinstruction.pb.o -MD -MP -MF .deps/transportinstruction.pb.Tpo -c -o transportinstruction.pb.o transportinstruction.pb.cc

Definitely looks like optimizations are enabled.

Note that the current tests are being done on a 1.6GHz Atom rather than a faster machine, to make the differences more obvious; naturally on a faster machine it wouldn't be as noticeable. On the Core 2, for example, scp'ing the same ls output (with and without colour) from the Atom and catting it a few times doesn't noticeably lag, but mosh-server uses 18% CPU and mosh-client another 10% (again, using mosh to connect over loopback).

Compare that to a Pentium 4 Ubuntu machine (running mosh 1.1 - I don't control it unfortunately, though I can build from git if you'd like): cat'ing the same ls output files, both the client and server barely scrape 10% usage - still slightly higher than ssh, but nowhere near as high as on a faster machine (Core 2 Duo) running FreeBSD.

bdrewery commented 12 years ago

I also have this problem. Just by holding down 1 key, mosh-server reaches upwards of 60% CPU.

FreeBSD svn r237434 may help as well, as it speeds up clock_gettime() somewhat on STABLE-9/CURRENT. I have not tried it yet myself.

ghost commented 12 years ago

FreeBSD uses a new feature in its SSH: HPN-SSH, which does dynamic auto-tuning of the buffer pool for SSH data transfers.

Web site / patch: http://www.psc.edu/index.php/hpn-ssh
Benchmark: http://www.psc.edu/images/psc/hpn-ssh/hpn-v-ssh-tput.jpg

bdrewery commented 12 years ago

1.2.3 is still CPU intensive. Every time I look at mosh-server, it is typically using 30%.

atmosx commented 12 years ago

Same here: it reaches 35% and is at least 5x slower on an RDC-chipset (x86 clone) system, meaning it's unusable.

phmarek commented 11 years ago

Just for reference, here's a simple "top" in mosh on a raspberry pi, with an interval of 1 sec:

    top - 08:40:12 up 3 days, 20:55,  1 user,  load average: 0,02, 0,04, 0,05
    Tasks:  62 total,   1 running,  61 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 47,7 us,  1,9 sy,  0,0 ni, 50,5 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
    KiB Mem:    497764 total,   385596 used,   112168 free,    41780 buffers
    KiB Swap:   102396 total,        0 used,   102396 free,   291908 cached

      PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
    15877 pi        20   0 17112 4976 1404 S  46,4  1,0   0:14.10 mosh-server
    15894 pi        20   0  4928 1420 1040 R   2,8  0,3   0:00.34 top

This is with 1.2.3-1 armhf.

keithw commented 11 years ago

Our suspicion is that the binary hardening features (which we compile with by default for security) are particularly CPU-intensive on ARM. We switched these on for mosh 1.2, which could explain some of the results people are seeing. Do your results improve if you compile with ./configure --disable-hardening ?
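
For anyone following along, the rebuild is roughly (a sketch; run the resulting binaries from the build tree or install them as usual):

    ./configure --disable-hardening
    make clean
    make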

phmarek commented 11 years ago

Uh, I'm currently using the standard debian package. Please stand by.

phmarek commented 11 years ago

The configure option makes it oscillate between 15-18 and 40-56 percent CPU usage.

nvx commented 11 years ago

Might be an idea to split this into two separate tickets as one issue seems to be Linux+ARM while the other is FreeBSD+x86.

atmosx commented 11 years ago

Hello,

I compiled mosh with the --disable-hardening flag via the FreeBSD port's Makefile options (I used ports to build the package). The server still goes up to 20.50% when using regular shell commands; when using vim it goes up to 30+%.

As mentioned above, this was tested on an ebox-5530mx (RDC-based chipset/x86 clone) embedded device.

Best regards

nvx commented 11 years ago

I read that the hardening only affected ARM (it appears this ticket was slightly hijacked by reports of high CPU usage on a Raspberry Pi, which is Linux/ARM), so it's not surprising it didn't make much difference.

sean- commented 11 years ago

It might be useful to get a ktrace(1) of the binary's execution and then include the output of kdump(1), so we could see what system calls are being executed.

I've seen problems with gettimeofday(2) taking a long time on broken hardware (i.e. Dell). Try looking at sysctl kern.eventtimer.choice and selecting a different timer via sysctl kern.eventtimer.timer=LAPIC (using LAPIC in this example). It wouldn't surprise me if the most advanced timer source is actually broken. Normally this exhibits itself as a high idle load average, but it's normally a kernel process polling the time source that causes the artificially high load. In this case, it might be mosh.
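
Concretely, something like this (a rough sketch; the PID lookup and the LAPIC choice are just examples):

    # attach ktrace to the running server, exercise the session, then stop and dump
    ktrace -p $(pgrep mosh-server)
    # ... type, run ls, etc. ...
    ktrace -C
    kdump -f ktrace.out | less

    # list the available event timers, then try a different one
    sysctl kern.eventtimer.choice
    sysctl kern.eventtimer.timer=LAPIC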

$0.02. -sc

nvx commented 11 years ago

Testing FreeBSD connecting to a Linux machine over an Internet link, mosh feels rather responsive, despite 20%+ CPU usage on the client. Meanwhile, connecting to localhost it's unusably sluggish. I take from this that the biggest issue is in the server, not the client (even if both exhibit similar CPU usage numbers).

Running a ktrace on mosh-server and just doing a few ls's, the largest times were for pselect - which is to be expected, since it's a blocking call - followed by wait and fork (which only happens once anyway).

By the time we get to something that matters, we're at 0.000599 seconds for a sendto(), 0.000524 for a madvise(), and it gets even faster from there on out. There were quite a lot of calls to madvise() and clock_gettime(), often in short succession, but not enough to explain the sort of slowdowns seen (0.6s to do a simple ls over localhost vs 0.07s over ssh).

I noticed that "ls" and the output of my ls appeared in the kdump output. Naturally, the time between those two events only covers part of the total time between typing ls and receiving the output, but it's a good enough indication for now.

http://pastebin.com/cqpjbmuG

Note there's almost 300ms of delay between those two points, very little of which is spent in kernel syscalls. That rules out broken event timers at least. Good idea though, Sean!

sean- commented 11 years ago

Drat. I'd still see if tweaking the kern.eventtimer source changes things. gettimeofday(2) doesn't get slow on broken systems, but they generally idle high, or some programs will generate a higher-than-normal load that approaches 1.

*crosses fingers* You wouldn't by chance have dtrace enabled, would you? That's one of the best ways to figure out where kernel time is being lost.

-sc

nvx commented 11 years ago

I can do that if needed - I've never used dtrace before, but like that'll stop me. :P

That said, kernel time being lost? Do you think it's being lost in kernel time and not showing up in ktrace?

ghost commented 11 years ago

    # sysctl kern.eventtimer.timer
    kern.eventtimer.timer: HPET

sean- commented 11 years ago

> I can do that if needed - I've never used dtrace before, but like that'll stop me. :P

dtrace... is like a scriptable version of ktrace, but OS-wide, not just the userland. It's pretty life-changing if you're used to debugging problems on Linux only.
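
For example, a minimal one-liner to count syscalls by name for mosh-server (a sketch, assuming a dtrace-enabled kernel with the syscall provider loaded):

    dtrace -n 'syscall:::entry /execname == "mosh-server"/ { @[probefunc] = count(); }'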

> That said, kernel time being lost? Do you think it's being lost in kernel time and not showing up in ktrace?

Can you repost your kdump(1) using -T? I'm pretty sure kdump uses a relative wall clock by default, but I forget which clock that is. As a creature of habit, I always use -T so I can compare things across systems.

    ktrace -di mosh-server
    kdump -HT -f ktrace.out

-sc

Sean Chittenden sean@chittenden.org

sean- commented 11 years ago

> sysctl kern.eventtimer.timer
> kern.eventtimer.timer: HPET

Right. Now change it to something different based on

    sysctl kern.eventtimer.choice

Something like:

    sysctl kern.eventtimer.timer=LAPIC

HPET was selected because it has a weight of 450. Move down the list of available timers to something like LAPIC or i8254. I'm amazed at how many broken hardware time implementations there are (though I wish FreeBSD wouldn't assume all advertised time sources are working, and would automatically degrade after detecting that a time source is broken). Anyway, hopefully this yields fruit for you.

-sc

Sean Chittenden sean@chittenden.org

nvx commented 11 years ago

Just for completeness I tried LAPIC instead of HPET and it made no difference.

I used kdump with -E, which gives timestamps since the beginning of the trace. It turns out that by default it doesn't show timestamps at all. -T would have returned the same output except with 1358100000 or so added to each number. I don't have the dump available any more, but I can redo it with -T if it matters that much; it really shouldn't, though - I just find seconds-since-trace-start more useful than seconds-since-epoch for this kind of thing.
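
For reference, the kdump timestamp options in question (as I understand them on FreeBSD 9):

    kdump -E -f ktrace.out    # elapsed time since the start of the trace
    kdump -R -f ktrace.out    # time relative to the previous record
    kdump -T -f ktrace.out    # absolute wall-clock time (seconds since the epoch)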

keithw commented 11 years ago

I'm following this with great interest and am eager for somebody to track it down -- I agree 300 ms for an ls is unfortunate. I wonder if there is some slowness in some of the STL data structures we use that ends up getting pulled into the implementation.

nvx commented 11 years ago

FreeBSD uses the GNU STL (libstdc++) - FreeBSD 10 changes this, but that's not out yet and I'm running FreeBSD 9 - so if that were the cause, it'd be odd that Linux users aren't affected.

keithw commented 11 years ago

Well, just out of curiosity, what version of g++ are you using?

sean- commented 11 years ago

Soo... that actually brings out a good point. How about just some old school profiling?

    setenv CFLAGS -pg
    env CFLAGS="-pg ${CFLAGS}" make
    ./mosh-server
    ^C
    gprof

FreeBSD is stuck with gcc 4.2 due to licensing issues. -sc
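
A slightly fuller sketch of the same idea (paths and flags are illustrative assumptions; gmon.out is written to the server's working directory when it exits):

    ./configure CXXFLAGS="-O2 -pg" LDFLAGS="-pg"
    make clean all
    # run the instrumented server, exercise the session, then let it exit
    ./src/frontend/mosh-server
    gprof ./src/frontend/mosh-server gmon.out > mosh-server.prof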

nvx commented 11 years ago

If you'd like, I can hook one of you guys (keithw and/or sean) up with a shell on a FreeBSD 9.1-RC2 machine (I have to get around to updating it to 9.1-RELEASE) for testing purposes. It's a rather beefy 3.4GHz quad-core i5, so mosh isn't unusably slow like on the Atom, but it still reaches 20% CPU usage with ease, so the problem is very much there.

Might be a little quicker than the back and forth.

phmarek commented 11 years ago

I've got a -pg profile on an RPi for you. Should I send it by private mail?

89kB with LZMA compression.

nvx commented 11 years ago

Can a separate issue be created for the RPi problems? It's a different CPU architecture and a different kernel than this issue is about, so I highly doubt they are related.

phmarek commented 11 years ago

I don't think that it's a separate issue, but here you are.

nvx commented 11 years ago

Linux in general is unaffected, so that one sounds ARM-specific. Meanwhile this bug is about FreeBSD/amd64, and amd64 in general is unaffected, so this one sounds FreeBSD-specific. The issues could be related in some way, but they are in fact two separate issues and each deserves a dedicated effort.

aaronky commented 11 years ago

I'm having the same issue with the net/mosh port (mosh-1.2.3_2) on:

    FreeBSD 9.1-RELEASE-p1 i386
    gcc version 4.2.1 20070831 patched [FreeBSD]
    hw.model: Geode(TM) Integrated Processor by AMD PCS (Soekris net5501-60)

I'm going to bump another machine up to 10 on amd64 and see if that makes any difference.

vadipp commented 11 years ago

I'm having the high-load problem with a mosh-1.2.4 server running on FreeBSD 9.1-RELEASE amd64, coupled with a mosh-1.1.94 client running on Cygwin. mosh-server goes up to 40% CPU when something is actively happening on the screen, and only shows ~0.6% usage when screen updates are rare. I can try to debug/trace this issue if needed, with some help from a more experienced person here :)

keithw commented 11 years ago

What is the size of your terminal window, and what kind of CPU are you running mosh-server on?

(N.B. 1.1.94 was a debugging pre-release.)

vadipp commented 11 years ago

The size of the window is 157x76. The CPU is an Intel Core 2 Duo 3GHz.

(I built the client part of mosh for Cygwin manually a while ago, and have been too lazy to upgrade since it works OK. I can upgrade if that would be of any help.)

k-bx commented 8 years ago

Just wanted to mention that the problem is reproducible on the latest Ubuntu on Scaleway ARM servers (so it's easy to reproduce). Mosh has high CPU consumption and is in general slower than ssh.

cgull commented 8 years ago

Just this week I committed the code discussed in #527 and follow-on issues. If you can use our Ubuntu PPAs or build from source on master, please try that.

k-bx commented 8 years ago

@cgull the PPA doesn't have anything new, but building from source seems to have solved the CPU issues, thanks! I'll report back if I notice anything bad.

andersk commented 8 years ago

@k-bx There are two PPAs. The new development is in ppa:keithw/mosh-dev.

cgull commented 8 years ago

Closing, please open a new issue if you still have performance issues.

nvx commented 8 years ago

Is this expected to fix the high CPU usage on FreeBSD, which is what this ticket is about? Or only on Ubuntu?

I'd rather it get tested on FreeBSD (I'm not in a position to test from source at the moment, but can update easily enough once it's in ports) before this gets unceremoniously closed because an issue on Ubuntu + ARM got fixed (despite Ubuntu + x86 not being affected, while this issue is about x86 on FreeBSD...).

cgull commented 8 years ago

FreeBSD and OS X are my primary development platforms; I did much of my profiling/analysis on FreeBSD. The changes are mostly algorithmic and make significant improvements on all platforms; FreeBSD shows more relative improvement than Linux (but was slower to begin with).

nvx commented 8 years ago

Ah, very good. My initial reading of the pull request left that unclear. I look forward to 1.2.6, which I assume will contain these improvements! :+1: