poelzi / ulatencyd

daemon to minimize latency on a linux system using cgroups
GNU General Public License v3.0

No buffer space available + High CPU load #47

Open V10lator opened 10 years ago

V10lator commented 10 years ago

Hi, I'm not 100% sure these two events are connected but sometimes my log is spammed with

ulatencyd[542]: ** (ulatencyd:542): WARNING **: failed to get data: Error receiving data: No buffer space available

and the ulatencyd process eats up 100% of one core. No, my RAM is not full (not even close) when that happens.

gajdusek commented 10 years ago


Hi,

I have not used or worked on ulatency for a long time. But I plan to dive into it again and I will look into this issue too. IIRC those messages may come from the netlink connection, e.g. if processes are being spawned or forked too quickly, but I am not sure.

Can you please post the version of your ulatency and its logs?

Petr

V10lator commented 10 years ago

@gajdusek

$ emerge -pv ulatencyd

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild   R   #] sys-process/ulatencyd-0.5.0::powerman  USE="-doc -qt4" 0 kB

Where do I find its logs? Does journalctl -u ulatencyd --no-tail -a > ~/ulatencyd.log serve you? If so: https://gist.github.com/V10lator/8863207

theradioboy commented 10 years ago

I am experiencing a similar problem. My /var/log/ulatencyd has a lot of

WARNING: failed to get data: Error receiving data: No buffer space available
WARNING: pid: 27355 parent 27354 missing. attaching to pid 1
WARNING: pid: 27355 parent 27354 missing. attaching to pid 1

This is on Kubuntu 12.04.4, ulatency package version 0.5.0-4ubuntu1. It doesn't get to 100%, but I have noticed a small CPU increase probably associated with it (<10%).

gajdusek commented 10 years ago

Hi,

you are both using the rather old ulatencyd 0.5. Can you compile the master branch? It contains some related fixes and optimisations which can really help (though they probably will not fix the issue entirely).

I will try to describe what you are experiencing:

Ulatencyd is informed whenever a new process is spawned, or a running one exits, forks, or changes its UID or GID. These notifications come from the Linux kernel via netlink, which ulatencyd listens to for events. Each event is then handled by ulatencyd immediately to ensure fast reactions to changes.

The warnings you see are harmless: they just say that some netlink events were lost because of their high frequency. Ulatencyd prints the warning and moves on. New or changed processes (those whose netlink events were lost) will eventually be handled in the next regular scheduler interval along with other changed processes (if still alive). This means less work for ulatencyd.

The high ulatencyd CPU usage probably comes from its reactions to the netlink events that were NOT lost. But this means that some processes spawn, die, or fork really quickly, and your system is already under load.

Most expensive is the handling of new processes and forks. A new process goes into the new-process queue and, if still alive after a certain time, is passed to the scheduler. This delay is defined in the ulatencyd configuration file (default /etc/ulatencyd/ulatencyd.conf) via the delay_new_pid directive inside the [core] group. It should be set to something like 500 or 1000 (ms). This should not be the problem, but you may want to try increasing that delay.
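As a sketch, the relevant stanza would look like this (the delay_new_pid key and [core] group are as described above; the exact default value shipped by your distribution may differ):

```ini
# /etc/ulatencyd/ulatencyd.conf (excerpt)
[core]
# time in ms a new process must stay alive before it is passed
# to the scheduler; try raising it under heavy fork load
delay_new_pid=1000
```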

Worse is that before a new process lands in the new-process queue, ulatencyd needs to gather some basic information about it. This may cause overhead, which should be reduced in the newer master code. Moreover, ulatencyd 0.5 reacts to events caused by tasks (threads); this is avoided in the newer code because the scheduler runs on processes, not tasks. There were also some bugs that caused already dead processes to be passed to the scheduler, etc.

Please let me know whether the ulatencyd CPU usage is lower with ulatencyd compiled from the master branch.

Basically, in the current code there are several issues related to your problem: 1) optimize handling of netlink process events, 2) run the netlink module in a separate thread (so more CPU cores could be utilized :), 3) maybe temporarily stop listening to netlink events if they come too frequently, 4) avoid printing that warning.

V10lator, if you want those warnings to stop polluting your system logs, make ulatencyd use its own log file, e.g. start ulatencyd with "-f /var/log/ulatencyd.log" (by the way, this should be the default). Otherwise logging is handled by glib and probably sent to stderr, where it is collected by journald (just guessing, I am not familiar with systemd).

Alternatively, you may want to prevent ulatencyd from listening to netlink events by setting netlink=false in the config. Then the scheduler will run only every interval seconds, e.g. every 10 seconds, on all processes that changed during that interval. You will of course lose ulatencyd's fast reactions.
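A sketch of that configuration, assuming the option names mentioned above (netlink and an interval key for the regular full-scheduler period; check your shipped ulatencyd.conf for the exact names):

```ini
# /etc/ulatencyd/ulatencyd.conf (excerpt)
[core]
# disable the netlink event listener; ulatencyd then only rescans
# changed processes once per scheduler interval (seconds)
netlink=false
interval=10
```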

Petr

V10lator commented 10 years ago

Thanks for that detailed answer. I won't install anything outside of my package manager (with that attitude I've kept a clean system for 10+ years: no reinstalls; when the hardware was about to die I just copied everything over). I am not experienced with writing ebuild files, but I will try to write one when I find the time.

It's a bit weird hearing "the system is already under heavy load", because that's exactly where I want ulatencyd to help, not to eat more CPU: delay the heavy-load processes and reduce latency. ;) Also, I have seen it under low load, too (unless you call around 30% CPU load from Steam/compositor/X heavy load). Anyway, I'll see if it's better when I find the time (see above).

theradioboy commented 10 years ago

gajdusek, thanks a lot for your answer. It's nice to know more about the inner workings of ulatencyd. I cloned and tried to build, but it requires libprocps-dev, which is not available as a distro package or in a PPA for Ubuntu flavours at 12.04.* (Precise Pangolin). It seems to be the equivalent of libproc, so I symlinked /usr/lib/libprocps.a -> /usr/lib/libproc.a and /usr/lib/libprocps.so -> /usr/lib/libproc.so. I also used export LIBPROC_LIBRARIES=-L/usr/lib and export LIBPROC_LIBDIR=-L/usr/lib. Since CMakeLists.txt uses pkg_check_modules, I had to disable that line to get configuration to proceed. Still, I am unable to build:

Static procps library not found: /libprocps.a

Am I doing this right? Or should I do something else to successfully cmake it?

gajdusek commented 10 years ago

Hi,

The CMakeLists.txt you used relied on the results of pkg_check_modules, and pkg-config was therefore forced. I fixed this in the master branch and improved libprocps detection: it now checks whether the required symbols are present and prints useful help if any are missing.

Unfortunately, the libproc version you tried is the old legacy procps, which is missing at least one required symbol.

You can download a newer version from http://packages.ubuntu.com/saucy/libprocps0-dev and override the CMake variables PROCPS_STATIC_LIBRARY and PROCPS_STATIC_INCLUDE_DIR.

In summary, for the i386 architecture you can:

wget http://mirrors.kernel.org/ubuntu/pool/main/p/procps/libprocps0-dev_3.3.3-2ubuntu7_i386.deb
dpkg -x libprocps0-dev_3.3.3-2ubuntu7_i386.deb /tmp/procps
# switch to ulatencyd source and
cmake -D PROCPS_STATIC:BOOL=ON -D PROCPS_STATIC_LIBRARY:FILEPATH=/tmp/procps/usr/lib/i386-linux-gnu/libprocps.a -D PROCPS_STATIC_INCLUDE_DIR:PATH=/tmp/procps/usr/include/ .
make DEBUG=1
sudo make install
sudo /usr/local/sbin/ulatencyd -v -f /var/log/ulatencyd