intel / dleyna-server

dleyna-server is a library for implementing services that allow clients to discover, browse and manipulate Digital Media Servers. An implementation of such a service for linux is also included.
https://01.org/dleyna/
GNU Lesser General Public License v2.1
28 stars 28 forks source link

dleyna-server-service spins mainloop while running #150

Open heftig opened 9 years ago

heftig commented 9 years ago

While idling, dleyna-server-service saturates a CPU. This has been seen by me on Arch Linux, but there also exist reports for Fedora (https://bugzilla.redhat.com/show_bug.cgi?id=1240411) and Debian (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=779731).

The main GMainContext is constantly dispatching. Maybe there is a g_idle_add source that is never removed. From my cursory examination it seems prv_process_task might be involved.

glib2 2.44.1 dleyna-connector-dbus 0.2.0 dleyna-* 0.5.0

rsbells commented 9 years ago

Jan - Thanks for filing this issue and attaching the issues reported in Fedora and Debian. We will try to reproduce it.

heftig commented 8 years ago

The source that's constantly getting dispatched is an idle source with the callback emit_notifications in libgupnp. A bit of tracing revealed that for the broken GUPnPServiceProxy, subscribe and server_handler were being called, but not subscribe_got_response. Yet, Wireshark shows that the SUBSCRIBE request was quickly answered with a 200.

heftig commented 8 years ago

Anyway, I shut down the media server that triggered the bug (a FRITZ!Box 7272, OS 06.30).

wolfv commented 8 years ago

I have this problem as well, and frankly it's been there since quite a while (Fedora 23 currently). It's also not very obvious what dleyna does, and how to stop it. Having the laptop fan going like crazy when I am only listening to music on rhythmbox or gnome music is a bit stressing.

wolfv commented 8 years ago

Unfortunately it's also not possible to delete dleyna server from fedora without also removing rhythmbox and gnome-music as they are dependend on dleyna for whatever reason.

phako commented 8 years ago

That could possibly also be an GUPnP bug - I would also be interested in reproduction

conrad-heimbold commented 7 years ago

I am currently facing this issue - dleyna-server-service uses 100% of my CPU. The stacktrace seems to be the same as in https://bugzilla.redhat.com/show_bug.cgi?id=1240411 and https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=779731 . Output from gstack $(pidof dleyna-server-service):

Thread 4 (Thread 0x7fe374c3c700 (LWP 26929)):
#0  0x00007fe37b08101d in poll () at /lib64/libc.so.6
#1  0x00007fe37b91b166 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#2  0x00007fe37b91b27c in g_main_context_iteration () at /lib64/libglib-2.0.so.0
#3  0x00007fe374c43fad in dconf_gdbus_worker_thread () at /usr/lib64/gio/modules/libdconfsettings.so
#4  0x00007fe37b942b93 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#5  0x00007fe37b3526ca in start_thread () at /lib64/libpthread.so.0
#6  0x00007fe37b08cf7f in clone () at /lib64/libc.so.6
Thread 3 (Thread 0x7fe37761f700 (LWP 26926)):
#0  0x00007fe37b08101d in poll () at /lib64/libc.so.6
#1  0x00007fe37b91b166 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#2  0x00007fe37b91b4f2 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#3  0x00007fe37beffa76 in gdbus_shared_thread_func () at /lib64/libgio-2.0.so.0
#4  0x00007fe37b942b93 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#5  0x00007fe37b3526ca in start_thread () at /lib64/libpthread.so.0
#6  0x00007fe37b08cf7f in clone () at /lib64/libc.so.6
Thread 2 (Thread 0x7fe377e20700 (LWP 26925)):
#0  0x00007fe37b08101d in poll () at /lib64/libc.so.6
#1  0x00007fe37b91b166 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#2  0x00007fe37b91b27c in g_main_context_iteration () at /lib64/libglib-2.0.so.0
#3  0x00007fe37b91b2c1 in glib_worker_main () at /lib64/libglib-2.0.so.0
#4  0x00007fe37b942b93 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#5  0x00007fe37b3526ca in start_thread () at /lib64/libpthread.so.0
#6  0x00007fe37b08cf7f in clone () at /lib64/libc.so.6
Thread 1 (Thread 0x7fe37d572240 (LWP 26923)):
#0  0x00007fe37b08101d in poll () at /lib64/libc.so.6
#1  0x00007fe37b91b166 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#2  0x00007fe37b91b4f2 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#3  0x00007fe37cf5c8fc in dleyna_main_loop_start () at /lib64/libdleyna-core-1.0.so.4
#4  0x0000560572bd6e14 in main () 
phako commented 7 years ago

That stack trace shows everything idling... what UPnP devices do you have in your network?

conrad-heimbold commented 7 years ago

Interestingly, the issue seems to be gone now. I am/was directly attached by cable to the university network of the Leibniz-University Hannover (130.75.0.0/16). I can't tell if and how many UPnP devices there are in the university network; but I guess there are some. It might be gone because the university kicks out devices that spam others with UPnP. I have heard such stories from other universities.

phako commented 7 years ago

Very odd.

rtiemann commented 5 years ago

I ran into this issue as well. It happens when there is a "FRITZ!Box Fon WLAN 7240" on the network. After detaching the FRITZ!Box from the network (or simply blocking incoming traffic from that device via iptables) and restarting dleyna-server-service everything starts working again as expected.

A quick look with sysprof quite clearly showed that the main loop must be pretty busy, and a closer look with gdb confirmed this. I cannot, however, find out the exact root cause for this. Any suggestions what I could try to track this down?

Reproducible with dLeyna v0.5 and v0.6 (didn't test other versions) and libgupnp-1.0.

phako commented 5 years ago

I suspect the FritzBox has its media server enabled? Which FritzOS is it running?

rtiemann commented 5 years ago

Yes, media server was enabled. Switching it off, however, didn't change anything. FritzOS is 06.05, 1&1-branded (latest version downloadable from AVM seems to be 06.06).

I just saw that gupnp-universal-cp on Ubuntu 18.04.2 also hogs the CPU while the box is on the network. Among others, gupnp-universal-cp links against libgupnp, libgssdp, libsoup, and a whole bunch of GNOME libraries, but it doesn't use dLeyna. In addition, this program also uses 100% CPU time if a D-Link DIR-635 router is present. Blocking the FRITZ!Box and the router made gupnp-universal-cp behave well again.

So, this bug is probably not dLeyna's.

I'd still like to see this bug fixed and I have the time to help out, but not knowing any of the libraries involved, I don't really know where to start. Any hints?

phako commented 5 years ago

Yeah, that's what I suspected in https://github.com/intel/dleyna-server/issues/150#issuecomment-195954599 - that it's actually a GUPnP bug, presumably in the service introspection.

Would you mind trying to get a backtrace while it is hogging CPU? I have filed https://gitlab.gnome.org/GNOME/gupnp/issues/8 for GUPnP

rtiemann commented 5 years ago

Here is one:

Thread 5 (Thread 0x7ff6c33ec700 (LWP 6070)):
#0  0x00007ff6d02cdbf9 in __GI___poll (fds=0x5654882536c0, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007ff6d0a375c9 in g_main_context_poll (priority=, n_fds=1, fds=0x5654882536c0, timeout=, context=0x5654882537d0) at ../../../../glib/gmain.c:4203
#2  0x00007ff6d0a375c9 in g_main_context_iterate (context=context@entry=0x5654882537d0, block=block@entry=1, dispatch=dispatch@entry=1, self=) at ../../../../glib/gmain.c:3897
#3  0x00007ff6d0a376dc in g_main_context_iteration (context=0x5654882537d0, may_block=1) at ../../../../glib/gmain.c:3963
#4  0x00007ff6c33f436d in  () at /usr/lib/x86_64-linux-gnu/gio/modules/libdconfsettings.so
#5  0x00007ff6d0a5f195 in g_thread_proxy (data=0x565488247320) at ../../../../glib/gthread.c:784
#6  0x00007ff6cfd2f6db in start_thread (arg=0x7ff6c33ec700) at pthread_create.c:463
#7  0x00007ff6d02da88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7ff6c3fff700 (LWP 6069)):
#0  0x00007ff6d02d4839 in syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007ff6d0a7d89a in g_cond_wait_until (cond=cond@entry=0x5654881f1208, mutex=mutex@entry=0x5654881f1200, end_time=end_time@entry=1666124582434) at ../../../../glib/gthread-posix.c:1449
#2  0x00007ff6d0a0a571 in g_async_queue_pop_intern_unlocked (queue=queue@entry=0x5654881f1200, wait=wait@entry=1, end_time=end_time@entry=1666124582434) at ../../../../glib/gasyncqueue.c:422
#3  0x00007ff6d0a0ab2c in g_async_queue_timeout_pop (queue=0x5654881f1200, timeout=timeout@entry=15000000) at ../../../../glib/gasyncqueue.c:543
#4  0x00007ff6d0a5fc3e in g_thread_pool_wait_for_new_pool () at ../../../../glib/gthreadpool.c:167
#5  0x00007ff6d0a5fc3e in g_thread_pool_thread_proxy (data=) at ../../../../glib/gthreadpool.c:364
#6  0x00007ff6d0a5f195 in g_thread_proxy (data=0x5654881f5720) at ../../../../glib/gthread.c:784
#7  0x00007ff6cfd2f6db in start_thread (arg=0x7ff6c3fff700) at pthread_create.c:463
#8  0x00007ff6d02da88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7ff6c8b24700 (LWP 6068)):
#0  0x00007ff6d02cdbf9 in __GI___poll (fds=0x565488209770, nfds=2, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007ff6d0a375c9 in g_main_context_poll (priority=, n_fds=2, fds=0x565488209770, timeout=, context=0x5654882076a0) at ../../../../glib/gmain.c:4203
#2  0x00007ff6d0a375c9 in g_main_context_iterate (context=0x5654882076a0, block=block@entry=1, dispatch=dispatch@entry=1, self=) at ../../../../glib/gmain.c:3897
#3  0x00007ff6d0a37962 in g_main_loop_run (loop=0x565488208010) at ../../../../glib/gmain.c:4098
#4  0x00007ff6cfa580e6 in gdbus_shared_thread_func (user_data=0x565488207670) at ../../../../gio/gdbusprivate.c:275
#5  0x00007ff6d0a5f195 in g_thread_proxy (data=0x5654881f54a0) at ../../../../glib/gthread.c:784
#6  0x00007ff6cfd2f6db in start_thread (arg=0x7ff6c8b24700) at pthread_create.c:463
#7  0x00007ff6d02da88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7ff6c9325700 (LWP 6067)):
#0  0x00007ff6d02cdbf9 in __GI___poll (fds=0x5654881f64b0, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007ff6d0a375c9 in g_main_context_poll (priority=, n_fds=1, fds=0x5654881f64b0, timeout=, context=0x5654881f5190) at ../../../../glib/gmain.c:4203
#2  0x00007ff6d0a375c9 in g_main_context_iterate (context=context@entry=0x5654881f5190, block=block@entry=1, dispatch=dispatch@entry=1, self=) at ../../../../glib/gmain.c:3897
#3  0x00007ff6d0a376dc in g_main_context_iteration (context=0x5654881f5190, may_block=may_block@entry=1) at ../../../../glib/gmain.c:3963
#4  0x00007ff6d0a37721 in glib_worker_main (data=) at ../../../../glib/gmain.c:5772
#5  0x00007ff6d0a5f195 in g_thread_proxy (data=0x5654881f5400) at ../../../../glib/gthread.c:784
#6  0x00007ff6cfd2f6db in start_thread (arg=0x7ff6c9325700) at pthread_create.c:463
#7  0x00007ff6d02da88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7ff6d0ed8540 (LWP 6066)):
#0  0x00007ff6d0a344a0 in g_source_iter_next (iter=iter@entry=0x7fff9ff6d2a0, source=source@entry=0x7fff9ff6d298) at ../../../../glib/gmain.c:950
#1  0x00007ff6d0a36a63 in g_main_context_prepare (context=context@entry=0x5654881e5c20, priority=priority@entry=0x7fff9ff6d320) at ../../../../glib/gmain.c:3452
#2  0x00007ff6d0a374fb in g_main_context_iterate (context=0x5654881e5c20, block=block@entry=1, dispatch=dispatch@entry=1, self=) at ../../../../glib/gmain.c:3882
#3  0x00007ff6d0a37962 in g_main_loop_run (loop=0x56548822bf60) at ../../../../glib/gmain.c:4098
#4  0x00007ff6d07e523e in dleyna_main_loop_start (server=0x565486a5df54 "dleyna-server-service", control_point=0x7ff6d07dfae0 , user_data=0x0) at libdleyna/core/main-loop.c:154
#5  0x0000565486a5de83 in main (argc=1, argv=0x7fff9ff6d598) at daemon.c:93
rtiemann commented 5 years ago

Oh, frame 4 in thread 5 is dconf_gdbus_worker_thread().

phako commented 5 years ago

hm. Possibly a rouge glib idle function... I tried to reproduce it with my FritzBox, unfortunately I could not.

rtiemann commented 5 years ago

I have written a minimalistic UPnP client based on libgupnp to reproduce the problem. Indeed, I was able to reproduce it, so from my perspective, dLeyna is out.

My test program simply searches for "upnp:rootdevice" targets, much like dLeyna does. My callback for GUPnPControlPoint's "device-proxy-available" signal does nothing unless the reported device is the problematic FRITZ!Box 7240. For that device, I get a GUPnPServiceProxy for service "urn:schemas-upnp-org:service:ContentDirectory" by calling gupnp_device_info_get_service(). If this function returns a non-NULL pointer, I subscribe to the service via gupnp_service_proxy_set_subscribed(), because this is what dLeyna does. And this is when the CPU fan starts spinning. :)

Inverting the filter so that all devices except the FRITZ!Box are subscribed to (7 devices) works as expected. That's all I know so far, I'll dig a bit deeper tomorrow.

phako commented 5 years ago

Thanks :) Can you please add further investigations to the GUPnP gitlab bug? https://gitlab.gnome.org/GNOME/gupnp/issues/8

rtiemann commented 5 years ago

Yes, as it stands, this seems to be the right place.

rtiemann commented 5 years ago

Seems like dLeyna isn't off the hook yet... It is definitely GUPnP hogging the CPU, but it stops doing it after a while (see discussion on GitLab).

My test program showed that there is some kind of timeout after which the CPU load goes down after a minute---so back to dLeyna. The timeout also applies here, but gupnp_service_proxy_set_subscribed() gets called again when a previous call of gupnp_service_proxy_set_subscribed() fails.

Trace to first subscription: prv_process_task()dleyna_service_task_process_cb()prv_subscribe()dls_device_subscribe_to_service_changes()gupnp_service_proxy_set_subscribed().

Successive subscriptions are attempted in prv_cd_subscription_lost_cb(), which for the FRITZ!Box is called after one minute. Since the subscription to the FRITZ!Box service never seems to succeed (where "never" means that I have waited for ~30 minutes), we have an infinite loop. (The FRITZ!Box ContentDirectory can be browsed, by the way.)

I don't know how this should be fixed. Counting retries? Maybe by taking the GError passed to prv_cd_subscription_lost_cb() into account?

>>> print *reason
$1 = {
  domain = 560,
  code = 0,
  message = 0x555555b8eb00 "Connection terminated unexpectedly"
}

Please let me know if you need more information or if you want me to try something.

phako commented 5 years ago

The fix for GUPnP is released as 1.0.4 for the oldstable and will be released as 1.2.2 for the current stable.

rtiemann commented 5 years ago

Nice, thank you for fixing this issue. :)

I have just checked dleyna-server-service against GUPnP 1.0.4, and sure enough it doesn't use much CPU anymore. The "Connection terminated unexpectedly" error, however, still occurs every minute as described in my previous comment: function gupnp_service_proxy_set_subscribed() gets called from prv_cd_subscription_lost_cb().

I can live with that very slow infinite loop now, but it doesn't look right.

phako commented 3 years ago

It does not handle the GError properly, as in: It's ignoring it completely.