hedger / minidlna-kqueue

Fork of ReadyMedia (formerly known as MiniDLNA) DLNA server with support for BSD's kqueue()-based file system monitoring.
Other
3 stars 0 forks source link

MiniDLNA refuses to quit #5

Open sorin-ionescu opened 10 years ago

sorin-ionescu commented 10 years ago

In inotify.c, the following code does not seem to work.

  while( scanning )
  {
    if( quitting )
      goto quitting;
    sleep(1);
  }

I get the following output after which it hangs and I have to terminate it manually with pkill -KILL minidlnad.

[2014/01/02 16:51:15] minidlna.c:153: warn: received signal 2, good-bye
[2014/01/02 16:51:15] minissdp.c:734: maxdebug: Sending ssdp:byebye
[2014/01/02 16:51:15] minissdp.c:734: maxdebug: Sending ssdp:byebye
[2014/01/02 16:51:15] minissdp.c:734: maxdebug: Sending ssdp:byebye
[2014/01/02 16:51:15] minissdp.c:734: maxdebug: Sending ssdp:byebye
[2014/01/02 16:51:15] minissdp.c:734: maxdebug: Sending ssdp:byebye
[2014/01/02 16:51:15] minissdp.c:734: maxdebug: Sending ssdp:byebye
[2014/01/02 16:51:15] minissdp.c:734: maxdebug: Sending ssdp:byebye
[2014/01/02 16:51:15] minissdp.c:734: maxdebug: Sending ssdp:byebye
[2014/01/02 16:51:15] minissdp.c:734: maxdebug: Sending ssdp:byebye
[2014/01/02 16:51:15] minissdp.c:734: maxdebug: Sending ssdp:byebye
[2014/01/02 16:51:15] minissdp.c:734: maxdebug: Sending ssdp:byebye
[2014/01/02 16:51:15] minissdp.c:734: maxdebug: Sending ssdp:byebye
[1]    12265 killed     ./minidlnad
sorin-ionescu commented 10 years ago

It gets stuck after the following line in minidlnad.c.

pthread_join(inotify_thread, NULL);
sorin-ionescu commented 10 years ago

I figured out what the problem is. The following function call in inotify.c blocks.

kevent(global_kqueue_handle, NULL, 0, &ke, 1, NULL)

So, when I try to quit MiniDLNA, it fails to do so because it's still waiting for an event. The only way to quit it cleanly is to generate an event by moving a file into the watched directories.

In order to fix this, kevent will manually have to be triggered from the main thread, perhaps using EVFILT_USER.

stesser commented 10 years ago

Am 03.01.2014 04:20, schrieb Sorin Ionescu:

I figured out what the problem is. The following function call in inotify.c blocks.

kevent(global_kqueue_handle, NULL, 0, &ke, 1, NULL)

So, when I try to quit MiniDLNA, it fails to do so because it's still waiting for an event. The only way to quit it cleanly is to generate an event by moving a file into the watched directories.

In order to fix this, kevent will manually have to be triggered from the main thread, perhaps using |EVFILT_USER|.

The problem is, that the inotify thread checks "quitting" before the signal handler had a chance to set it to 1.

I have locally fixed this by adding "sleep(1)" in the following place:

--- inotify.c~ 2014-01-02 23:48:35.343595045 +0100 +++ inotify.c 2014-01-03 15:04:56.188043417 +0100 @@ -831,6 +831,7 @@ if ( global_kqueue_handle < 0 ) { DPRINTF(E_ERROR, L_INOTIFY, "kqueue() failed: %s\n", strerror(errno));

This 1 second sleep only occurs after a signal has been received, not during normal operation.

Regards, STefan

stesser commented 10 years ago

Am 03.01.2014 15:20, schrieb Stefan Esser:

Am 03.01.2014 04:20, schrieb Sorin Ionescu:

I figured out what the problem is. The following function call in inotify.c blocks.

kevent(global_kqueue_handle, NULL, 0, &ke, 1, NULL)

So, when I try to quit MiniDLNA, it fails to do so because it's still waiting for an event. The only way to quit it cleanly is to generate an event by moving a file into the watched directories.

In order to fix this, kevent will manually have to be triggered from the main thread, perhaps using |EVFILT_USER|.

The problem is, that the inotify thread checks "quitting" before the signal handler had a chance to set it to 1.

I have locally fixed this by adding "sleep(1)" in the following place:

--- inotify.c~ 2014-01-02 23:48:35.343595045 +0100 +++ inotify.c 2014-01-03 15:04:56.188043417 +0100 @@ -831,6 +831,7 @@ if ( global_kqueue_handle < 0 ) { DPRINTF(E_ERROR, L_INOTIFY, "kqueue() failed: %s\n", strerror(errno));

  • sleep(1); return 0; }

This 1 second sleep only occurs after a signal has been received, not during normal operation.

Sorry for the wrong patch ...

I created it for the purpose of this mail from a clean check-out, but put it into the wrong block.

The sleep(1) has to be added to the if block in the main loop:

    while( !quitting )
    {
            struct kevent ke;
            if ( kevent(global_kqueue_handle, NULL, 0, &ke, 1, NULL)

== -1 ) { DPRINTF(E_WARN, L_INOTIFY, "kevent polling failure: %s\n", strerror(errno)); sleep(1); // **** continue; }

This sleep(1) assures that quitting is set when the sleep ends, and the loop is terminated.

Regards, STefan

sorin-ionescu commented 10 years ago

I fear that sleep can create a race condition. I've been toying with something better, manually signalling using EVFILT_USER from minidlna.c. I'll open a pull request with my fixes.

sorin-ionescu commented 10 years ago

Your repository is different from the patch you submitted. Whitespace is all over the place.

sorin-ionescu commented 10 years ago

Since your repository is different from the patch you have submitted, I have posted the patch bellow. This also fixes #4.

This is a patch to your patch. So first, you have to apply that one.

diff --git 1/configure.ac 2/configure.ac
index f9fcfc0..73e8fb2 100644
--- 1/configure.ac
+++ 2/configure.ac
@@ -455,6 +455,8 @@ AC_CHECK_LIB(vorbisfile, vorbis_comment_query,

 AC_CHECK_HEADERS([arpa/inet.h asm/unistd.h endian.h machine/endian.h fcntl.h libintl.h locale.h netdb.h netinet/in.h stddef.h stdlib.h string.h sys/file.h sys/inotify.h sys/ioctl.h sys/param.h sys/socket.h sys/time.h unistd.h sys/event.h])

+AC_CHECK_FUNCS([getrlimit setrlimit])
+
 AC_CHECK_FUNCS(inotify_init, AC_DEFINE(HAVE_INOTIFY,1,[Whether kernel has inotify support]), [
     AC_MSG_CHECKING([for __NR_inotify_init syscall])
     AC_COMPILE_IFELSE(
diff --git 1/inotify.c 2/inotify.c
index 5e7d806..5e467e1 100644
--- 1/inotify.c
+++ 2/inotify.c
@@ -806,13 +806,9 @@ quitting:

 #else
 void *
-start_kqueue()
+start_kqueue(void* arg)
 {
-  int global_kqueue_handle = -1;
-
-  global_kqueue_handle = kqueue();
-  if ( global_kqueue_handle < 0 )
-    DPRINTF(E_ERROR, L_INOTIFY, "kqueue() failed: %s\n", strerror(errno));
+  int global_kqueue_handle = *((int *)arg);

   while( scanning )
   {
@@ -821,20 +817,48 @@ start_kqueue()
     sleep(1);
   }

+#if defined(HAVE_GETRLIMIT) && defined(HAVE_SETRLIMIT)
+  struct rlimit rl;
+
+  if (!getrlimit(RLIMIT_NOFILE, &rl))
+  {
+    rl.rlim_cur = rl.rlim_max;
+    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
+    {
+#if defined(__APPLE__) && defined(RLIMIT_NOFILE) && defined(OPEN_MAX)
+      // On Mac OS X, setrlimit(RLIMIT_NOFILE, &rl) fails to set
+      // rlim_cur above OPEN_MAX even if rlim_max > OPEN_MAX.
+      if (rl.rlim_cur > OPEN_MAX)
+      {
+        rl.rlim_cur = OPEN_MAX;
+        setrlimit(RLIMIT_NOFILE, &rl);
+      }
+#endif
+    }
+  }
+#endif
+
  inotify_create_watches(global_kqueue_handle);
  if (setpriority(PRIO_PROCESS, 0, 19) == -1)
    DPRINTF(E_WARN, L_INOTIFY,  "Failed to reduce kqueue thread priority\n");
  sqlite3_release_memory(1<<31);
  av_register_all();

+ struct kevent ke;
+ EV_SET(&ke, 0, EVFILT_USER, EV_ADD, 0, NULL, NULL);
+ if (kevent(global_kqueue_handle, &ke, 1, NULL, 0, NULL) == -1)
+   DPRINTF(E_ERROR, L_INOTIFY, "Failed to register kqueue quitting event\n");
+ EV_SET(&ke, 0, 0, 0, 0, 0, 0);
  while( !quitting )
  {
-   struct kevent ke;
    if ( kevent(global_kqueue_handle, NULL, 0, &ke, 1, NULL) == -1 )
    {
      DPRINTF(E_WARN, L_INOTIFY,  "kevent polling failure: %s\n", strerror(errno));
   }

+   if (ke.filter == EVFILT_USER && ke.ident == 0)
+     continue;
+
    /*DPRINTF(E_DEBUG, L_INOTIFY,  "GOT KEVENT:\n"
      "ident=0x%X, filter=0x%X, flags=0x%X, fflags=0x%X, data=0x%X, udata=0x%X\n", 
      ke.ident, ke.filter, ke.flags, ke.fflags, ke.data, ke.udata);*/
diff --git 1/minidlna.c 2/minidlna.c
index 154ddc1..3838c6b 100644
--- 1/minidlna.c
+++ 2/minidlna.c
@@ -75,6 +75,10 @@
 #include <libintl.h>
 #endif

+#ifdef HAVE_SYS_EVENT_H
+#include <sys/event.h>
+#endif
+
 #include "upnpglobalvars.h"
 #include "sql.h"
 #include "upnphttp.h"
@@ -958,6 +962,9 @@ main(int argc, char **argv)
    struct sockaddr_in tivo_bcast;
    struct timeval lastbeacontime = {0, 0};
 #endif
+#ifdef HAVE_SYS_EVENT_H
+  int global_kqueue_handle = -1;
+#endif

    for (i = 0; i < L_MAX; i++)
        log_level[i] = E_WARN;
@@ -1003,7 +1010,11 @@ main(int argc, char **argv)
     if (!sqlite3_threadsafe() || sqlite3_libversion_number() < 3005001)
       DPRINTF(E_ERROR, L_GENERAL, "SQLite library is not threadsafe!  "
       "Kqueue will be disabled.\n");
-    else if (pthread_create(&inotify_thread, NULL, start_kqueue, NULL) != 0)
+
+    global_kqueue_handle = kqueue();
+    if (global_kqueue_handle < 0)
+      DPRINTF(E_ERROR, L_INOTIFY, "kqueue() failed: %s\n", strerror(errno));
+    if (pthread_create(&inotify_thread, NULL, start_kqueue, (void *)&global_kqueue_handle) != 0)
       DPRINTF(E_FATAL, L_GENERAL, "ERROR: pthread_create() failed for start_kqueue. EXITING\n");
   }    

@@ -1283,8 +1294,15 @@ shutdown:
        close(lan_addr[i].snotify);
    }

-   if (inotify_thread)
+  if (inotify_thread) {
+#ifdef HAVE_SYS_EVENT_H
+    struct kevent quit_event;
+    EV_SET(&quit_event, 0, EVFILT_USER, 0, NOTE_TRIGGER, NULL, NULL);
+    if (kevent(global_kqueue_handle, &quit_event, 1, NULL, 0, NULL ) == -1)
+      DPRINTF(E_ERROR, L_GENERAL, "Failed to trigger quit event: %s\n", strerror(errno));
+#endif
        pthread_join(inotify_thread, NULL);
+  }

    sql_exec(db, "UPDATE SETTINGS set VALUE = '%u' where KEY = 'UPDATE_ID'", updateID);
    sqlite3_close(db);
sorin-ionescu commented 10 years ago

I have put both patches, yours and mine, in a gist.

stesser commented 10 years ago

Am 23.01.2014 02:43, schrieb Sorin Ionescu:

Since your repository is different from the patch you have submitted http://sourceforge.net/p/minidlna/patches/106/, I have posted the patch bellow. This also fixes #4

Thank you for sending the patch, I'll give it a try.

Sorry, I do not know how to work with GIT (just RCS/CVS/SVN) and have based my changes on the sources used by the FreeBSD port of minidlna.

Based on the latest FreeBSD port version, I have modified minidlna to use poll() instead of select(). The reason is the kqueue() code in the notify handler, which opens a file descriptor per watched directory (some 4000 in my case). Since that number exceeds the default limit of select(), I had to define FD_SETSIZE (I used 8192 to allow for some growth). But select() is inefficient with that number of file descriptors, poll() is a much better choice.

I attach my (working but not cleaned up) version of the poll patch to this mail. Let me know if you want me to explain the implementation. A few remarks:

I used a secondary array to hold information on the polled file descriptors. This array replaces all but one of the linked lists of the select() case. There are helper functions that add or remove file descriptors from the poll array, or change the events to watch for. These have been added to upnpevents.h, just because that header is included in the relevant C files, but the definitions should be moved to a better place (that's part of the missing cleanup).

And the large patch of upnphttp.c is mostly a white-space change (deeper indentation level of most of a large function).

The changes are quite localized, in fact. Instead of FD_SET the code maintains the poll array. Instead of testing FD_ISSET it checks the revents field of each file descriptor in the poll array. The secondary array (pollinfo[]) keeps information on the type of file descriptor and a pointer to the object that holds it's state (this pointer replaces the linked lists). Some further code simplification is possible, but I tried to keep as much of the previous code as possible, in order to simplify testing and verification nof the patch.

The patch also adds a timeout needed for LG TVs, which open new HTTP sockets (even when watching TV) and tend to leave them connected but idle until they timeout (after hours). This seems to overload the TV's processor, leading to teared display frames and tens of seconds of latency for remote control actions (sometimes more than a minute). I had put debugging code into minidlna and found that the TV opens hundreds of unused HTTP sockets. I added code to close any HTTP socket that has not received any data after 5 seconds (for production code, that might be a tuneable in the config file). This does not affect normal operation of any of my devices, since only the LG TV opens such connections without ever using them.

I have not seperated the connecting socket timeout from the poll patch, but plan to do so while performing a cleanup of the patch.

Please let me know, what you think about this patch and whether you think it might become part of the minidlna distribution.

Best regards, STefan --- /usr/ports/net/minidlna/work.nopoll/minidlna-1.1.1/minidlna.c 2014-01-03 23:42:34.987346000 +0100 +++ ./minidlna.c 2014-01-19 13:47:33.954480041 +0100 @@ -68,6 +68,7 @@

include

include

include

+#include

include "config.h"

@@ -100,6 +101,11 @@

define sqlite3_threadsafe() 0

endif

+static struct pollfd pollfds; +static struct pollinfo *pollinfo; +static int pollalloc; +static int pollused; + / OpenAndConfHTTPSocket() :

+#ifdef DEBUG +static char * +pollfdtypename(int pollfdtype) +{

- LIST_INIT(&upnphttphead);

ret = open_db(NULL);
if (ret == 0)
{

@@ -1048,6 +1179,13 @@

SendSSDPGoodbyes();

- FD_ZERO(&readset);

- }

- }

@@ -165,12 +167,16 @@ renewSubscription(const char * sid, int sidlen, int timeout) { struct subscriber * sub;

@@ -187,11 +193,13 @@ if(sub->notify) { sub->notify->sub = NULL; }

@@ -247,6 +255,8 @@ if(sub) sub->notify = obj; LIST_INSERT_HEAD(&notifylist, obj, entries);

+/* close the socket connected to the notify object / +static void +upnp_event_notify_close(struct upnp_event_notify \ obj) +{

+void +upnpevents_setupobj(void * pollobj) +{

-void upnpevents_selectfds(fd_set readset, fd_set writeset, int * max_fd) +void upnpevents_processpoll_one(void *pollobj) {

-void upnpevents_processfds(fd_set readset, fd_set writeset) +void upnpevents_process_cleanup_notify_one(void *pollobj) {

-void upnpevents_selectfds(fd_set readset, fd_set writeset, int * max_fd); -void upnpevents_processfds(fd_set readset, fd_set writeset); +#include +struct pollinfo {

+void +OpenSocket_upnphttp(int shttpl) +{

@@ -993,6 +1052,7 @@ { case 0: n = recv(h->socket, buf, 2048, 0); +// DPRINTF(E_DEBUG, L_HTTP, "recv (state0): fd=%d ret=%d\n", h->socket, n); if(n<0) { DPRINTF(E_ERROR, L_HTTP, "recv (state0): %s\n", strerror(errno)); @@ -1000,7 +1060,7 @@ } else if(n==0) {

+/* OpenSocketupnphttp() / +void +OpenSocketupnphttp(int shttpl); + / CloseSocket_upnphttp() / void CloseSocket_upnphttp(struct upnphttp );

+/* CheckIdleupnphttp() / +void +CheckIdle_upnphttp(struct upnphttp * h, timet timestamp); + / Delete_upnphttp() / void Delete_upnphttp(struct upnphttp ); --- /usr/ports/net/minidlna/work.nopoll/minidlna-1.1.1/upnpsoap.c 2013-11-02 02:06:41.000000000 +0100 +++ ./upnpsoap.c 2014-01-13 21:49:22.798865941 +0100 @@ -104,6 +104,7 @@ h->res_buflen += sizeof(afterbody) - 1;

SendResp_upnphttp(h);

+// DPRINTF(E_MAXDEBUG, L_HTTP, "Send reply (fd=%d): %.*s\n", h->socket, h->res_buflen, h->res_buf); CloseSocket_upnphttp(h); }

sorin-ionescu commented 10 years ago

I have solved the number of open file descriptors problem in my patch to your kqueue patch by requesting the maximum allowed using getrlimit and setrlimit. Unfortunately, kqueue will probably still fail for users with huge libraries. It's possible to increase the limit even further by tuning kernel limits, but that is an operation the user will have to do manually.

However, since I am on Mac OS X, I have access to an alternative API, FSEvents, which does not use file descriptors. It's easy to use, and works well. You can see my FSEvents patch.

stesser commented 10 years ago

Am 23.01.2014 18:17, schrieb Sorin Ionescu:

I have solved the number of open file descriptors problem in my patch to your kqueue patch https://gist.github.com/sorin-ionescu/8571344 by requesting the maximum allowed using |getrlimit| and |setrlimit|.

There are two problems with many watched directories:

1) You are increasing the number of file descriptors allowed, to avoid hitting the resource limit.

2) But with more than 1024 file descriptors used for kqueue(), select() fails for HTTP sockets due to the FD_SETSIZE limit.

These two limits are completely unrelated. The select() limit can be increased by defining FD_SETSIZE to cover sufficient range. But that makes select() very inefficient. The poll changes deal with this inefficiency that is inherent to select() if a process has so many file descriptors to cover (the HTTP sockets will typically have file descriptors above those used for kqueue() ...).

Unfortunately, |kqueue| will probably still fail for users with huge libraries. It's possible to increase the limit even further by tuning kernel limits http://www.freebsd.org/doc/handbook/configtuning-kernel-limits.html, but that is an operation the user will have to do manually.

This is a problem on systems with little RAM, due to auto-sizing of kernel tables. The un-tuned value of openfiles is 225000 on my system (which is unreasonably high as a default), kern.maxfiles is 250000, which is also much higher than I'll ever need.

However, since I am on Mac OS X, I have access to an alternative API, FSEvents https://developer.apple.com/library/Mac/documentation/Darwin/Reference/FSEvents_Ref/Reference/reference.html, which does not use file descriptors. It's easy to use, and works well. You can see my FSEvents patch https://gist.github.com/sorin-ionescu/8571424.

The FSEvents interface is much higher level than kqueue() and it does in fact avoid the high number of file descriptors problem. But since I do not have any MacOS systems, it does not solve my problem (which also affects other FreeBSD users with a high number of media directories - minidlna crashes for them if "inotify=yes").

I haven't had time to look at the patch mentioned in your other mail, yet - will do so as soon as possible.

Regards, STefan