gorakhargosh / watchdog

Python library and shell utilities to monitor filesystem events.
http://packages.python.org/watchdog/
Apache License 2.0
6.59k stars 696 forks source link

Calling inotify_init() on each watch #275

Open Naatan opened 10 years ago

Naatan commented 10 years ago

I was getting the error OSError: inotify instance limit reached with only about 50 watchers open, which seemed weird. So I started digging and found that inotify_init() is called for pretty much each watcher: https://github.com/gorakhargosh/watchdog/blob/master/src/watchdog/observers/inotify_c.py#L163

I verified this by adding a print statement and seeing it print out my message for each watch that was being added. I then tried the following workaround:

--- a/watchdog/src/watchdog/observers/inotify_c.py
+++ b/watchdog/src/watchdog/observers/inotify_c.py
@@ -158,12 +158,19 @@ class Inotify(object):
         ``True`` if subdirectories should be monitored; ``False`` otherwise.
     """

+    _inotify_fd = None
+
     def __init__(self, path, recursive=False, event_mask=WATCHDOG_ALL_EVENTS):
         # The file descriptor associated with the inotify instance.
-        inotify_fd = inotify_init()
-        if inotify_fd == -1:
-            Inotify._raise_error()
-        self._inotify_fd = inotify_fd
+        if not Inotify._inotify_fd:
+            inotify_fd = inotify_init()
+
+            if inotify_fd == -1:
+                Inotify._raise_error()
+
+            Inotify._inotify_fd = inotify_fd
+
+        self._inotify_fd = Inotify._inotify_fd
         self._lock = threading.Lock()

         # Stores the watch descriptor for a given path.

This seemed to work, in that watchers are added without also increasing the number of inotify instances beyond the one that is required. However this breaks the actual events as well as removing the watchers, giving errors such as

Exception in thread Thread-21:
Traceback (most recent call last):
  File "**/dist/python/lib/python2.7/threading.py", line 808, in __bootstrap_inner
    self.run()
  File "**/dist/bin/python/komodo/watchdog/observers/inotify_buffer.py", line 57, in run
    inotify_events = self._inotify.read_events()
  File "**/dist/bin/python/komodo/watchdog/observers/inotify_c.py", line 302, in read_events
    wd_path = self._path_for_wd[wd]
KeyError: 4

I'm hesitant to dig much further at this time because being as I am not familiar with this codebase I do not fully understand the implications. It seems obvious to me though that adding an instance for each watch is not what you should be doing though, hopefully this can get fixed relatively easily but I'll have to defer to people who know this codebase thoroughly.

tamland commented 10 years ago

I'm not the original author of that, but I believe the Inotify class was done like this in an attempt to create an object oriented wrapper around the inotify api. It encapsulate the whole api, that's why an inotify file descriptor is created for every instance. Apparently not good idea if you plan to create many of them. However, for me the inotify instance limit is set to 128. I don't know why you only got 50. Maybe you have other applications creating inotify instances..

The error you see in read_events is because if you simply share the file descriptor, every instance of Inotify will read from the same file descriptor (see inotify_c.py#L284) and receive events that does not belong to it.

To use only one file descriptor you will need to make the event reading static too, then distribute the events to the watch where it belongs somehow.

Naatan commented 10 years ago

Note the "whole Inotify api" consists of about 3 functions, so not sure how valid this object oriented approach is. Regardless this doesn't address the problem; watchdog is adding instances for each watcher; this is a bug no matter what design principle you apply to it.

If there is currently no active developer thoroughly familiar with this code I'd be happy to play further with it to try and come up with a proper fix, but I am mainly worried that this would require a ton of rewriting as it was obviously never designed to work with a single inotify instance (even though it should). Seems to me that this was implemented without a thorough understanding of how inotify works. Now I don't by any stretch claim to have a thorough understanding of inotify (on the contrary), but it seems obvious to me at this point that you should not be adding an inotify instance for each watcher.

tamland commented 10 years ago

Besides the unfortunate low default limit I don't think there is anything that says you must only create one inotify instance. There's mismatch of APIs. watchdog is designed around watching separate trees, while inotify aground single directories or files. Feel free to try to suggest a solution. Alternatively if you don't mind managing inotify watches yourself, implement an EventEmitter that don't create multiple inotify instances, but you'll lose the ability to have different event handles for different trees.

tovrstra commented 4 months ago

Despite the age of this issue, it is still alive and well. I was bitten by it today. :)

In my use case, another part of the code decides which directories are worth watching. I just need a way to regularly add or remove directories (possibly hundreds) from the ones being watched, and the recursive option is not even desirable. (This is different from #212.)

Looking at the inotify code in watchdog, it does indeed need a good refactoring to fix this issue. I know it's a long shot, but I'll try to suggest a very tentative plan to solve the issue:

  1. 1033 makes lots of changes, practically in all files, so it needs to be merged before a pull request can be made.

  2. A class INotifyFD can be added to hold the result of the inotify_init(). An instance of INotifyFD can be created in the InotifyObserver class, to make sure there are few of them by default. The INotifyFD also reads events from the low-level buffer and has basic support for adding and removing watched paths. (It adds almost no functionality above the low-level code.)
  3. Because the IN_MOVED_FROM and IN_MOVED_TO events need to be paired, as currently done by the InotifyBuffer. It seems logical to implement the pairing in the new INotifyFD class, so that it happens before the events get distributed to the different emitters.
  4. Each new InotifyEmitter instance created by InotifyObserver receives the same INotifyFD instance and registers a callback to receive records from the low-level event buffer. (This can be done by passing a factory function in the emitter_class argument of the BaseObserver constructor.)
  5. To support recursive watching, there should be an additional layer between INotifyFD and InotifyEmitter. For example, each InotifyEmitter can create an InotifyFilter helper to take care of filtering out the relevant paths from the INotifyFD events, and it can optionally transparently add/remove watches for subdirectories. (InotifyFilter can replace most of the logic that is currently implemented in the INotify class.)

@gorakhargosh @BoboTiG @tamland Would this plan make sense? There are obviously still a lot of details to be filled in.