Open snobear opened 10 years ago
I'm able to ignore it by adding this piece to inotify_c.py:
    if dirname == ".snapshot":
        continue
in the _add_dir_watch method:
    def _add_dir_watch(self, path, recursive, mask):
        ....snip....
        if recursive:
            for root, dirnames, _ in os.walk(path):
                for dirname in dirnames:
                    full_path = absolute_path(os.path.join(root, dirname))
                    logger.info(dirname)
                    if os.path.islink(full_path):
                        continue
                    if dirname == ".snapshot":
                        continue
                    self._add_watch(full_path, mask)
Very hacky. Is this something you would be interested in implementing or accepting a pull request for? I'm thinking of something like an ignore_paths setting, like the event handler has, but for the Observer class. It'd be a list, so it would look more like:
    if dirname in ignore_paths:
        continue
Actually, this is the recommended way to exclude something from os.walk. In _add_dir_watch in observers/inotify_c.py:
    if recursive:
        for root, dirnames, _ in os.walk(path):
            try:
                # directory exclusions would go here
                dirnames.remove('.snapshot')
            except ValueError:
                pass
            for dirname in dirnames:
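A minimal, self-contained sketch of the in-place pruning idea, generalized to several names. The `walk_dirs` helper and its `ignore_names` parameter are hypothetical and exist only for illustration; they are not part of watchdog. With the default `topdown=True`, `os.walk` only descends into the names still present in `dirnames`, so slice assignment can drop several directories at once:

```python
import os
import tempfile


def walk_dirs(path, ignore_names):
    """Yield every subdirectory of ``path``, skipping ignored names entirely.

    ``ignore_names`` is a hypothetical parameter used for illustration;
    it is not an existing watchdog setting.
    """
    for root, dirnames, _ in os.walk(path, topdown=True):
        # Slice assignment mutates the list that os.walk itself holds,
        # so pruned directories are never descended into.
        dirnames[:] = [d for d in dirnames if d not in ignore_names]
        for dirname in dirnames:
            yield os.path.join(root, dirname)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "work", ".snapshot", "hourly.0"))
    os.makedirs(os.path.join(tmp, ".snapshot", "hourly.0"))
    os.makedirs(os.path.join(tmp, "src"))

    found = sorted(os.path.relpath(p, tmp) for p in walk_dirs(tmp, {".snapshot"}))
    print(found)  # ['src', 'work']
```

Rebinding the name (`dirnames = [...]`) would not work here, because `os.walk` would keep using its original list.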
Any updates on this? This seems like a big missing feature -- without it you can't prevent watchdog from descending into subfolders like .git and .tox (at the root level or at an arbitrary level) and getting swamped by the number of files there.
I'd also like this. It would be nice to be able to exclude a list of paths. It would also be nice to only watch a white-list of paths in a root directory, too. For example, it may sometimes be easier to create a white-list than to specify .git, .hg, .snapshot, .tox, etc., especially when the same watching code is desired on multiple projects that have different dirs that shouldn't be watched.
I'm searching for ways to exclude a subdirectory from the inotify observer. Just ignoring events is a bad idea: I have folders with lots of files.
If somebody knows of a way, please tell me.
Hello. Any updates on this?
While the fsnotify and inotify backends use os.walk, the Windows backend does not. How could this be added to the Windows backend?
I'd like to resurrect this issue. I patched watchdog for an internal use case like this to ignore a set of directories. In our situation these are monorepos with many hundreds of thousands of subdirs, so ignoring the events is not enough. You have to avoid watching them in the first place.
diff --git a/src/watchdog/observers/inotify_c.py b/src/watchdog/observers/inotify_c.py
index c297c67..cf629df 100644
--- a/src/watchdog/observers/inotify_c.py
+++ b/src/watchdog/observers/inotify_c.py
@@ -163,6 +163,7 @@ class Inotify:
         self._path = path
         self._event_mask = event_mask
         self._is_recursive = recursive
+        self._exclude_dirs = {b"ignore_dir_1", b"ignore_dir_2"}
         if os.path.isdir(path):
             self._add_dir_watch(path, recursive, event_mask)
         else:
@@ -261,7 +262,8 @@ class Inotify:
         def _recursive_simulate(src_path):
             events = []
-            for root, dirnames, filenames in os.walk(src_path):
+            for root, dirnames, filenames in os.walk(src_path, topdown=True):
+                dirnames[:] = [d for d in dirnames if d not in self._exclude_dirs]
                 for dirname in dirnames:
                     try:
                         full_path = os.path.join(root, dirname)
@@ -363,7 +365,8 @@ class Inotify:
             raise OSError(errno.ENOTDIR, os.strerror(errno.ENOTDIR), path)
         self._add_watch(path, mask)
         if recursive:
-            for root, dirnames, _ in os.walk(path):
+            for root, dirnames, _ in os.walk(path, topdown=True):
+                dirnames[:] = [d for d in dirnames if d not in self._exclude_dirs]
                 for dirname in dirnames:
                     full_path = os.path.join(root, dirname)
                     if os.path.islink(full_path):
@@ -380,6 +383,8 @@ class Inotify:
         :param mask:
             Event bit mask.
         """
+        if any(path.startswith(d) for d in self._exclude_dirs) and os.path.isdir(path):
+            return
         wd = inotify_add_watch(self._inotify_fd, path, mask)
         if wd == -1:
             Inotify._raise_error()
The change in _add_watch ensures inotify watches are not set on the directories in self._exclude_dirs, and the other changes ensure we don't even descend into those directories (i.e. a strictly optional performance optimization).
Obviously the fact that this patch hard-codes the ignored directories, and that it has no effect on macOS or Windows (or for users of the polling observer), means it's not suitable for a PR.
But if I tidied this up, added tests, and extended coverage to the other observers, is this a PR that would be of interest? This would give watchdog parity with the ignore_dirs feature of watchman.
But if I tidied this up, added tests, and extended coverage to the other observers, is this a PR that would be of interest?
Yes, absolutely! :)
Awesome. Quick question about user-facing API before I start. Ideally (i.e. assuming the performance difference is negligible), should the option to ignore dirs allow the user to:
1. ignore exact paths relative to the watched root: bar when watching /root would ignore exactly the directory /root/bar, and foo/bar would ignore exactly the directory /root/foo/bar
2. ignore by directory name: bar would exclude any directory called bar, including /root/bar and /root/a/b/c/bar. In that case, would foo/bar ignore any directory bar inside any directory foo, or should nested paths be disallowed?
FWIW, watchman goes with option 1, and that is my (weak) preference.
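To make the difference between the two options concrete, here is a small sketch. Both helper functions are hypothetical names made up for this illustration, not watchdog or watchman APIs, and the sketch assumes POSIX-style paths:

```python
from pathlib import PurePosixPath


def ignored_by_relpath(root, path, ignore):
    """Option 1: each entry names exactly one directory relative to ``root``."""
    rel = PurePosixPath(path).relative_to(root)
    return any(rel == PurePosixPath(entry) for entry in ignore)


def ignored_by_name(path, ignore):
    """Option 2: each entry is a bare directory name matched at any depth."""
    return PurePosixPath(path).name in ignore


print(ignored_by_relpath("/root", "/root/bar", ["bar"]))          # True
print(ignored_by_relpath("/root", "/root/a/b/c/bar", ["bar"]))    # False
print(ignored_by_relpath("/root", "/root/foo/bar", ["foo/bar"]))  # True
print(ignored_by_name("/root/a/b/c/bar", ["bar"]))                # True
print(ignored_by_name("/root/foo/bar", ["foo/bar"]))              # False
```

The last call shows why nested entries need a decision under option 2: a plain name comparison can never match an entry that contains a separator.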
How about the pattern used by .gitignore files, including globbing? A bit more work probably, but it would be really nice.
.gitignore uses both options, right? Plus patterns.
I am not sure what is the best approach. For the use case of ignoring all .git folders, for instance, it could be more practical to use option 2.
Using a mix of both options could be interesting. The behaviour would be different when the ignored pattern starts with a slash to say it is relative to the root. Else it is a common name to ignore.
WDYT?
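A rough sketch of how that mixed rule could behave, assuming paths are given relative to the watched root with "/" separators. The `is_ignored` function and its `patterns` argument are hypothetical and only illustrate the convention proposed here:

```python
import posixpath


def is_ignored(rel_path, patterns):
    """Return True if ``rel_path`` is excluded by any entry in ``patterns``.

    Assumed convention (hypothetical, as floated in this thread):
      "/foo/bar" -> only the directory foo/bar directly under the root
      "bar"      -> any directory named bar, at any depth
    """
    rel_path = rel_path.strip("/")
    for pattern in patterns:
        if pattern.startswith("/"):
            # Anchored entry: compare the whole root-relative path.
            if rel_path == pattern.strip("/"):
                return True
        elif posixpath.basename(rel_path) == pattern:
            # Bare name: compare only the last path component.
            return True
    return False


patterns = ["/build", ".git"]
print(is_ignored("build", patterns))            # True
print(is_ignored("docs/build", patterns))       # False
print(is_ignored("vendor/lib/.git", patterns))  # True
```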
Lots of thoughts!
To cover 100% of users, the most general option is to allow the user to pass in an arbitrary Callable[[str], bool] which, if it returns False when passed the directory path, means watchdog does not watch the directory and does not descend into it. This would allow users to self-serve solutions to use cases like "don't watch any git repositories under root" (which happens to be my use case, although I'm fortunate in that the repository directory is known in advance, so I don't personally need this level of control).
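As a sketch of what the callable approach might look like (the `iter_watchable_dirs` helper and the `should_watch` parameter are hypothetical names, not an existing watchdog API), using the "don't watch any git repositories under root" case as the example predicate:

```python
import os
import tempfile
from typing import Callable, Iterator


def iter_watchable_dirs(path: str, should_watch: Callable[[str], bool]) -> Iterator[str]:
    """Yield subdirectories of ``path`` for which ``should_watch`` is true.

    Hypothetical helper: a directory that fails the predicate is neither
    yielded nor descended into.
    """
    for root, dirnames, _ in os.walk(path, topdown=True):
        dirnames[:] = [d for d in dirnames if should_watch(os.path.join(root, d))]
        for dirname in dirnames:
            yield os.path.join(root, dirname)


def not_a_git_repo(dir_path: str) -> bool:
    """Example predicate: reject any directory that contains a .git entry."""
    return not os.path.exists(os.path.join(dir_path, ".git"))


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "repo", ".git", "objects"))
    os.makedirs(os.path.join(tmp, "repo", "src"))
    os.makedirs(os.path.join(tmp, "notes", "2024"))

    watched = sorted(
        os.path.relpath(p, tmp) for p in iter_watchable_dirs(tmp, not_a_git_repo)
    )
    print(watched)
```

Here the whole `repo` tree is skipped because its top-level directory fails the predicate, while `notes` and its subdirectory are kept.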
Adopting git-style ignore globs (aka pathspec/wildmatch) would be very flexible and powerful and probably cover ... less than 100% of users, but the vast majority of them? Does anyone know of any Python implementations other than pathspec (which looks fine)? My one concern is that I'd be a little nervous about the performance implications of adding complex tests to the middle of some pretty tight loops. Obviously these concerns exist for the idea of allowing the user to pass an arbitrary callable too, but I think it's a bit different when the test is implemented in watchdog. There's an expectation of performance that doesn't exist when a user is passing in their own code. Are there any existing watchdog performance tests?
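On the tight-loop concern, one possible mitigation is to compile all patterns into a single regular expression up front, so each directory costs one match call. The sketch below uses the standard library's `fnmatch` as a simplified stand-in: it is not git's wildmatch and does not implement .gitignore semantics (no negation, no anchoring, no `**`), and it is only applied to directory basenames. `compile_name_globs` is a hypothetical helper name:

```python
import fnmatch
import re


def compile_name_globs(globs):
    """Combine shell-style globs into one precompiled matcher for directory names.

    Simplified stand-in for git-style patterns; not real .gitignore semantics.
    """
    # fnmatch.translate returns a regex that must match the whole name,
    # so the alternatives can be joined with "|".
    combined = "|".join(fnmatch.translate(g) for g in globs)
    return re.compile(combined).match


is_ignored_name = compile_name_globs([".git", ".tox", "build-*"])

print(bool(is_ignored_name(".git")))        # True
print(bool(is_ignored_name("build-123")))   # True
print(bool(is_ignored_name(".gitignore")))  # False
print(bool(is_ignored_name("src")))         # False
```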
OK, those are the more ambitious ideas. I think they are interesting and doable! Back to the basics.
I am not sure what is the best approach. For the use case of ignoring all .git folders, for instance, it could be more practical to use option 2.
FWIW, for the specific case of ignoring vcs object directories, watchman has ignore_vcs, which is essentially option 2 with a hard-coded set of directory names (.git, .svn, etc.). (There's a little more to it than that, but I assume that's the rough idea from a user POV.)
Using a mix of both options could be interesting. The behaviour would be different when the ignored pattern starts with a slash to say it is relative to the root. Else it is a common name to ignore.
I like this! There could also be ignore_dir_by_name and ignore_dir_by_relpath to explicitly provide both options. I'd personally lean toward that over a leading / on the basis that explicit is better than implicit, but I don't feel very strongly.
To a great extent, this is a judgment call about your users which a maintainer is in a much better position to make than me! Any of these options (or more than one of them!) seem reasonable to me. Part of me thinks "just do what watchman does, since it obviously works at Facebook scale". But part of me thinks "watchdog's strength is that it's implemented in a dynamic language, which allows it to offer the user more control, so we should go nuts and do the callable thing, and offer a particularly powerful example of its use, i.e. git-style ignores".
It's probably worth thinking about how this all fits in with the existing regex-based event handler too. There's a potential for user confusion there.
I'm familiar with ignoring events, but how do I prevent a subdirectory from actually being watched? e.g. I'm watching /usr/jason but want to exclude /usr/jason/.snapshot or even /usr/jason/work/.snapshot.
Our NetApp file servers create snapshots of directories and this causes problems with inotify. There may be millions of files inside a .snapshot dir, and so I've noticed my watchdog script crashes on these. An strace shows inotify_add_watch being set on everything under .snapshot, and the script ends up crashing.
I saw this in #175
I don't see anything in the docs that show how to exclude/ignore a subdirectory. Any tips? Thanks for the help.