gorakhargosh / watchdog

Python library and shell utilities to monitor filesystem events.
http://packages.python.org/watchdog/
Apache License 2.0
6.56k stars 694 forks source link

watchmedo does not support ignoring monitoring specific folder and its sub-folders #798

Open chuanran opened 3 years ago

chuanran commented 3 years ago

From the usage of watchmedo auto-restart, we can see the only possible way we can use to exclude a specific folder is using --ignore-pattern parameter, as following shows.

sh-4.4$ watchmedo auto-restart --help
usage: watchmedo auto-restart [-h] [-d directory] [-p PATTERNS] [-i IGNORE_PATTERNS] [-D] [-R] [--interval TIMEOUT] [--signal SIGNAL]
                              [--debug-force-polling] [--kill-after KILL_AFTER]
                              command [arg [arg ...]]

    Subcommand to start a long-running subprocess and restart it
    on matched events.

    :param args:
        Command line argument options.

positional arguments:
  command               Long-running command to run in a subprocess.
  arg                   Command arguments. Note: Use -- before the command arguments, otherwise watchmedo will try to interpret them. (default:
                        -)

optional arguments:
  -h, --help            show this help message and exit
  -d directory, --directory directory
                        Directory to watch. Use another -d or --directory option for each directory. (default: -)
  -p PATTERNS, --pattern PATTERNS, --patterns PATTERNS
                        matches event paths with these patterns (separated by ;). (default: '*')
  -i IGNORE_PATTERNS, --ignore-pattern IGNORE_PATTERNS, --ignore-patterns IGNORE_PATTERNS
                        ignores event paths with these patterns (separated by ;). (default: '')
  -D, --ignore-directories
                        ignores events for directories (default: False)
  -R, --recursive       monitors the directories recursively (default: False)
  --interval TIMEOUT, --timeout TIMEOUT
                        use this as the polling interval/blocking timeout (default: 1.0)
  --signal SIGNAL       stop the subprocess with this signal (default SIGINT) (default: 'SIGINT')
  --debug-force-polling
                        [debug] forces polling (default: False)
  --kill-after KILL_AFTER
                        when stopping, kill the subprocess after the specified timeout (default 10) (default: 10.0)

Checking the source code of watchmedo, we can find that watchmedo uses PurePosixPath to match the path and patterns (including ignore-patterns), and it is NOT supporting regex at all, so we cannot use regex to match the pattern.

in the doc for match method of PurePosixPath, can find that it does NOT support recursively searching and matching files in a specific folder and its sub-folders.

Is it possible for watchmedo to support such feature that can ignore any changes for a specific folder and its subfolders recursively? Thanks

taleinat commented 1 year ago

watchmedo uses pathlib.PurePath.match() for matching paths with patterns. This means it is possible, for example, to ignore everything under a directory /path/to/src/tests/ as follows:

watchmedo auto-restart -d /path/to/src --ignore-patterns='/path/to/src/tests/*;/path/to/src/tests/**/*' -- ...

The need to use both path/**/* and path/* is somewhat annoying, and I couldn't find a way around it, but the important thing is that this is possible!

As a path to improving watchmedo, it seems that using functions from the fnmatch module instead of pathlib.Path.match() would make this simpler, as one could simply use path/*. That would be a backwards-incompatible change, though, and in my opinion it wouldn't be worth it just for this. (But using pathlib.Path.match() as currently done has some other disadvantages as well, such as the pattern "b/c.py" matching a file "a/b/c.py".)

BoboTiG commented 1 year ago

Given that the current behavior is not ideal, and not intuitive, we could think about moving to fnmatch. I would be OK with such enhancement.

taleinat commented 1 year ago

For future reference, if someone does implement something like this, it seems it would be better to use gitignore-style path specifications (e.g. via the pathspec library), since fnmatch is aimed at matching file names rather than paths, and has some drawbacks for this use case due to that.

Also, to allow providing multiple path specs via cmdline, it would probably be better to allow specifying --pattern(s) and/or --ignore-pattern(s) multiple times rather than using a separator character as is currently done.

guitarino commented 1 year ago

@taleinat

It seems that watchmedo auto-restart currently doesn't support recursive patterns. I believe that for pathlib.PurePath.match(), path/**/* is equivalent to path/*/*, which only takes into account 2 levels of nesting. This is also what happens in my testing: changes deeper than 2 layers are not taken into account by either --patterns or --ignore-patterns. You can confirm this behavior in the python shell:

$ python3
>>> from pathlib import PurePath
>>> PurePath("a/b/c/d.txt").match("b/**/*")
True
>>> PurePath("a/b/c/d/e.txt").match("b/**/*")
False

This means that there is no reliable way to, for example, watch for recursive changes in directory my_project while recursively ignoring anything that happens in my_project/build. My current workaround is to construct ignore patterns programmatically with many layers of nesting i.e. build;build/*;build/*/*;build/*/*/*;[...], which, as you can imagine, not ideal.

I believe this is a real functionality limitation and I'm wondering if it's worth introducing additional parameters, such as --glob-pattern(s) and --ignore-glob-pattern(s), that would use more of a glob-style pattern instead?

Appreciate your help on this!

taleinat commented 1 year ago

You appear to be right, @guitarino! It seems that pathlib.PurePath.match() doesn't handle ** the way I thought it did.

taleinat commented 1 year ago

@BoboTiG, how would you suggest approaching this in terms of maintaining backwards-compatibility? Add a new CLI flag for this?

BoboTiG commented 1 year ago

Am I right if I say that pathspec is the way to go, and will handle all cases out of the box?

BoboTiG commented 1 year ago

About backward compatibility, I would say there is no really issue: we are trying to fix the initial behaviour. I would vote for fixing the current behaviour without introducing new arguments.

taleinat commented 1 year ago

Am I right if I say that pathspec is the way to go, and will handle all cases out of the box?

I don't know, but without testing various cases, I wouldn't assume that.

About backward compatibility, I would say there is no really issue: we are trying to fix the initial behaviour. I would vote for fixing the current behaviour without introducing new arguments.

I do think that if any currently valid use of --patterns or --ignore-patterns, changes in behaviour due to this change, that would be a backwards-compatibility break. This includes edge cases!

I would exclude uses of .../**/... from the above statement, since those quite obviously don't behave as anyone would expect, but I'd be careful before changing pretty much anything else. And there are so many edge cases with paths...

Unless we can retain full backwards-compat, it might be better to cut a new major version for this change, and clearly document it in the major changes and an upgrade guide.

BoboTiG commented 1 year ago

Unless we can retain full backwards-compat, it might be better to cut a new major version for this change, and clearly document it in the major changes and an upgrade guide.

Yes, indeed. It would be part of a major release. What I would like to prevent is adding new CLI arguments.

rileymcdowell commented 9 months ago

Looks like it's been about 9 months since this issue has gotten a comment. Obviously, this would be a breaking change and require a more significant release. Any thoughts to share about a release that would include the arbitrary-depth-globbing feature?

BoboTiG commented 9 months ago

The release is not an issue at all. We need someone to steps in, and provides a pull-request first.