jimporter / bfg9000

bfg9000 - build file generator
https://jimporter.github.io/bfg9000
BSD 3-Clause "New" or "Revised" License

How to filter entire subdirectory in `find_files()`? #131

Closed abingham closed 4 years ago

abingham commented 4 years ago

I'm trying to filter out all `__pycache__` directories and their contents with the `exclude` argument to `find_files()`. I've tried things like `exclude='**/__pycache__/**'`, `exclude='*__pycache__*'`, and lots of other variations, but nothing seems to work. It looks like the matching ultimately devolves to regex matching, so maybe I'm not escaping things correctly or something.

How can I do this?

jimporter commented 4 years ago

This is more complicated than it really ought to be, since the `exclude` filter only looks at the basename of a path, and excluding a directory doesn't stop bfg from looking into the directory's contents. This is almost certainly the wrong behavior, but I initially added `exclude` just to ignore backup files from your editor (hence it excludes stuff like `foobar~` by default).
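
To see why patterns like the ones in the question fail, here is a minimal sketch in plain Python. It assumes the pattern is tested against only the last path component, as described above, and uses the standard `fnmatch` module as a stand-in for bfg9000's own matcher. The `excluded` helper and the sample paths are hypothetical.

```python
import fnmatch
import posixpath


def excluded(path, pattern):
    """Return True if the basename of `path` matches the glob `pattern`.

    Assumption: only the final path component is tested, so a pattern
    never sees the parent directories. This is not bfg9000's code.
    """
    return fnmatch.fnmatchcase(posixpath.basename(path), pattern)


# Hypothetical paths that a recursive search might visit.
paths = [
    'pkg/__pycache__',
    'pkg/__pycache__/mod.cpython-38.pyc',
    'pkg/mod.py~',
]

for pattern in ('**/__pycache__/**', '*__pycache__*', '*~'):
    matches = [p for p in paths if excluded(p, pattern)]
    print(pattern, '->', matches)
```

Under that assumption, `*__pycache__*` drops the directory entry itself but none of the files inside it, and `**/__pycache__/**` matches nothing at all, because a basename never contains a slash.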

The short-term fix is to use the `filter` argument, which takes a function accepting a `Path` object and a string representing the type (`f` for a file or `d` for a directory):

def no_pycache(path, type):
    # path.split() returns a list of all the directory components of a path,
    # relative to its root.
    return (FindResult.exclude if '__pycache__' in path.split() else
            FindResult.include)

find_files('mypath', filter=no_pycache)

I've been thinking about how to make find_files less irritating to use (see #129), so I'll try to roll a proper fix for this into that issue, or maybe just resolve this on its own. On the plus side, I'm 99% done with v0.6, so I should be able to add in a fix for this and publish that in a week or two.

abingham commented 4 years ago

Thanks! That worked well.

jimporter commented 4 years ago

With the landing of 52232eb5d39d820ea3cbac26be06f5d88a61cd01 and 5f56b15352a9c26b6ff78b3d838da49cf092cb23, there are now two easy ways to do this, plus one not-so-easy way that is still easier than the filter above:

# 1) Use `exclude` (the trailing slash is needed to make it match a directory)
find_files('mypath/**', exclude='__pycache__/')

# 2) Use `project(find_exclude=...)`; this overwrites the default exclude list
project(find_exclude=['.*#', '*~', '#*#', '__pycache__/'])
find_files('mypath/**')

# 3) Use `FindResult.exclude_recursive`; this is overly-complex for this case,
#    but useful for others
def no_pycache(path, type):
    if path.basename() == '__pycache__':
        return FindResult.exclude_recursive
    return FindResult.include

find_files('mypath/**', filter=no_pycache)
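
For readers wondering what separates a plain exclude from a recursive one, the following plain-Python sketch walks a temporary directory with `os.walk`. It is not bfg9000's implementation: the `Result` enum, `walk`, and both filter functions are hypothetical stand-ins, and it assumes that a plain exclude drops only the matched entry while the recursive variant also prunes everything beneath a matched directory.

```python
import enum
import os
import tempfile


class Result(enum.Enum):
    # Hypothetical stand-in for the FindResult values used in this thread.
    include = 0
    exclude = 1
    exclude_recursive = 2


def walk(root, decide):
    """Return the paths under `root` for which `decide` says include."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        kept_dirs = []
        for name in sorted(dirnames):
            path = os.path.join(dirpath, name)
            verdict = decide(path, 'd')
            if verdict is Result.include:
                found.append(path)
            if verdict is not Result.exclude_recursive:
                kept_dirs.append(name)
        # Editing dirnames in place tells os.walk which directories to skip.
        dirnames[:] = kept_dirs
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if decide(path, 'f') is Result.include:
                found.append(path)
    return sorted(os.path.relpath(p, root).replace(os.sep, '/')
                  for p in found)


def shallow(path, kind):
    # Drops the __pycache__ entry itself, but not what is inside it.
    if os.path.basename(path) == '__pycache__':
        return Result.exclude
    return Result.include


def recursive(path, kind):
    # Drops the __pycache__ entry and stops the walk from descending.
    if os.path.basename(path) == '__pycache__':
        return Result.exclude_recursive
    return Result.include


def demo():
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, 'pkg', '__pycache__'))
        for rel in ('pkg/mod.py', 'pkg/__pycache__/mod.cpython-38.pyc'):
            with open(os.path.join(root, *rel.split('/')), 'w'):
                pass
        return walk(root, shallow), walk(root, recursive)


shallow_result, recursive_result = demo()
print(shallow_result)
print(recursive_result)
```

In this sketch the shallow filter still reports the `.pyc` file inside `__pycache__`, while the recursive one returns only `pkg` and `pkg/mod.py`.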